[PATCH] Fix PR25620, pow() expansion missed-optimization, 2nd try
Roger Sayle
roger@eyesopen.com
Sun Nov 12 18:10:00 GMT 2006
Hi Richard,
On Sun, 12 Nov 2006, Richard Guenther wrote:
> Bootstrapped and regtested on x86_64-unknown-linux-gnu. SPEC 2k
> doesn't show a difference, Polyhedron has an improvement for air
> (down from 21s to 16.5s runtime).
Cool! I hadn't appreciated this had such a significant impact on
polyhedron.
This is much better, thanks. However, there are still a number of
(new) issues that need to be resolved.
The first is a potential bug...
> ! expand_simple_binop (mode, MULT, op, op2, op, 0, OPTAB_DIRECT);
The expand_binop, and expand_simple_binop, APIs treat their target
argument as a hint. Hence the return result of expand_simple_binop
isn't guaranteed to be in target_rtx. Hence you need to use
op = expand_simple_binop (....)
in this and similar uses. It turns out for your uses in expand_pow,
it makes sense to pass NULL_RTX as the target, allowing the expanders
to select the best destination for these intermediate results, often
a unique pseudo. This allows for more efficient initial RTL, and
improves the change of CSEing the intermediate results.
Secondly, you'll need to use save_expr or some similar mechanism to
avoid the argument arg0 from being evaluated multiple times. It may
have side-effects (such as a function call or volatile), and your
current implementation calls expand_expr on the same tree twice
without taking care to avoid re-evaluation. See the uses of
builtin_save_expr in builtins.c.
Next, I think the code could be made cleaner and more efficient
by nesting conditionals. For example, you test for things such as
the existance of a sqrt builtin and -Os very late, after we've done
a significant amount of work in the soft-float emulation. It makes
much more sense to test these conditions early, and avoid wasting
fruitless effort. In fact, I'd take the same opportunity to allow
flag_unsafe_math optimizations to fall through to the bottom of the
function.
Something like:
if (flag_unsafe_math_optimizations && !optimize_size)
{
fn = mathfn_builtin (type, BUILT_IN_SQRT);
if (fn != NULL_TREE)
{
real_arithmetic (...);
...
}
fn = mathfn_builtin (type, BUILT_IN_CBRT);
if (fn != NULL_TREE)
{
...
}
}
Finally,
> Index: testsuite/gcc.dg/builtins-58.c
> + /* { dg-do compile } */
> + /* { dg-options "-O -ffast-math -std=c99" } */
> ...
> + /* { dg-final { scan-assembler-times "cbrt" 3 } } */
I think you need to do something more to make this test case portable.
We'll only reduce to "cbrt" on targets that define TARGET_C99_FUNCTIONS,
which isn't the same as passing -std=c99 on the command line. I think
you need to use the builtins-config.h header so that you can guard this
test with #ifdef HAVE_C99_RUNTIME.
Sorry that this is becoming a bit of an effort. All of the points
above are "new" with your latest "2nd try" attempt at this patch.
Hopefully, third time is a charm. The patch is looking pretty good,
and works fine on x86 on your testcases. It's just the corner cases
and compile-time issues that need to be addressed.
Many thanks again for working on this. I think its reasonable to
leave the cbrt bits in. Is it usage of sqrt or cbrt that benefits
polyhedron's air?
Roger
--
More information about the Gcc-patches
mailing list