This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR29789, Missed invariant out of the loop with conditionals and shifts


On Thu, 15 Feb 2007, Roger Sayle wrote:

> 
> Hi Richard,
> 
> On Thu, 15 Feb 2007, Richard Guenther wrote:
> > 2007-01-25  Richard Guenther  <rguenther@suse.de>
> >
> > 	PR middle-end/29789
> > 	* fold-const.c (fold_binary): Do not fold (1 << a) & b to
> > 	(b >> a) & 1.
> >
> > I will apply this later.
> 
> Noooo!  Or at least not so quickly, without further benchmarking.

;)

> > So I verified that we create exactly the same assembler on two-operand
> > insn machines with and without the patch.
> 
> I didn't believe you :-), so I tested this out myself.
> 
> For the following test case:
> 
> int foo(int a, int b)
> {
>   return ((1 << a) & b) != 0;
> }
> 
> which currently on i686-pc-linux-gnu with -O2 -fomit-frame-pointer is
> 
> _foo:   movl    8(%esp), %eax
>         movl    4(%esp), %ecx
>         sarl    %cl, %eax
>         andl    $1, %eax
>         ret
> 
> but with your proposed patch becomes:
> 
> _foo:   movl    4(%esp), %ecx
>         movl    $1, %eax
>         sall    %cl, %eax
>         testl   %eax, 8(%esp)
>         setne   %al
>         movzbl  %al, %eax
>         ret
> 
> Which is significantly larger and slower.

Heh, ok.  I only tested on the testcase from PR29789.  It looks like
the above should be somewhere in the testsuite.

> I'm suprised that pinskia hadn't mentioned the various PRs concerning
> shifts and single bit tests, including those that would have to be
> reopened if you eliminated this code.
> 
> 
> This is just working around the real problem of invariant identification
> by reassociations.  Given an expression such as iv + (invar + 1), we
> currently canonicalize with the constant outermost, (iv + invar) + 1
> which may cause a simplistic invariant subexpression pass to miss the fact
> that (invar + 1) is loop invariant.  It's not just shifts, but all kinds
> of expressions that may need to be restructured to separate the loop
> variant from the loop invariant terms.

Well sure.  This patch was a quick shot at solving parts of our slowness
in spec2k6 libquantum.  I bet that PRE with the new VN could do all
this invariant motion - tree lim can only do it if I adjust its cost
metric and have the trees canonicalized differently, likewise rtl
invariant motion can do it with different canonicalization and current
costs.

> I'll see what can be done from within fold, or perhaps cleaned up
> within the RTL optimizers, but these transformations are part of a
> complex push-pull transformation chain, with the canonical forms
> being identified and appropriately lowered during RTL expansion.
> 
> [This is why you didn't see a change in an if(...) test.  We canonicalize
> to "(a >> b) & 1" at the tree-level and then at RTL expansion time, we
> determine whether the integer value is needed or not (in expand_expr vs.
> do_jump) and emit/expand the appropriate form.]
> 
> 
> Even if we decided on the alternate canonical form, which doesn't
> satisfy truth_value_p, your patch would also then need to transform
> "(a >> b) & 1" into "(1 << b) & a", so we could spot their equivalence,
> and lower it intelligently in expr.c/jump.c.
> 
> 
> At a minimum you should run SPEC (for performance) and CSiBE (for code
> size) to see, if your change wins more often than it loses.

I did run spec2k on x86_64 and all was just noise.  libquantum from
spec2k6 got a nice 5% improvement, I didn't run the rest.

I'll take the patch back for now.

Richard.

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]