This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix PR29789, Missed invariant out of the loop with conditionals and shifts
- From: Richard Guenther <rguenther at suse dot de>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 16 Feb 2007 11:04:04 +0100 (CET)
- Subject: Re: [PATCH] Fix PR29789, Missed invariant out of the loop with conditionals and shifts
- References: <Pine.LNX.4.44.0702151922120.30129-100000@www.eyesopen.com>
On Thu, 15 Feb 2007, Roger Sayle wrote:
>
> Hi Richard,
>
> On Thu, 15 Feb 2007, Richard Guenther wrote:
> > 2007-01-25 Richard Guenther <rguenther@suse.de>
> >
> > PR middle-end/29789
> > * fold-const.c (fold_binary): Do not fold (1 << a) & b to
> > (b >> a) & 1.
> >
> > I will apply this later.
>
> Noooo! Or at least not so quickly, without further benchmarking.
;)
> > So I verified that we create exactly the same assembler on two-operand
> > insn machines with and without the patch.
>
> I didn't believe you :-), so I tested this out myself.
>
> For the following test case:
>
> int foo(int a, int b)
> {
> return ((1 << a) & b) != 0;
> }
>
> which currently on i686-pc-linux-gnu with -O2 -fomit-frame-pointer is
>
> _foo: movl 8(%esp), %eax
> movl 4(%esp), %ecx
> sarl %cl, %eax
> andl $1, %eax
> ret
>
> but with your proposed patch becomes:
>
> _foo: movl 4(%esp), %ecx
> movl $1, %eax
> sall %cl, %eax
> testl %eax, 8(%esp)
> setne %al
> movzbl %al, %eax
> ret
>
> Which is significantly larger and slower.
Heh, ok. I only tested on the testcase from PR29789. It looks like
the above should be somewhere in the testsuite.
> I'm suprised that pinskia hadn't mentioned the various PRs concerning
> shifts and single bit tests, including those that would have to be
> reopened if you eliminated this code.
>
>
> This is just working around the real problem of invariant identification
> by reassociations. Given an expression such as iv + (invar + 1), we
> currently canonicalize with the constant outermost, (iv + invar) + 1
> which may cause a simplistic invariant subexpression pass to miss the fact
> that (invar + 1) is loop invariant. It's not just shifts, but all kinds
> of expressions that may need to be restructured to separate the loop
> variant from the loop invariant terms.
Well sure. This patch was a quick shot at solving parts of our slowness
in spec2k6 libquantum. I bet that PRE with the new VN could do all
this invariant motion - tree lim can only do it if I adjust its cost
metric and have the trees canonicalized differently, likewise rtl
invariant motion can do it with different canonicalization and current
costs.
> I'll see what can be done from within fold, or perhaps cleaned up
> within the RTL optimizers, but these transformations are part of a
> complex push-pull transformation chain, with the canonical forms
> being identified and appropriately lowered during RTL expansion.
>
> [This is why you didn't see a change in an if(...) test. We canonicalize
> to "(a >> b) & 1" at the tree-level and then at RTL expansion time, we
> determine whether the integer value is needed or not (in expand_expr vs.
> do_jump) and emit/expand the appropriate form.]
>
>
> Even if we decided on the alternate canonical form, which doesn't
> satisfy truth_value_p, your patch would also then need to transform
> "(a >> b) & 1" into "(1 << b) & a", so we could spot their equivalence,
> and lower it intelligently in expr.c/jump.c.
>
>
> At a minimum you should run SPEC (for performance) and CSiBE (for code
> size) to see, if your change wins more often than it loses.
I did run spec2k on x86_64 and all was just noise. libquantum from
spec2k6 got a nice 5% improvement, I didn't run the rest.
I'll take the patch back for now.
Richard.
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs