PING fwprop and pr/19653
Paolo Bonzini
paolo.bonzini@lu.unisi.ch
Fri Mar 10 11:03:00 GMT 2006
> I found one instance of the problem in some leftover output files:
>
> @@ -8253,12 +8159,11 @@
> if cc jump L$L$1478;
> - R0 = B [P5+2] (Z);
> - cc =R0==0;
> + R1 = B [P5+2] (Z);
> + cc =R1==0;
> if !cc jump L$L$1478 (bp);
> R0 = B [P4+2] (Z);
> cc =R0==0;
> - R1 = 0 (X);
> R0 = 1 (X);
> if cc R0 =R1; /* movsicc-2a */
> jump.s L$L$1501;
>
> Before is fwprop, after is fwprop + expensive-cse.
As a first guess, I would think that you are testing three
inter-basic-block CSE passes (path following in CSE, and the two GCSE
passes). fwprop adds a second GCSE pass to mimic at least part of what
CSE path following did. Testing fwprop + expensive-cse means getting
path following + two GCSE passes.
I made some experiments with a reduced testcase:
int a;
int
f (char *x, char *y)
{
if (a == 0)
return 1;
if (a > 1)
return 0;
return ! (*x == 0 && *y == 0);
}
On i386, this reduced testcase produces the same assembly language with
and without the patch:
On Blackfin, we get better code if we disable TER because apparently GCC
prefers to not see (*y != 0) in expand:
R0 = B [P0] (Z);
cc =R0==0;
if cc jump L$L$4 (bp);
L$L$11:
R0 = 1 (X);
L$L$4:
UNLINK;
So, maybe it is possible to hack TER to combine expressions, but not
loads (Steven?). With a patch, that I'm going to SPEC test and post
later, I got something that seems even better to me:
R0 = B [P0] (Z);
R0 =-R0;
R0 >>= 31;
UNLINK;
rts;
Also, with an sne pattern I get this:
R0 = B [P0] (X);
CC = R0;
R0 = CC;
R0 = R0.B (Z);
UNLINK;
rts;
The code produced by fwprop+expensive-gcse seems worse or equal to all
of these alternatives, to me:
R0 = B [P0] (Z);
cc =R0==0;
R0 = 1 (X);
if cc R0 =R1; /* movsicc-2a */
UNLINK;
rts;
Note: the pattern I used for sne is
(define_expand "sne"
[(set (match_dup 1) (eq:BI (match_dup 2) (match_dup 3)))
(set (match_dup 1) (eq:BI (match_dup 1) (const_int 0)))
(set (match_operand:SI 0 "register_operand" "")
(ne:SI (match_dup 1) (const_int 0)))]
""
{
operands[2] = bfin_compare_op0;
operands[3] = bfin_compare_op1;
operands[1] = bfin_cc_rtx;
})
Also, if we really want to perform this optimization as your GCC does,
we should probably do it in jump threading via find_implicit_sets. But
without RTL dumps, it is very hard to understand what GCC is doing:
hopefully, when bugs are reported, we can start fixing them. So far,
all we have is evidence that such missed optimizations really belong to
other passes than CSE, and that they are pretty easy to fix. I believe
the rest of this message provides further evidence.
The root problem is of course that nobody really knows what
transformations CSE is supposed to do. :-(
Paolo
More information about the Gcc-patches
mailing list