PING fwprop and pr/19653

Fri Mar 10 11:03:00 GMT 2006

> I found one instance of the problem in some leftover output files:
>
> @@ -8253,12 +8159,11 @@
>         if cc jump L$L$1478;
> -       R0 = B [P5+2] (Z);
> -       cc =R0==0;
> +       R1 = B [P5+2] (Z);
> +       cc =R1==0;
>         if !cc jump L$L$1478 (bp);
>         R0 = B [P4+2] (Z);
>         cc =R0==0;
> -       R1 = 0 (X);
>         R0 = 1 (X);
>         if cc R0 =R1; /* movsicc-2a */
>         jump.s L$L$1501;
>
> Before is fwprop, after is fwprop + expensive-cse.
As a first guess, I would think that you are testing three 
inter-basic-block CSE passes (path following in CSE, and the two GCSE 
passes).  fwprop adds a second GCSE pass to mimic at least part of what 
CSE path following did.  Testing fwprop + expensive-cse means getting 
path following + two GCSE passes.

I made some experiments with a reduced testcase:

int a;

int
f (char *x, char *y)
{
  if (a == 0)
    return 1;
  if (a > 1)
    return 0;
  return ! (*x == 0 && *y == 0);
}

On i386, this reduced testcase produces the same assembly language with 
and without the patch:

On Blackfin, we get better code if we disable TER because apparently GCC 
prefers to not see (*y != 0) in expand:

        R0 = B [P0] (Z);
        cc =R0==0;
        if cc jump L$L$4 (bp);
L$L$11:
        R0 = 1 (X);
L$L$4:
        UNLINK;

So, maybe it is possible to hack TER to combine expressions, but not 
loads (Steven?).  With a patch, that I'm going to SPEC test and post 
later, I got something that seems even better to me:

        R0 = B [P0] (Z);
        R0 =-R0;
        R0 >>= 31;
        UNLINK;
        rts;

Also, with an sne pattern I get this:
        R0 = B [P0] (X);
        CC = R0;
        R0 = CC;
        R0 = R0.B (Z);
        UNLINK;
        rts;

The code produced by fwprop+expensive-gcse seems worse or equal to all 
of these alternatives, to me:

        R0 = B [P0] (Z);
        cc =R0==0;
        R0 = 1 (X);
        if cc R0 =R1; /* movsicc-2a */
        UNLINK;
        rts;

Note: the pattern I used for sne is

(define_expand "sne"
  [(set (match_dup 1) (eq:BI (match_dup 2) (match_dup 3)))
   (set (match_dup 1) (eq:BI (match_dup 1) (const_int 0)))
   (set (match_operand:SI 0 "register_operand" "")
        (ne:SI (match_dup 1) (const_int 0)))]
  ""
{
  operands[2] = bfin_compare_op0;
  operands[3] = bfin_compare_op1;
  operands[1] = bfin_cc_rtx;
})

Also, if we really want to perform this optimization as your GCC does, 
we should probably do it in jump threading via find_implicit_sets.  But 
without RTL dumps, it is very hard to understand what GCC is doing: 
hopefully, when bugs are reported, we can start fixing them.  So far, 
all we have is evidence that such missed optimizations really belong to 
other passes than CSE, and that they are pretty easy to fix.  I believe 
the rest of this message provides further evidence.

The root problem is of course that nobody really knows what 
transformations CSE is supposed to do.  :-(

Paolo