This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PING fwprop and pr/19653



I found one instance of the problem in some leftover output files:

@@ -8253,12 +8159,11 @@
        if cc jump L$L$1478;
-       R0 = B [P5+2] (Z);
-       cc =R0==0;
+       R1 = B [P5+2] (Z);
+       cc =R1==0;
        if !cc jump L$L$1478 (bp);
        R0 = B [P4+2] (Z);
        cc =R0==0;
-       R1 = 0 (X);
        R0 = 1 (X);
        if cc R0 =R1; /* movsicc-2a */
        jump.s L$L$1501;

Before is fwprop, after is fwprop + expensive-cse.
As a first guess, I would think that you are testing three inter-basic-block CSE passes (path following in CSE, and the two GCSE passes). fwprop adds a second GCSE pass to mimic at least part of what CSE path following did. Testing fwprop + expensive-cse means getting path following + two GCSE passes.

I made some experiments with a reduced testcase:

int a;

int
f (char *x, char *y)
{
 if (a == 0)
   return 1;
 if (a > 1)
   return 0;
 return ! (*x == 0 && *y == 0);
}

On i386, this reduced testcase produces the same assembly language with and without the patch:

On Blackfin, we get better code if we disable TER because apparently GCC prefers to not see (*y != 0) in expand:

       R0 = B [P0] (Z);
       cc =R0==0;
       if cc jump L$L$4 (bp);
L$L$11:
       R0 = 1 (X);
L$L$4:
       UNLINK;

So, maybe it is possible to hack TER to combine expressions, but not loads (Steven?). With a patch, that I'm going to SPEC test and post later, I got something that seems even better to me:

       R0 = B [P0] (Z);
       R0 =-R0;
       R0 >>= 31;
       UNLINK;
       rts;

Also, with an sne pattern I get this:
       R0 = B [P0] (X);
       CC = R0;
       R0 = CC;
       R0 = R0.B (Z);
       UNLINK;
       rts;

The code produced by fwprop+expensive-gcse seems worse or equal to all of these alternatives, to me:

       R0 = B [P0] (Z);
       cc =R0==0;
       R0 = 1 (X);
       if cc R0 =R1; /* movsicc-2a */
       UNLINK;
       rts;

Note: the pattern I used for sne is

(define_expand "sne"
 [(set (match_dup 1) (eq:BI (match_dup 2) (match_dup 3)))
  (set (match_dup 1) (eq:BI (match_dup 1) (const_int 0)))
  (set (match_operand:SI 0 "register_operand" "")
       (ne:SI (match_dup 1) (const_int 0)))]
 ""
{
 operands[2] = bfin_compare_op0;
 operands[3] = bfin_compare_op1;
 operands[1] = bfin_cc_rtx;
})

Also, if we really want to perform this optimization as your GCC does, we should probably do it in jump threading via find_implicit_sets. But without RTL dumps, it is very hard to understand what GCC is doing: hopefully, when bugs are reported, we can start fixing them. So far, all we have is evidence that such missed optimizations really belong to other passes than CSE, and that they are pretty easy to fix. I believe the rest of this message provides further evidence.

The root problem is of course that nobody really knows what transformations CSE is supposed to do. :-(

Paolo


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]