[Bug target/26290] [4.1/4.2 Regression]: code pessimization wrt. GCC 4.0 probably due to TARGET_MEM_REF

rguenth at gcc dot gnu dot org gcc-bugzilla@gcc.gnu.org
Sun Nov 4 11:45:00 GMT 2007



------- Comment #20 from rguenth at gcc dot gnu dot org  2007-11-04 11:45 -------
With mainline we now get

        .p2align 4,,7
        .p2align 3
.L6:
        addl    $1, %eax
        cmpl    %eax, %edi
        movl    %eax, -20(%ebp)
        jle     .L3
        movl    %eax, %ecx
        movl    %esi, %edx
        .p2align 4,,7
        .p2align 3
.L5:
        movl    -4(%esi), %ebx
        movl    (%edx), %eax
        cmpl    %eax, %ebx
        jle     .L4
        movl    %eax, -4(%esi)
        movl    %ebx, (%edx)
.L4:
        addl    $1, %ecx
        addl    $4, %edx
        cmpl    %ecx, %edi
        jg      .L5
.L3:
        movl    -20(%ebp), %eax
        addl    $4, %esi
        cmpl    -16(%ebp), %eax
        jl      .L6

which looks good, apart from the issue Andrew pointed out (but that one is
tracked as PR26726):

  lsti_11 = MEM[index: ivtmp.27_14, offset: 0x0fffffffc];

  MEM[index: ivtmp.27_14, offset: 0x0fffffffc] = lstj_15;

4.0 is still faster with the original testcase, but the only difference I
can spot is that mainline uses addl $1, %eax while 4.0 uses incl here.  Oh,
and 4.0 uses an extra induction variable(!) for the exit test (and less
loop alignment):

.L3:
        incl    %eax
        cmpl    %eax, 12(%ebp)
        movl    %eax, -20(%ebp)
        jle     .L4
        movl    12(%ebp), %edi
        movl    %esi, %edx
        xorl    %ebx, %ebx
        subl    %eax, %edi
        .p2align 4,,15
.L6:
        movl    -4(%esi), %ecx
        movl    (%edx), %eax
        cmpl    %eax, %ecx
        jle     .L7
        movl    %eax, -4(%esi)
        movl    %ecx, (%edx)
.L7:
        incl    %ebx
        addl    $4, %edx
        cmpl    %edi, %ebx
        jne     .L6
.L4:
        movl    -20(%ebp), %eax
        addl    $4, %esi
        cmpl    -16(%ebp), %eax
        jl      .L3

Using -mtune=core2 on trunk brings back the incl and makes the code faster
than 4.0 (on my Core CPU, that is).  So the generic tuning is what makes the
difference for trunk.

4.2 is still broken, though.  I would say let's close this as fixed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.0.4                       |4.0.4 4.3.0
   Last reconfirmed|2006-02-24 15:20:29         |2007-11-04 11:45:07
               date|                            |
            Summary|[4.1/4.2/4.3 Regression]:   |[4.1/4.2 Regression]: code
                   |code pessimization wrt. GCC |pessimization wrt. GCC 4.0
                   |4.0 probably due to         |probably due to
                   |TARGET_MEM_REF              |TARGET_MEM_REF


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26290


