This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Gcc 3.1 performance regressions with respect to 2.95.3


> 
> 
> > It is more or less a special case. Overall, 3.1 groks code with lots
> > of abstraction better, but in this case 2.95 got particularly lucky.
> > As far as I can remember, it runs into a similar slowdown only on a
> > slightly modified Stepanov.
> >
> 
> That is only partly true. When I benchmark gcc 3.1 at the -O versus
> -O2 optimization level with the help of the bench++ test suite, the
> performance is better in 45 cases at -O. Interestingly, the checks
> measuring loop overhead (L*) all run faster at the -O level. L000004 runs
> two times faster. And even more interesting is that o000007[a,b].cpp which
> checks the strength reduction capabilities of the compiler, o000007a.cpp
7a.cpp is referred to twice. Isn't that a typo?
Looking at the inner loop of 7a/b.cpp, it is not really a test of strength
reduction, as the loop optimizer is confused and does not strength-reduce at all.

The inner loops look identical at -O2 and -O except for register allocation
and scheduling.  Can someone shed light on why one runs slower than the
other?
-O1:
.L20:
        leal    (%ecx,%edi), %edx
        movl    32(%ebx,%ecx,4), %eax
        movl    %eax, 32(%ebx,%edx,4)
        leal    (%ecx,%esi), %edx
        movl    32(%ebx,%ecx,4), %eax
        movl    %eax, 432(%ebx,%edx,4)
        movl    %eax, 832(%ebx,%edx,4)
        incl    %ecx
        cmpl    $9, %ecx
        jle     .L20

-O2:
.L20:
        movl    32(%ebx,%ecx,4), %eax
        leal    (%ecx,%esi), %edx
        movl    %eax, 32(%ebx,%edx,4)
        leal    (%ecx,%edi), %edx
        movl    32(%ebx,%ecx,4), %eax
        incl    %ecx
        cmpl    $9, %ecx
        movl    %eax, 432(%ebx,%edx,4)
        movl    %eax, 832(%ebx,%edx,4)
        jle     .L20

> which checks dead code elimination, o000010b.cpp which checks redundant

I've investigated o000010a.cpp and the regression does not reproduce on my
system at all.  Looking at the diff of the assembly files, -O2 does a better job.
There are two important differences: function alignment and basic block (BB)
reordering.

Can you try to run with -falign-functions=0 -falign-loops=0 -falign-jumps=0 -O2?
It looks like -O0 does not align.

Basic block reordering can be disabled with -fno-reorder-blocks.
GCC mispredicts one branch in the program, which may cause the performance
degradation in a tight dummy loop like the one this program has.

> code and o000011a.cpp checking the unreachable code optimization facility

This test is wrong.  It asks the compiler to do an invalid transformation:
        a = c;
        if(a != c) {
                c = c + a;
                d = d + a;
                x = x + a;
        }
It expects the compiler to remove the a != c test, but nothing is known about
the value being compared, and if it is NaN, a != c is true and the body is
executed.  With -ffast-math the conditional is removed properly.
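
To make the NaN point concrete, here is a minimal sketch.  It is not taken
from bench++; the names body_runs and body_runs_check are made up for
illustration, and it assumes IEEE 754 floats and a build without -ffast-math:

```c
#include <assert.h>
#include <math.h>   /* NAN (C99) */

/* Hypothetical stand-in for the tested fragment: returns 1 if the
   supposedly unreachable body would run, 0 otherwise. */
int body_runs(float c)
{
    float a = c;
    if (a != c) {   /* under IEEE 754 this is true only when c is NaN */
        return 1;
    }
    return 0;
}

/* Small self-check showing the two interesting inputs. */
void body_runs_check(void)
{
    assert(body_runs(1.0f) == 0);   /* ordinary value: body is skipped */
    assert(body_runs(NAN) == 1);    /* NaN compares unequal to itself  */
}
```

Since the second case is observable, the compiler may only drop the
comparison when it is told to ignore NaNs, which is what -ffast-math does.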

Concerning the -O2/-O1 degradation, one cause may be:

-       fsts    12(%ebx)
        fucom   %st(0)
        fnstsw  %ax
+       fsts    12(%ebx)
        sahf
        jp      .L5
        je      .L6

Chips tend to have fast paths for fstsw/sahf/jump-like sequences, and scheduling
the store into the middle of one may disable that optimization.

Otherwise I would again guess it is a -freorder-blocks problem.  GCC makes a
reasonable but uneducated guess that a != c will likely be true, and lays out
the comparisons in a way that creates an extra jump on the common path, which
GCC believes to be the uncommon one:

-       jmp     .L2
-.L6:
-       fstp    %st(0)
 .L2:
-       flds    28(%ebx)
-       fdivs   12(%ebx)
+       flds    12(%ebx)
+       fdivrs  28(%ebx)
        fstps   28(%ebx)
        movl    -4(%ebp), %ebx
        movl    %ebp, %esp
        popl    %ebp
        ret
+       .p2align 4,,7
+.L6:
+       fstp    %st(0)
+       jmp     .L2


I believe this is just a test case that does not occur in real code.  Profile
feedback (FDO) would help GCC here as well.
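
Short of profile feedback, the expected direction of such a branch can also
be stated by hand.  The following is only a sketch of that idea using GCC's
__builtin_expect; the function guarded_update, its arguments and its return
expression are assumptions made up for illustration and are not the benchmark
source:

```c
#include <assert.h>

/* Hypothetical reduction of the tested fragment. */
float guarded_update(float c, float d, float x)
{
    float a = c;
    /* Hint to GCC: the condition is expected to be false (0), so the
       compiler may keep the body off the fall-through path.  The hint
       changes code layout only, never the result. */
    if (__builtin_expect(a != c, 0)) {
        c = c + a;
        d = d + a;
        x = x + a;
    }
    return x / c + d;
}

/* Small self-check: 4/2 + 1 is exactly representable. */
void guarded_update_check(void)
{
    assert(guarded_update(2.0f, 1.0f, 4.0f) == 3.0f);
}
```

Whether this actually removes the extra jump depends on the compiler version
and flags, so the generated assembly would still need to be checked.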

> run slower at the -O2 optimization level although this level provides
> specific optimizations for these problems. Furthermore, function calls are
> slower at the -O2 level:
>   p000005.cpp  Static Class Method Call: 1-int Arg: Catches Exceptions
>   p000006.cpp  Static Class Method Call: 1-int *Arg: Catches Exceptions
>   p000008.cpp  Procedure Call: No Parameters: Called thru pointer, Catches Exceptions
>   p000012.cpp  Procedure Call: 10-(3-int) Args: Catches Exceptions
>   p000023.cpp  Same as p000022: called in loop to see if lookup is optimized
> 
> And in addition to the decrease in the performance of the stepanov tests
> there is a substantial decrease in the performance for processing
> complex numbers (S000004a).

This looks like an aliasing bug.  GCC generates a lot of reloads from the
stack:
(insn 274 22 27 (set (reg/v:DF 60)
        (mem/u/f:DF (symbol_ref/u:SI ("*.LC3")) [0 S8 A64])) 95 {*movdf_integer} (nil)
    (expr_list:REG_EQUIV (const_double:DF 0 [0x0] 0 [0x0] 1073643520 [0x3ffe8000] 0 [0x0] 0 [0x0])
        (nil)))

(insn 27 274 273 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
                (const_int -16 [0xfffffff0])) [0 <variable>.re+0 S8 A32])
        (reg/v:DF 60)) 95 {*movdf_integer} (insn_list 19 (nil))
    (expr_list:REG_DEAD (reg/v:DF 60)
        (nil)))

... later ...

(insn 249 41 252 (set (reg/v:SI 102)
        (mem/s:SI (plus:SI (reg/f:SI 20 frame)
                (const_int -16 [0xfffffff0])) [0 factor+0 S4 A128])) 45 {*movsi_1} (nil)
    (nil))


All memory references in between are explicit and really should not invalidate
the reference.  I don't have the slightest idea why this happens, but we should
investigate it.

Honza
> 
> It looks like there are some flaws in the -O2 optimizer passes. Is
> there a chance that this is fixed for the upcoming gcc 3.1 release?
> 
> Hope this helps,
> 
> Peter Schmid
> 
> 
> 
>           RELATIVE TIMES ..........
> TEST NAME Pentium II, 350 MHz Pentium II, 350 MHz
>           gcc 3.1             gcc 3.1
>           -O                  -O2
> --------- ------------------- -------------------
> A000091                  1.00                1.00
> A000092                  1.00                0.99
> A000094a                 1.00                1.08
> A000094b                 1.00                0.87
> A000094c                 1.00                1.00
> 
> A000094d                 1.00                0.67
> A000094e                 1.00                0.67
> A000094f                 1.00                0.95
> A000094g                 1.00                0.98
> A000094h                 1.00                0.83
> 
> A000094i                 1.00                0.68
> A000094j                 1.00                0.96
> A000094k                 1.00                0.44
> B000002b                 1.00                1.02  *
> B000003b                 1.00                1.02  *
> 
> B000004b                 1.00                1.01  *
> B000010                  1.00                0.98
> B000011                  1.00                0.99
> B000013                  1.00                0.81
> D000001                  1.00                0.99
> 
> D000002                  1.00                1.01  *
> D000003                  1.00                0.99
> D000004                  1.00                1.01  *
> D000005                  1.00                1.16  *
> D000006                  1.00                1.00
> 
> E000001                  1.00                0.98
> E000002                  1.00                0.98
> E000003                  1.00                0.68
> E000004                  1.00                0.59
> E000007                  1.00                1.00
> 
> E000008                  1.00                1.07
> F000001                  1.00                1.44  *
> F000002                  1.00                1.44  *
> F000003                  1.00                0.37
> F000004                  1.00                0.55
> 
> F000005                  1.00                0.59
> F000006                  1.00                0.70
> F000007                  1.00                0.88
> F000008                  1.00                1.14  *
> G000001                  1.00                0.95
> 
> G000002                  1.00                0.99
> G000003                  1.00                1.05  *
> G000004                  1.00                1.01  *
> G000005                  1.00                0.92
> G000006                  1.00                1.05  *
> 
> G000007                  1.00                1.03  *
> H000001                  1.00                1.01  *
> H000002                  1.00                1.00
> H000003                  1.00                0.90
> H000004                  1.00                0.88
> 
> H000005                  1.00                0.00
> H000006                  1.00                0.88
> H000007                  1.00                0.99
> H000008                  1.00                0.76
> H000009                  1.00                0.99
> 
> L000001                  1.00                1.21  *
> L000002                  1.00                1.16  *
> L000003                  1.00                1.38  *
> L000004                  1.00                2.00  *
> O000001a                 1.00                0.82
> 
> O000001b                 1.00                0.83
> O000002a                 1.00                0.94
> O000002b                 1.00                1.00
> O000003a                 1.00                1.09  *
> O000003b                 1.00                0.96
> 
> O000004a                 1.00                1.05  *
> O000004b                 1.00                1.05  *
> O000005a                 1.00                1.01  *
> O000005b                 1.00                0.85
> O000006a                 1.00                0.94
> 
> O000006b                 1.00                0.86
> O000007a                 1.00                1.15  *
> O000007b                 1.00                1.26  *
> O000008a                 1.00                1.08  *
> O000008b                 1.00                0.95
> 
> O000009a                 1.00                0.84
> O000009b                 1.00                0.85
> O000010a                 1.00                0.99
> O000010b                 1.00                1.07  *
> O000011a                 1.00                0.93
> 
> O000011b                 1.00                1.05  *
> O000012a                 1.00                1.00
> O000012b                 1.00                0.99
> P000001                  1.00                0.61
> P000002                  1.00                0.72
> 
> P000003                  1.00                0.65
> P000004                  1.00                0.00
> P000005                  1.00                1.43  *
> P000006                  1.00                1.14  *
> P000007                  1.00                1.10
> 
> P000008                  1.00                1.21  *
> P000010                  1.00                0.76
> P000011                  1.00                0.98
> P000012                  1.00                1.11  *
> P000013                  1.00                1.00
> 
> P000020                  1.00                0.93
> P000021                  1.00                0.59
> P000022                  1.00                0.59
> P000023                  1.00                1.12  *
> S000001a                 1.00                0.93
> 
> S000001b                 1.00                1.17  *
> S000002a                 1.00                0.53
> S000002b                 1.00                0.59
> S000003a                 1.00                0.63
> S000003b                 1.00                1.02  *
> 
> S000004a                 1.00                1.47  *
> S000004b                 1.00                0.99
> S000005a                 1.00                1.03  *
> S000005b                 1.00                0.97
> S000005c                 1.00                1.03  *
> 
> S000005d                 1.00                0.97
> S000005e                 1.00                1.18  *
> S000005f                 1.00                0.98
> S000005g                 1.00                1.12  *
> S000005h                 1.00                0.98
> 
> S000005i                 1.00                1.03  *
> S000005j                 1.00                0.97
> S000005k                 1.00                1.01  *
> S000005l                 1.00                0.98
> S000005m                 1.00                1.00
> 

