This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/51017] [4.9/5/6 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 4.5 vs. GCC 5 still shows GCC 4.5 is faster almost everywhere:

Figures are c/s, shown as real / virtual.

Benchmark                               Test            GCC 4.5             GCC 5
Traditional DES [128/128 BS SSE2-16]    Many salts      3636K / 3636K       3488K / 3488K
                                        Only one salt   3047K / 3047K       2896K / 2896K
BSDI DES (x725) [128/128 BS SSE2-16]    Many salts      127360 / 127360     108800 / 108800
                                        Only one salt   124288 / 123057     106112 / 106112
FreeBSD MD5 [32/64 X2]                  Raw             15392 / 15392       15936 / 15936
OpenBSD Blowfish (x32) [32/64 X2]       Raw             900 / 900           892 / 892
Kerberos AFS DES [48/64 4K]             Short           478208 / 473473     476672 / 476672
                                        Long            1470K / 1470K       1473K / 1473K
LM DES [128/128 BS SSE2-16]             Raw             16977K / 16977K     14971K / 14971K
generic crypt(3) [?/64]                 Many salts      362784 / 362784     296352 / 296352
                                        Only one salt   361728 / 361728     292182 / 295104
dummy [N/A]                             Raw             60157K / 60157K     53849K / 53316K


GCC 5 vs. GCC 6 shows some progress (and some small regressions), but not for
BSDI DES.

Figures are c/s, shown as real / virtual.

Benchmark                               Test            GCC 5               GCC 6
Traditional DES [128/128 BS SSE2-16]    Many salts      3488K / 3488K       3446K / 3446K
                                        Only one salt   2896K / 2896K       2895K / 2895K
BSDI DES (x725) [128/128 BS SSE2-16]    Many salts      108800 / 108800     104934 / 105984
                                        Only one salt   106112 / 106112     103040 / 103040
FreeBSD MD5 [32/64 X2]                  Raw             15936 / 15936       15864 / 15864
OpenBSD Blowfish (x32) [32/64 X2]       Raw             892 / 892           916 / 916
Kerberos AFS DES [48/64 4K]             Short           476672 / 476672     471808 / 471808
                                        Long            1473K / 1473K       1449K / 1449K
LM DES [128/128 BS SSE2-16]             Raw             14971K / 14971K     15917K / 15917K
generic crypt(3) [?/64]                 Many salts      296352 / 296352     348096 / 348096
                                        Only one salt   292182 / 295104     347616 / 347616
dummy [N/A]                             Raw             53849K / 53316K     60114K / 60114K


Note that -fno-tree-pre alone no longer helps.  With GCC 5/6 most intrinsics
use a generic implementation and are thus transparent to the GIMPLE
middle-end, apart from __builtin_ia32_pandn128, which is used by
_mm_andnot_si128.  What helps is -fno-tree-loop-im in addition to
-fno-tree-pre, so the underlying issue still seems to be register pressure,
and the cause is not really the loop-carried values we introduce but the
excessive invariant motion.

Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     118528 c/s real, 117354 c/s virtual
Only one salt:  114944 c/s real, 114944 c/s virtual

        movq    DES_bs_all+18632(%rip), %rdi
        movq    DES_bs_all+18624(%rip), %rcx
        movq    DES_bs_all+18712(%rip), %rbp
        movq    DES_bs_all+18696(%rip), %r9
        movq    DES_bs_all+18688(%rip), %r10
        movq    DES_bs_all+18680(%rip), %r11
        movq    %rdi, 624(%rsp)
        movq    %rcx, 616(%rsp)
        movq    %rbp, 320(%rsp)

etc.  Of course it is quite stupid to load something and then spill it
immediately...
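
The load-then-spill pattern above is what one would expect when invariant
motion hoists more loop-invariant loads than there are registers to hold
them.  A minimal C sketch of that situation follows.  It is not taken from
the benchmark source: struct state, g_state, NKEYS, mix_in_loop and
mix_hoisted are hypothetical names, and whether a given GCC version actually
transforms the first form into something like the second depends on the
version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: more invariant values than the 16
   general-purpose registers x86-64 provides. */
#define NKEYS 24

/* Hypothetical stand-in for a large global state block. */
struct state {
    uint64_t key[NKEYS];
};

struct state g_state;

/* Form A: each key[k] is loaded at its point of use inside the loop. */
uint64_t mix_in_loop(const uint64_t *data, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < NKEYS; k++)
            acc ^= (data[i] + g_state.key[k]) * (uint64_t)(k + 1);
    return acc;
}

/* Form B: a hand-written model of aggressive hoisting.  All invariant
   loads are done up front, so NKEYS values are live across the whole
   loop.  The copy into k_local mirrors a load from the global followed
   by a store to a stack slot. */
uint64_t mix_hoisted(const uint64_t *data, size_t n)
{
    uint64_t k_local[NKEYS];
    for (size_t k = 0; k < NKEYS; k++)
        k_local[k] = g_state.key[k];

    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < NKEYS; k++)
            acc ^= (data[i] + k_local[k]) * (uint64_t)(k + 1);
    return acc;
}
```

Both functions return the same value for the same input; they differ only in
where the loads happen.  Comparing the assembly of Form A built with and
without -fno-tree-loop-im -fno-tree-pre is one way to check whether hoisting
and spilling occur with a particular compiler.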

We're also back to all unaligned loads/stores.
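
To put the BSDI DES "Many salts" real c/s figures from this comment on one
scale, the relative change against the GCC 4.5 result can be computed as
below.  This is only a sketch: rel_change_percent is a hypothetical helper,
and the compiler version behind the 118528 c/s run with -fno-tree-pre
-fno-tree-loop-im is not stated above.

```c
/* Relative change of `value` against `base`, in percent.
   Negative means slower than the baseline. */
double rel_change_percent(double base, double value)
{
    return (value - base) / base * 100.0;
}

/* BSDI DES, "Many salts", real c/s, as quoted in this comment. */
static const double bsdi_gcc45 = 127360.0;  /* GCC 4.5 */
static const double bsdi_gcc5  = 108800.0;  /* GCC 5 */
static const double bsdi_gcc6  = 104934.0;  /* GCC 6 */
static const double bsdi_flags = 118528.0;  /* -fno-tree-pre -fno-tree-loop-im */
```

With these inputs, rel_change_percent(bsdi_gcc45, bsdi_gcc5) is about -14.6,
rel_change_percent(bsdi_gcc45, bsdi_gcc6) is about -17.6, and
rel_change_percent(bsdi_gcc45, bsdi_flags) is about -6.9, so the two flags
recover part of the gap to GCC 4.5 but not all of it.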
