This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/47000] Major performance regression in parallel SSE2 impl of SHA256 hash algorithm


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.12.18 12:39:26
     Ever Confirmed|0                           |1

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2010-12-18 12:39:26 UTC ---
Compiled like so:
$ gcc-4.4.2 -S -O2 sha256_4way.i -o sha256_4way-44.s
$ gcc-4.5.0 -S -O2 sha256_4way.i -o sha256_4way-45.s

$ grep -c call *.s
sha256_4way-44.s:0
sha256_4way-45.s:484
$ grep call *.s|head
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
$ 

ROTR should have been inlined:

static inline __m128i ROTR(__m128i x, const int n) {
    return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
}

This probably explains the slowdown: the 4.5.0 output contains 484 call
instructions where the 4.4.2 output contains none.
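
One possible workaround, not proposed in this report, is to force the
inlining with GCC's always_inline attribute. The sketch below is only an
illustration: the names ROTR_forced, rotr32 and rotr_matches_scalar are
made up here, and whether this restores the 4.4 performance on
sha256_4way.i is an assumption that has not been measured. It expects an
x86 target with SSE2 enabled (the default on x86-64, -msse2 on 32-bit x86).

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical variant of ROTR from the report: same body, but with
 * GCC's always_inline attribute so the inliner is not given a choice.
 * Valid for 0 < n < 32. */
static inline __attribute__((always_inline))
__m128i ROTR_forced(__m128i x, const int n)
{
    return _mm_or_si128(_mm_srli_epi32(x, n), _mm_slli_epi32(x, 32 - n));
}

/* Scalar reference rotate, also valid only for 0 < n < 32. */
static uint32_t rotr32(uint32_t v, int n)
{
    return (v >> n) | (v << (32 - n));
}

/* Broadcast v into all four 32-bit lanes, rotate them, and compare
 * every lane against the scalar reference. Returns 1 on a match. */
static int rotr_matches_scalar(uint32_t v, int n)
{
    uint32_t out[4];
    __m128i r = ROTR_forced(_mm_set1_epi32((int)v), n);

    _mm_storeu_si128((__m128i *)out, r);
    for (int i = 0; i < 4; i++) {
        if (out[i] != rotr32(v, n))
            return 0;
    }
    return 1;
}
```

If the attribute takes effect, running "grep -c call" on the assembly, as
in the transcript above, should no longer count calls to the rotate
helper.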

