This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

             Bug #: 51017
           Summary: GCC 4.6 performance regression (vs. 4.4/4.5)
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: solar-gcc@openwall.com


For John the Ripper 1.7.8's bitslice DES implementation, GCC 4.6 produces code
that is approximately 25% slower than what GCC 4.4 and 4.5 produced, at least on
x86_64.  To reproduce, download
http://download.openwall.net/pub/projects/john/1.7.8/john-1.7.8.tar.bz2 and
build it with "make linux-x86-64" (uses SSE2 intrinsics), "make
linux-x86-64-avx" (uses AVX instead), or "make generic" (uses no
intrinsics).  Then run "../run/john -te=1".  With GCC 4.4 and 4.5, the
"Traditional DES" benchmark reports a speed of around 2500K c/s for the
"linux-x86-64" (SSE2) build on a 2.33 GHz Core 2 (using one core).
With 4.6, this drops to about 1850K c/s.  A similar slowdown was observed for
AVX on a Core i7-2600K when going from GCC 4.5.x to 4.6.x.  The slowdown is also
reproducible with the code that uses no intrinsics, although that case is of
less practical importance (the intrinsics are much faster).  A Mac OS X user
reported a similar slowdown with GCC 4.6.  Phoronix also spotted it in their
recently published C compiler benchmarks, but misinterpreted it as a GCC vs.
clang difference.

Adding "-Os" to OPT_INLINE in the Makefile partially restores the performance
(to about 2000K c/s, which is still 20% slower than with GCC 4.4/4.5).  Applying
the OpenMP patch from
http://download.openwall.net/pub/projects/john/1.7.8/john-1.7.8-omp-des-4.diff.gz
and then running with OMP_NUM_THREADS=1 (for a fair comparison) restores the
performance almost fully.  Keeping the patch applied but removing -fopenmp
still preserves the good performance.  So it is some change the patch makes to
the source code, rather than OpenMP itself, that mitigates the GCC regression.
Similar behavior is seen with the current CVS version of John the Ripper, even
though its OpenMP support for DES has been heavily revised and integrated into
the tree.
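The percentages above compare throughput (c/s), and the same regression looks larger when expressed as time per crypt, since time is the inverse of throughput. The helper functions below are a small illustrative sketch (the names are hypothetical and not part of John the Ripper or GCC); they take only the c/s figures quoted in this report as input.

```c
#include <stdio.h>

/* Percent change in throughput (c/s) going from `before` to `after`.
   Negative means the newer build computes fewer crypts per second. */
double pct_throughput_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Percent change in time per crypt for the same pair of measurements.
   Time per crypt is 1 / throughput, so the ratio is before / after. */
double pct_time_change(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}

/* Print both views of one comparison under a human-readable label. */
void report(const char *label, double before, double after)
{
    printf("%s: %+.1f%% c/s, %+.1f%% time per crypt\n", label,
           pct_throughput_change(before, after),
           pct_time_change(before, after));
}
```

With the SSE2 figures, report("GCC 4.6 vs 4.4/4.5", 2500.0, 1850.0) gives -26.0% c/s, or +35.1% time per crypt. With -Os added, report("GCC 4.6 -Os vs 4.4/4.5", 2500.0, 2000.0) gives -20.0% c/s, or +25.0% time per crypt.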

