38922 – [4.3 Regression] Optimization regression in simple conditional code (js instead of cmov) 4.3 vs 4.1 and 3.4

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.3.2

Importance:	P2 normal
Target Milestone:	4.4.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2009-01-20 08:44 UTC by vincenzo Innocente
Modified:	2009-04-22 15:21 UTC
CC List:	1 user

See Also:
Host:	x86_64-redhat-linux
Target:	x86_64-redhat-linux
Build:	x86_64-redhat-linux
Known to work:	3.4.6 4.4.0
Known to fail:	4.3.2 4.3.3
Last reconfirmed:

Attachments
test case (1.77 KB, text/plain) 2009-01-20 08:48 UTC, vincenzo Innocente

Description vincenzo Innocente 2009-01-20 08:44:39 UTC

I discovered that a simple benchmark ("SCIMARK2 Montecarlo") runs three times slower when compiled with gcc 4.3 than with 4.1 or 3.4.
The code is compiled and run on Intel Core 2 machines running RHEL4, RHEL5 or Fedora 10.
Details for Fedora 10 are below.
The compilers used are from the Fedora distribution.
-bash-3.2$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)

-bash-3.2$ gcc34 -v
Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,f77 --disable-libgcj --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)


I've extracted the code into a self-contained source file, downloadable with
wget http://innocent.home.cern.ch/innocent/fullMC.c
The results are:
-bash-3.2$ g++ -O3 fullMC.c ; time ./a.out 

real	0m1.731s
user	0m1.730s
sys	0m0.001s
-bash-3.2$ g++34 -O3 fullMC.c ; time ./a.out 

real	0m0.547s
user	0m0.546s
sys	0m0.001s


In my opinion the culprit is the use of a jump instead of a cmov instruction here.

This is the disassembly emitted by 4.3:

  int I = R->i;
 400510:       8b 4f 48                mov    0x48(%rdi),%ecx
   int J = R->j;
 400513:       8b 77 4c                mov    0x4c(%rdi),%esi
   int *m = R->m;

   k = m[I] - m[J];
 400516:       48 63 c1                movslq %ecx,%rax
 400519:       48 63 d6                movslq %esi,%rdx
 40051c:       8b 04 87                mov    (%rdi,%rax,4),%eax
   if (k < 0) k += m1;
 40051f:       41 89 c0                mov    %eax,%r8d
 400522:       44 2b 04 97             sub    (%rdi,%rdx,4),%r8d
 400526:       78 58                   js     400580 <Random_nextDouble+0x70>
   R->m[J] = k;


And this is the disassembly for 3.4:

   int I = R->i;
 400660:       8b 47 48                mov    0x48(%rdi),%eax
   int J = R->j;
 400663:       8b 57 4c                mov    0x4c(%rdi),%edx
   int *m = R->m;

   k = m[I] - m[J];
 400666:       48 63 c8                movslq %eax,%rcx
 400669:       48 63 f2                movslq %edx,%rsi
 40066c:       44 8b 04 8f             mov    (%rdi,%rcx,4),%r8d
 400670:       44 2b 04 b7             sub    (%rdi,%rsi,4),%r8d
   if (k < 0) k += m1;
 400674:       41 8d 88 ff ff ff 7f    lea    0x7fffffff(%r8),%ecx
 40067b:       41 83 f8 ff             cmp    $0xffffffffffffffff,%r8d
 40067f:       44 0f 4e c1             cmovle %ecx,%r8d
   R->m[J] = k;
-------------------------------------

gcc 4.1 (specs below, from RHEL5) produces the same instructions as 3.4.

 gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)
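
For anyone who wants to inspect the code generation without the full benchmark, the conditional quoted in the two listings can be isolated roughly as below. This is only a sketch, not the code from fullMC.c: the function names wrap_branch and wrap_mask are hypothetical, M1 is taken from the 0x7fffffff constant visible in the lea instruction of the 3.4 listing, and both inputs are assumed to lie in [0, M1) so that the subtraction cannot overflow. The second function is a hand-written branch-free variant for comparison, not something either compiler is claimed to emit.

```c
/* Assumption: M1 mirrors the 0x7fffffff constant seen in the 3.4 listing. */
#define M1 0x7fffffff

/* Hypothetical reduction of "k = m[I] - m[J]; if (k < 0) k += m1;".
   Depending on compiler version and flags this may become js or cmov.
   Assumes 0 <= a, b < M1 so that a - b does not overflow. */
int wrap_branch(int a, int b)
{
    int k = a - b;
    if (k < 0)
        k += M1;
    return k;
}

/* Hand-written branch-free equivalent under the same input assumption:
   mask is all ones when k is negative and zero otherwise. */
int wrap_mask(int a, int b)
{
    int k = a - b;
    int mask = -(k < 0);
    return k + (M1 & mask);
}
```

Compiling a file containing only these functions with -O3 -S and reading the assembly for wrap_branch should show whether a given compiler picks the jump or the conditional move for this pattern.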

Comment 1 vincenzo Innocente 2009-01-20 08:48:22 UTC

Created attachment 17152 [details]
test case

Comment 2 Richard Biener 2009-01-20 09:04:34 UTC

4.4.0 is faster for me than 4.2 and 4.3 (4.3 is indeed slower than 4.2, but
my 3.4 (32bit only) is way slower than 4.4 (also 32bit)).

Note that the performance of cmov heavily depends on the microarchitecture of your
CPU (I measured on an AMD K8).

Comment 3 vincenzo Innocente 2009-01-20 09:24:28 UTC

I confirm that gcc 4.2.3 is as fast as 4.1, and at least twice as fast as gcc 4.3.2.
Tests were done on an Intel Core 2 running RHEL4 and a Core i7 running RHEL5.
Setting -mtune to either generic or native makes no difference.

Comment 4 Richard Biener 2009-01-24 10:21:14 UTC

GCC 4.3.3 is being released, adjusting target milestone.

Comment 5 Richard Biener 2009-04-22 15:21:30 UTC

WONTFIX on the 4.3 branch.