[Bug target/12395] Suboptimal code with global variables
michaelni at gmx dot at
gcc-bugzilla@gcc.gnu.org
Sat Feb 11 11:40:00 GMT 2006
------- Comment #8 from michaelni at gmx dot at 2006-02-11 11:40 -------
I really think this should be fixed; otherwise GCC won't be able to keep up the
exponentially decaying performance it has so accurately followed since 2.95 at
least. To show more clearly how much speed we could lose by fixing this, I was
nice and benchmarked the code: a simple for loop running 100 times with the code
inside, rdtsc-based timing around that loop, and an outer loop executed 1000
times surrounding everything.
Benchmarking was done on an 800 MHz Duron and a 500 MHz Pentium III; the first
number is the number of CPU cycles on the Duron, the second the number on the P3.
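The measurement setup described above can be sketched in C. This is a minimal reconstruction under stated assumptions, not the author's actual harness. It assumes GCC or Clang on x86, where __rdtsc() from <x86intrin.h> reads the time-stamp counter, and falls back to the coarse clock() elsewhere. The names now, kernel and bench are hypothetical. Keeping the minimum of the 1000 outer runs is a guess, because the comment doesn't say how those runs were combined.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter (cycles). */
static uint64_t now(void) { return __rdtsc(); }
#else
#include <time.h>
/* Coarse fallback for non-x86 targets: clock ticks, not cycles. */
static uint64_t now(void) { return (uint64_t)clock(); }
#endif

/* The global being incremented; volatile so the loop is not optimized away. */
volatile uint32_t a;

/* Hypothetical stand-in for the snippet under test: increment a,
   and increment it again unless it wrapped to 0. */
static void kernel(void)
{
    if (++a)
        ++a;
}

/* Assumed scheme: run fn 100 times per timed sample, take 1000 samples,
   and return the smallest sample (in now() units). */
static uint64_t bench(void (*fn)(void))
{
    uint64_t best = UINT64_MAX;
    for (int rep = 0; rep < 1000; rep++) {
        uint64_t start = now();
        for (int i = 0; i < 100; i++)
            fn();
        uint64_t elapsed = now() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

To compare snippets, call bench(kernel) and then swap in a function that contains one of the asm snippets below. Calling through a function pointer adds call overhead that the original inline-asm loop did not have, so absolute numbers from this sketch won't match the ones quoted here.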
First, let me show you the optimal code, by Steven Bosscher(?):
"addl $1,a\n"
" je .L1\n"
"addl $1,a\n"
".L1:\n"
//11.557 / 12.514
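Read back into C, the optimal listing above adds 1 to the 32-bit global a and then, unless the result wrapped to 0, adds 1 again. A C fragment with that effect is sketched below. It was reconstructed from the asm, so the testcase actually attached to bug 12395 may be written differently, and the function name step is hypothetical.

```c
#include <stdint.h>

uint32_t a;   /* 32-bit global, matching the addl $1,a in the asm */

/* Same effect as: addl $1,a ; je .L1 ; addl $1,a ; .L1: */
void step(void)
{
    if (++a)   /* first increment; je skips the second one if a wrapped to 0 */
        ++a;   /* second increment */
}
```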
Now, what GCC 3.4/3.2 generated:
"movl a, %%eax\n"
"incl %%eax\n"
"testl %%eax, %%eax\n"
"movl %%eax, a\n"
"je .L1\n"
"incl %%eax\n"
"movl %%eax, a\n"
".L1:\n"
//6.220 / 6.159
The code generated by mainline contained two ret instructions, so it didn't fit
in my benchmark loop.
The even better code by segher AT d12relay01 DOT megacenter.de.ibm.com:
"addl $1,a\n"
"sbbl $-1,a\n"
//11.755 / 15.111
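This two-instruction version works through the carry flag. addl $1,a sets CF exactly when a was 0xFFFFFFFF, which is the case where the add wraps to 0. sbbl $-1,a then computes a - (-1) - CF = a + 1 - CF, so the second increment is cancelled in precisely the wrap case. The sketch below is a portable C model of that flag arithmetic, written for illustration. The helper names add_sbb and reference are hypothetical, and the function only imitates what the CPU does; it is not what a compiler would emit.

```c
#include <stdint.h>

/* Model of: addl $1,a ; sbbl $-1,a  (cf plays the role of the carry flag) */
uint32_t add_sbb(uint32_t a)
{
    uint32_t cf = (a == UINT32_MAX);  /* addl $1 carries out only when a wraps */
    a += 1;
    a = a - UINT32_MAX - cf;          /* sbbl $-1: a - 0xFFFFFFFF - CF == a + 1 - CF (mod 2^32) */
    return a;
}

/* Reference semantics: increment, and increment again unless the result is 0. */
uint32_t reference(uint32_t a)
{
    if (++a)
        ++a;
    return a;
}
```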
One case you must be careful not to generate, as it's almost twice as fast as
the one above while still being just two instructions:
"cmpl $-1,a\n"
"adcl $1,a\n"
//7.827 / 7.422
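This pair gets the same result with the opposite flag trick. cmpl $-1,a computes a - 0xFFFFFFFF without storing it, and it sets CF (borrow) for every a except 0xFFFFFFFF. adcl $1,a then adds 1 + CF. So a normally grows by 2, and it wraps to 0 when a was -1. That matches "increment, and increment again unless it wrapped to 0". The sketch below is a portable C model of the flags, with cmp_adc as a hypothetical helper name.

```c
#include <stdint.h>

/* Model of: cmpl $-1,a ; adcl $1,a  (cf plays the role of the carry flag) */
uint32_t cmp_adc(uint32_t a)
{
    uint32_t cf = (a < UINT32_MAX);   /* cmpl $-1 borrows unless a == 0xFFFFFFFF */
    return a + 1 + cf;                /* adcl $1: a + 1 + CF, wrapping mod 2^32 */
}
```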
Two other, slightly faster variants:
"movl a, %%eax\n"
"cmpl $-1,%%eax\n"
"adcl $1,%%eax\n"
"movl %%eax,a\n"
//6.567 / 8.811
"movl a, %%eax\n"
"addl $1,%%eax\n"
"sbbl $-1,%%eax\n"
"movl %%eax,a\n"
//6.564 / 8.813
What a 14-year-old script kiddie would write, and what GCC would generate if it
were a local variable:
"movl a, %%eax\n"
"incl %%eax\n"
"je .L1\n"
"incl %%eax\n"
".L1:\n"
"movl %%eax, a\n"
//6.162 / 5.426
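At the source level you can get the "local variable" shape by hand. Copy the global into a local, do both increments on the copy, and store it back once. The sketch below assumes single-threaded code and uses a hypothetical function name, step_local. The comments show which line of the listing above each statement corresponds to. A given GCC version is not guaranteed to emit exactly that sequence.

```c
#include <stdint.h>

uint32_t a;   /* the global from the testcase */

/* Work on a local copy so the compiler can keep it in a register,
   then write the global back exactly once. */
void step_local(void)
{
    uint32_t t = a;   /* movl a, %eax */
    if (++t)          /* incl %eax ; je .L1  (incl already sets ZF, no testl needed) */
        ++t;          /* incl %eax */
    a = t;            /* .L1: movl %eax, a */
}
```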
What I would write (as the variable isn't used in my testcase):
"\n"
//2.155 / 2.410
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395