Bug 8776

Summary: loop invariants are not removed (most likely)
Product: gcc Reporter: lomov1
Component: rtl-optimization Assignee: Not yet assigned to anyone <unassigned>
Severity: normal CC: gcc-bugs
Priority: P3 Keywords: missed-optimization
Version: 3.2   
Target Milestone: 3.3.3   
Host: i686-pc-linux-gnu Target: i686-linux-pc-gnu
Build: i686-pc-linux-gnu Known to work:
Known to fail: Last reconfirmed: 2003-11-15 20:47:31
Attachments: test.cpp

Description lomov1 2002-12-01 15:26:00 UTC
Consider the following program:
struct str{
   int f[3];

int main (int argc, char *argv[])
   double d[1];
   double* pd=d;
   str s;
   int* pf=&s.f[0];
   for (int i=0;i<1000000000;++i) {
#ifdef SLOW1
#elif defined(SLOW2)
After compiling with gcc-3.2 -O9 on Mandrake Linux 9.0 (Athlon 900MHz):

time ./a.out of the SLOW[12] version gives >10s
time ./a.out of the "!defined(SLOW[12])" version gives 3.50s

Logically this is the same thing...

gcc version 3.2 (Mandrake Linux 9.0 3.2-1mdk); also with gcc-3.1

Linux 2.4.19-16mdk i686 GNU/Linux
Mandrake 9.0
AMD Athlon(tm) Processor stepping 2 cpu MHz 908.111

gcc -O9 test.cpp ; time ./a.out
gcc -O9 -DSLOW1 test.cpp ; time ./a.out
gcc -O9 -DSLOW2 test.cpp ; time ./a.out
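
The kind of transformation the summary refers to can be sketched as follows. This is only an illustration of hoisting a loop-invariant load, not the reporter's test case (whose loop body is not shown above); `Rec`, `fill_reload` and `fill_hoisted` are hypothetical names, and the sketch assumes the index lives in a struct field that the loop never modifies.

```cpp
#include <cstddef>

// Hypothetical stand-in for the struct in the report.
struct Rec {
    int f[3];
};

// The index r.f[0] is read inside the loop on every iteration, as written.
// An optimizer may hoist the load if it can prove that the stores through
// `out` never change r.f[0].
double fill_reload(double* out, const Rec& r, std::size_t iterations) {
    for (std::size_t i = 0; i < iterations; ++i) {
        out[r.f[0]] = 1.0;
    }
    return out[r.f[0]];
}

// Same loop with the invariant load and the address computation hoisted
// by hand: the loop body is a single store to a fixed location.
double fill_hoisted(double* out, const Rec& r, std::size_t iterations) {
    double* const slot = out + r.f[0];
    for (std::size_t i = 0; i < iterations; ++i) {
        *slot = 1.0;
    }
    return *slot;
}
```

Both functions leave the same contents in `out`; if the compiler removes the invariant itself, they should also run at the same speed, which is what the report expects of the three variants.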
Comment 1 lomov1 2002-12-01 15:26:00 UTC
gcc-2.96 doesn't have this problem. Sorry, but I don't have access to other official releases (besides 3.1 and 3.2).
Comment 2 Dara Hazeghi 2003-06-04 19:50:50 UTC

Your problem as stated is fixed on the gcc 3.3 branch and mainline. Please note that -O9 is the same 
thing as -O3 (-On for n>3 is identical to -O3). Second, your code performs identically for me with 
-O3, -O3 -DSLOW1 and -O3 -DSLOW2 with gcc 3.3. With gcc mainline, things are somewhat more 
bizarre. With -O3, or -O3 -DSLOW1, the code is about 50% slower than with gcc 3.3, but with -O3 
-DSLOW2 it's the same speed as gcc 3.3 (i.e. faster than with the other two options). Very 

Comment 3 Andrew Pinski 2003-11-15 20:47:30 UTC
There is a performance regression here from 3.3. Other than that, the -DSLOW[12] issue is 
fixed: all three variants perform the same.
Comment 4 Andrew Pinski 2003-12-28 03:56:00 UTC
Here is the difference in the code:
--- temp.s      Sat Dec 27 22:51:13 2003
+++ temp1.s     Sat Dec 27 22:51:04 2003
@@ -5,21 +5,20 @@
        .type   main, @function
        pushl   %ebp
-       xorl    %edx, %edx
+       movl    $999999999, %eax
        movl    %esp, %ebp
        subl    $40, %esp
-       leal    -32(%ebp), %ecx
-       movl    $0, -16(%ebp)
+       leal    -32(%ebp), %edx
        andl    $-16, %esp
-       movl    $0, -20(%ebp)
+       movl    $0, -16(%ebp)
        subl    $16, %esp
+       movl    $0, -20(%ebp)
        movl    $0, -24(%ebp)
-       movl    $999999999, %eax
        .p2align 4,,15
-       movl    $0, (%ecx,%edx,8)
+       movl    $0, (%edx)
        decl    %eax
-       movl    $1072693248, 4(%ecx,%edx,8)
+       movl    $1072693248, 4(%edx)
        jns     .L5
        xorl    %eax, %eax

The main difference is the use of (%ecx,%edx,8) vs (%edx).
But this does not produce any difference in performance (at least on Pentium 4 or 
Pentium 3):

tin:~/src/gnu/gcctest>time ./a.out
2.010u 0.000s 0:02.01 100.0%    0+0k 0+0io 69pf+0w
tin:~/src/gnu/gcctest>gcc -std=c99 -O3 pr8776.c -DSLOW2
time ./a.out
2.010u 0.000s 0:02.01 100.0%    0+0k 0+0io 69pf+0w
tin:~/src/gnu/gcctest>gcc -std=c99 -O3 pr8776.c -DSLOW1
time ./a.out
2.010u 0.000s 0:02.01 100.0%    0+0k 0+0io 69pf+0w

So this is fixed on the mainline and 3.3.3.
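
As a side note on reading the diff in Comment 4: the two immediates stored in the loop, 0 at offset 0 and 1072693248 at offset 4, are consistent with the low and high 32-bit halves of the double 1.0 on little-endian x86, and the scale factor 8 in (%ecx,%edx,8) matches sizeof(double). A minimal sketch to check this, assuming IEEE 754 doubles; the helper names `double_bits`, `high_word` and `low_word` are made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Raw bit pattern of a double, copied out with memcpy so that no
// pointer-cast aliasing rules are violated.
std::uint64_t double_bits(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Upper 32 bits: the word stored at offset 4 on little-endian x86.
std::uint32_t high_word(double value) {
    return static_cast<std::uint32_t>(double_bits(value) >> 32);
}

// Lower 32 bits: the word stored at offset 0 on little-endian x86.
std::uint32_t low_word(double value) {
    return static_cast<std::uint32_t>(double_bits(value) & 0xFFFFFFFFu);
}
```

With IEEE 754 doubles, `high_word(1.0)` is 0x3FF00000, which is 1072693248 in decimal, and `low_word(1.0)` is 0, so both versions of the loop perform the same pair of stores and differ only in how the address is formed.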