Bug 8776

Summary:	loop invariants are not removed (most likely)
Product:	gcc	Reporter:	lomov1
Component:	rtl-optimization	Assignee:	Not yet assigned to anyone <unassigned>
Status:	RESOLVED FIXED
Severity:	normal	CC:	gcc-bugs
Priority:	P3	Keywords:	missed-optimization
Version:	3.2
Target Milestone:	3.3.3
Host:	i686-pc-linux-gnu	Target:	i686-linux-pc-gnu
Build:	i686-pc-linux-gnu	Known to work:
Known to fail:		Last reconfirmed:	2003-11-15 20:47:31
Attachments:	test.cpp

Description lomov1 2002-12-01 15:26:00 UTC

Consider the following program:
------
struct str{
   int f[3];
};

int main (int argc, char *argv[])
{
   double d[1];
   double* pd=d;
   str s;
   int* pf=&s.f[0];
   s.f[0]=s.f[1]=s.f[2]=0;
   for (int i=0;i<1000000000;++i) {
#ifdef SLOW1
      pd[s.f[0]+s.f[1]*s.f[2]]=1;
#elif defined(SLOW2)
      d[pf[0]+pf[1]*pf[2]]=1;
#else
      pd[pf[0]+pf[1]*pf[2]]=1;
#endif
   }
}
------
After compilation with gcc-3.2 -O9 on Mandrake Linux 9.0 (Athlon 900MHz):

time ./a.out of the SLOW[12] versions gives >10s
time ./a.out of the "!defined(SLOW[12])" version gives 3.50s

Logically, all three variants do the same thing: the index expression never changes inside the loop and always evaluates to 0.
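
For comparison, the transformation the optimizer is expected to perform can be written out by hand. This is only a sketch, not code from the attached test.cpp: the function name fill_hoisted and the iterations parameter are hypothetical. It assumes the stores through pd cannot modify pf[0..2], which should hold here because, under the language's aliasing rules, a store through a double* is not allowed to modify an int object.

```cpp
// Hand-hoisted form of the loop: the index is computed once, outside the loop.
// Hypothetical helper for illustration; not part of test.cpp.
void fill_hoisted(double* pd, const int* pf, int iterations)
{
   // Loop-invariant: pf[0..2] are never written inside the loop.
   const int idx = pf[0] + pf[1] * pf[2];
   for (int i = 0; i < iterations; ++i) {
      pd[idx] = 1;
   }
}
```

If the compiler removes the invariant itself, all three variants of the original loop should cost about the same as this version.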

Release:
gcc version 3.2 (Mandrake Linux 9.0 3.2-1mdk); also with gcc-3.1

Environment:
Linux 2.4.19-16mdk i686 GNU/Linux
Mandrake 9.0
AMD Athlon(tm) Processor stepping 2 cpu MHz 908.111

How-To-Repeat:
gcc -O9 test.cpp ; time ./a.out
gcc -O9 -DSLOW1 test.cpp ; time ./a.out
gcc -O9 -DSLOW2 test.cpp ; time ./a.out

Comment 1 lomov1 2002-12-01 15:26:00 UTC

Fix:
gcc-2.96 doesn't have this problem. Sorry, but I don't have access to any official releases other than 3.1 and 3.2.

Comment 2 Dara Hazeghi 2003-06-04 19:50:50 UTC

Hello,

Your problem as stated is fixed on the gcc 3.3 branch and mainline. Please note that -O9 is the same
thing as -O3 (-On for n>3 is identical to -O3). Also, your code performs identically for me with
-O3, -O3 -DSLOW1 and -O3 -DSLOW2 with gcc 3.3. With gcc mainline, things are somewhat more
bizarre. With -O3 or -O3 -DSLOW1, the code is about 50% slower than with gcc 3.3, but with -O3
-DSLOW2 it's the same speed as gcc 3.3 (i.e. faster than with the other two options). Very
strange.

Dara

Comment 3 Andrew Pinski 2003-11-15 20:47:30 UTC

There is a performance regression here from 3.3; other than that, the -DSLOW[12] problem is
fixed: all three variants perform the same.

Comment 4 Andrew Pinski 2003-12-28 03:56:00 UTC

Here is the difference in the code:
--- temp.s      Sat Dec 27 22:51:13 2003
+++ temp1.s     Sat Dec 27 22:51:04 2003
@@ -5,21 +5,20 @@
        .type   main, @function
 main:
        pushl   %ebp
-       xorl    %edx, %edx
+       movl    $999999999, %eax
        movl    %esp, %ebp
        subl    $40, %esp
-       leal    -32(%ebp), %ecx
-       movl    $0, -16(%ebp)
+       leal    -32(%ebp), %edx
        andl    $-16, %esp
-       movl    $0, -20(%ebp)
+       movl    $0, -16(%ebp)
        subl    $16, %esp
+       movl    $0, -20(%ebp)
        movl    $0, -24(%ebp)
-       movl    $999999999, %eax
        .p2align 4,,15
 .L5:
-       movl    $0, (%ecx,%edx,8)
+       movl    $0, (%edx)
        decl    %eax
-       movl    $1072693248, 4(%ecx,%edx,8)
+       movl    $1072693248, 4(%edx)
        jns     .L5
        leave
        xorl    %eax, %eax

The main difference is the use of (%ecx,%edx,8) vs (%edx).
But this does not produce any difference in performance (at least on Pentium 4 or
Pentium 3):

tin:~/src/gnu/gcctest>time ./a.out
2.010u 0.000s 0:02.01 100.0%    0+0k 0+0io 69pf+0w
tin:~/src/gnu/gcctest>gcc -std=c99 -O3 pr8776.c -DSLOW2
tin:~/src/gnu/gcctest>!tim
time ./a.out
2.010u 0.000s 0:02.01 100.0%    0+0k 0+0io 69pf+0w
tin:~/src/gnu/gcctest>gcc -std=c99 -O3 pr8776.c -DSLOW1
tin:~/src/gnu/gcctest>!time
time ./a.out
2.010u 0.000s 0:02.01 100.0%    0+0k 0+0io 69pf+0w

So this is fixed on the mainline and 3.3.3.
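
As a cross-check on the loop body in the diff above: the two movl stores write the low and high 32-bit halves of the double 1.0, namely 0 at offset 0 and 1072693248 (0x3FF00000) at offset 4. A minimal sketch of that decoding, assuming 64-bit IEEE 754 doubles and the little-endian word order used on i686; the function name double_from_halves is hypothetical:

```cpp
#include <cstdint>
#include <cstring>

// Rebuild a double from the two 32-bit words stored by the loop body.
// Assumes 64-bit IEEE 754 doubles; the helper name is illustrative only.
double double_from_halves(std::uint32_t low, std::uint32_t high)
{
   const std::uint64_t bits = (static_cast<std::uint64_t>(high) << 32) | low;
   double value;
   static_assert(sizeof value == sizeof bits, "expects a 64-bit double");
   std::memcpy(&value, &bits, sizeof value);  // reinterpret the bit pattern
   return value;
}
```

Calling double_from_halves(0, 1072693248) yields 1.0, which matches the "=1" assignment in the source. So both versions of the loop store the same value and differ only in how the address is formed.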