Summary: | loop invariants are not removed (most likely) | ||
---|---|---|---|
Product: | gcc | Reporter: | lomov1 |
Component: | rtl-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | gcc-bugs |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 3.2 | ||
Target Milestone: | 3.3.3 | ||
Host: | i686-pc-linux-gnu | Target: | i686-linux-pc-gnu |
Build: | i686-pc-linux-gnu | Known to work: | |
Known to fail: | | Last reconfirmed: | 2003-11-15 20:47:31 |
Attachments: | test.cpp |
Description
lomov1
2002-12-01 15:26:00 UTC
Fix: gcc-2.96 doesn't have this problem. Sorry, but I don't have access to other official releases (besides 3.1 and 3.2).

Hello, your problem as stated is fixed on the gcc 3.3 branch and mainline. Please note that -O9 is the same thing as -O3 (-On for n>3 is identical to -O3). Second, your code performs identically for me with -O3, -O3 -DSLOW1 and -O3 -DSLOW2 with gcc 3.3. With gcc mainline, things are somewhat more bizarre. With -O3 or -O3 -DSLOW1, the code is about 50% slower than with gcc 3.3, but with -O3 -DSLOW2 it's the same speed as gcc 3.3 (i.e. faster than with the other two options). Very strange.

Dara

There is a performance regression here from 3.3; other than that, the -DSLOW[12] stuff is fixed, though: they all perform the same. Here is the difference in the code:

--- temp.s      Sat Dec 27 22:51:13 2003
+++ temp1.s     Sat Dec 27 22:51:04 2003
@@ -5,21 +5,20 @@
        .type   main, @function
 main:
        pushl   %ebp
-       xorl    %edx, %edx
+       movl    $999999999, %eax
        movl    %esp, %ebp
        subl    $40, %esp
-       leal    -32(%ebp), %ecx
-       movl    $0, -16(%ebp)
+       leal    -32(%ebp), %edx
        andl    $-16, %esp
-       movl    $0, -20(%ebp)
+       movl    $0, -16(%ebp)
        subl    $16, %esp
+       movl    $0, -20(%ebp)
        movl    $0, -24(%ebp)
-       movl    $999999999, %eax
        .p2align 4,,15
 .L5:
-       movl    $0, (%ecx,%edx,8)
+       movl    $0, (%edx)
        decl    %eax
-       movl    $1072693248, 4(%ecx,%edx,8)
+       movl    $1072693248, 4(%edx)
        jns     .L5
        leave
        xorl    %eax, %eax

The main difference is the use of (%ecx,%edx,8) vs (%edx).

But this does not produce any difference in performance (at least on Pentium 4 or Pentium 3):

tin:~/src/gnu/gcctest>time ./a.out
2.010u 0.000s 0:02.01 100.0% 0+0k 0+0io 69pf+0w
tin:~/src/gnu/gcctest>gcc -std=c99 -O3 pr8776.c -DSLOW2
tin:~/src/gnu/gcctest>!tim
time ./a.out
2.010u 0.000s 0:02.01 100.0% 0+0k 0+0io 69pf+0w
tin:~/src/gnu/gcctest>gcc -std=c99 -O3 pr8776.c -DSLOW1
tin:~/src/gnu/gcctest>!time
time ./a.out
2.010u 0.000s 0:02.01 100.0% 0+0k 0+0io 69pf+0w

So this is fixed on the mainline and 3.3.3.