Bug 21715 - [4.0/4.1 regression] code-generation performance regression
Summary: [4.0/4.1 regression] code-generation performance regression
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P2 minor
Target Milestone: 4.2.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on: 18427
Blocks:
Reported: 2005-05-23 08:24 UTC by Markus F.X.J. Oberhumer
Modified: 2008-02-20 22:35 UTC

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 3.4.3 4.2.0
Known to fail: 4.0.0 4.1.0 4.1.3 3.3.5 3.3.6
Last reconfirmed: 2006-01-10 18:06:06


Description Markus F.X.J. Oberhumer 2005-05-23 08:25:00 UTC
During recent performance testing I have identified a number of small code
fragments where gcc 4.x produces worse code than gcc 3.4.3. Some of these may be
target-specific, and I plan to gradually enter such small performance
regressions into the bug database unless there is a better way to report these.


$ cat test.c
long foo(long v) { return v & -v; }

$ gcc-3.4.3 -O3 -c test.c && objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 89 f8                mov    %rdi,%rax
   3:   48 f7 d8                neg    %rax
   6:   48 21 f8                and    %rdi,%rax
   9:   c3                      retq

$ gcc-4.1-20050516 -O3 -c test.c && objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 89 f8                mov    %rdi,%rax
   3:   48 f7 d8                neg    %rax
   6:   48 21 c7                and    %rax,%rdi
   9:   48 89 f8                mov    %rdi,%rax
   c:   c3                      retq
Comment 1 Giovanni Bajo 2005-05-23 12:23:52 UTC
Yes, Bugzilla is the best way. Please file a separate bug report for each
code fragment, thanks!
Comment 2 Andrew Pinski 2005-05-23 18:17:00 UTC
Confirmed.

This is just a register allocation (RA) issue; I don't know how much of it we can fix for 4.0.x.
Comment 3 Andrew Pinski 2005-09-16 19:04:10 UTC
3.3.5 also fails.  I think this is related to a recent message on the gcc mailing list:
http://gcc.gnu.org/ml/gcc/2005-09/msg00429.html

This looks related to two-operand targets.
Comment 4 Mark Mitchell 2005-10-31 03:40:22 UTC
Leaving as P2.
Comment 5 Andrew Pinski 2005-11-06 16:35:39 UTC
I am starting to think that 3.4.x got this right in the first place only by accident.
Comment 6 Kazu Hirata 2005-12-19 02:30:01 UTC
I just compiled the testcase on x86_64.  I got

foo:
.LFB2:
        movq    %rdi, %rax
        negq    %rax
        andq    %rdi, %rax
        ret

which is as good as the assembly generated by 3.4.3.

This is no longer a regression on 4.2.
Comment 7 Steven Bosscher 2006-01-07 18:41:59 UTC
GCC 4.1-20060107 still produces the code reported in the original bug report:

0000000000000000 <foo>:
   0:   48 89 f8                mov    %rdi,%rax
   3:   48 f7 d8                neg    %rax
   6:   48 21 c7                and    %rax,%rdi
   9:   48 89 f8                mov    %rdi,%rax
   c:   c3                      retq

Which patch may have fixed this on the trunk?
Comment 8 Steven Bosscher 2006-01-10 17:50:02 UTC
The new reassociation pass, or the removal of DOM's reassociation bits, fixed this on the trunk.  We get poorer initial RTL generation out of GCC 4.1 and we never manage to fix it up:

The .final_cleanup from GCC 4.1 and GCC 4.0:
;; Function foo (foo)

foo (v)
{
<bb 0>:
  return v & -v;

}


And the .final_cleanup from GCC 4.2:
;; Function foo (foo)

foo (v)
{
<bb 2>:
  return -v & v;

}


 (insn 12 11 13 (parallel [
             (set (reg:DI 60)
-                (and:DI (reg/v:DI 59 [ v ])
-                    (reg:DI 61)))
+                (and:DI (reg:DI 61)
+                    (reg/v:DI 59 [ v ])))
             (clobber (reg:CC 17 flags))
         ]) -1 (nil)
     (nil))

So this regression is not caused by the register allocator, but it does play a role:

In the .combine and .ce2 RTL dumps, the difference is still there:
(insn 12 11 16 (parallel [
             (set (reg:DI 60)
-                (and:DI (reg/v:DI 59 [ v ])
-                    (reg:DI 61)))
+                (and:DI (reg:DI 61)
+                    (reg/v:DI 59 [ v ])))
             (clobber (reg:CC 17 flags))
-    (expr_list:REG_DEAD (reg/v:DI 59 [ v ])
-        (expr_list:REG_DEAD (reg:DI 61)
+    (expr_list:REG_DEAD (reg:DI 61)
+        (expr_list:REG_DEAD (reg/v:DI 59 [ v ])
             (expr_list:REG_UNUSED (reg:CC 17 flags)
                 (nil)))))

Then in the .regmove RTL dump something changes:
(insn:HI 12 11 16 (parallel [
-            (set (reg/v:DI 59 [ v ])
-                (and:DI (reg/v:DI 59 [ v ])
-                    (reg:DI 61)))
+            (set (reg:DI 61)
+                (and:DI (reg:DI 61)
+                    (reg/v:DI 59 [ v ])))
             (clobber (reg:CC 17 flags))
         ]) 297 {*anddi_1_rex64} (insn_list:REG_DEP_TRUE 11 (nil))
-    (expr_list:REG_DEAD (reg:DI 61)
+    (expr_list:REG_DEAD (reg/v:DI 59 [ v ])
         (expr_list:REG_UNUSED (reg:CC 17 flags)
             (nil))))

This small difference eventually leads to a different choice of register allocation.  The choice that GCC 4.2 makes is superior because it makes the move to the result a dead instruction.  The .greg RTL dump shows this:

-(insn:HI 12 11 16 0 (parallel [
-            (set (reg/v:DI 5 di [orig:59 v ] [59])
-                (and:DI (reg/v:DI 5 di [orig:59 v ] [59])
-                    (reg:DI 0 ax [61])))
+(insn:HI 12 11 16 2 (parallel [
+            (set (reg:DI 0 ax [61])
+                (and:DI (reg:DI 0 ax [61])
+                    (reg/v:DI 5 di [orig:59 v ] [59])))
             (clobber (reg:CC 17 flags))
         ]) 297 {*anddi_1_rex64} (insn_list:REG_DEP_TRUE 11 (nil))
     (nil))

-(insn:HI 19 16 25 0 (set (reg/i:DI 0 ax [ <result> ])
-        (reg/v:DI 5 di [orig:59 v ] [59])) 81 {*movdi_1_rex64} 
-    (insn_list:REG_DEP_TRUE 12 (nil))
-    (nil))
Comment 9 Mark Mitchell 2006-02-24 00:25:56 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 10 Mark Mitchell 2006-05-25 02:33:00 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 11 Richard Biener 2008-02-20 22:35:28 UTC
Fixed since 4.2.0, wontfix on older branches.