During recent performance testing I have identified a number of small code fragments where gcc 4.x produces worse code than gcc 3.4.3. Some of these may be target specific, and I plan to gradually enter such small performance regressions into the bug database unless there is a better way to report these.

$ cat test.c
long foo(long v) { return v & -v; }

$ gcc-3.4.3 -O3 -c test.c && objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 89 f8    mov    %rdi,%rax
   3:   48 f7 d8    neg    %rax
   6:   48 21 f8    and    %rdi,%rax
   9:   c3          retq

$ gcc-4.1-20050516 -O3 -c test.c && objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 89 f8    mov    %rdi,%rax
   3:   48 f7 d8    neg    %rax
   6:   48 21 c7    and    %rax,%rdi
   9:   48 89 f8    mov    %rdi,%rax
   c:   c3          retq
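For readers unfamiliar with the idiom, here is a minimal sketch of what the test function computes: on a two's complement target, v & -v isolates the lowest set bit of v. The helper lowest_set_bit_ref and the check function check_foo are hypothetical names added for illustration; only foo comes from the report. Note that -v is undefined for LONG_MIN in ISO C, so the check stays well inside the range of long.

```c
#include <assert.h>

/* The function from the report: isolates the lowest set bit of v. */
long foo(long v)
{
    return v & -v;
}

/* Hypothetical reference implementation: scan upwards from bit 0
   until a set bit is found.  Assumes two's complement, as on x86-64. */
long lowest_set_bit_ref(long v)
{
    long bit = 1;

    if (v == 0)
        return 0;
    while ((v & bit) == 0)
        bit <<= 1;
    return bit;
}

/* Hypothetical self-check over a small range that avoids LONG_MIN. */
void check_foo(void)
{
    long v;

    for (v = -1000; v <= 1000; v++)
        assert(foo(v) == lowest_set_bit_ref(v));
}
```

The file can be compiled with the same -O3 -c invocation shown above and inspected with objdump -d; the codegen question in this report concerns only foo.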
The best way is to use Bugzilla, yes. Please use different bug reports for each code fragment, thanks!
Confirmed. This is just a register allocation (RA) issue; I don't know how much we can fix for 4.0.x.
3.3.5 also fails. I think this is also related to a recent message on the gcc mailing list: http://gcc.gnu.org/ml/gcc/2005-09/msg00429.html This looks related to two-operand targets.
Leaving as P2.
I am starting to think that 3.4.x getting this right in the first place was just an accident.
I just compiled the testcase on x86_64. I got

foo:
.LFB2:
        movq    %rdi, %rax
        negq    %rax
        andq    %rdi, %rax
        ret

which is as good as the assembly generated by 3.4.3. This is no longer a regression on 4.2.
GCC 4.1-20060107 still produces the code reported in the original bug report:

0000000000000000 <foo>:
   0:   48 89 f8    mov    %rdi,%rax
   3:   48 f7 d8    neg    %rax
   6:   48 21 c7    and    %rax,%rdi
   9:   48 89 f8    mov    %rdi,%rax
   c:   c3          retq

What patch may have fixed this on the trunk?
The new reassociation pass, or the removal of DOM's reassociation bits, fixed this on the trunk. We get poorer initial RTL generation out of GCC 4.1 and we never manage to fix it up:

The .final_cleanup from GCC 4.1 and GCC 4.0:

;; Function foo (foo)

foo (v)
{
<bb 0>:
  return v & -v;

}

And the .final_cleanup from GCC 4.2:

;; Function foo (foo)

foo (v)
{
<bb 2>:
  return -v & v;

}

 (insn 12 11 13 (parallel [
             (set (reg:DI 60)
-                (and:DI (reg/v:DI 59 [ v ])
-                    (reg:DI 61)))
+                (and:DI (reg:DI 61)
+                    (reg/v:DI 59 [ v ])))
             (clobber (reg:CC 17 flags))
         ]) -1 (nil)
     (nil))

So this regression is not caused by the register allocator, but it does play a role: In the .combine and .ce2 RTL dumps, the difference is still there:

 (insn 12 11 16 (parallel [
             (set (reg:DI 60)
-                (and:DI (reg/v:DI 59 [ v ])
-                    (reg:DI 61)))
+                (and:DI (reg:DI 61)
+                    (reg/v:DI 59 [ v ])))
             (clobber (reg:CC 17 flags))
-    (expr_list:REG_DEAD (reg/v:DI 59 [ v ])
-        (expr_list:REG_DEAD (reg:DI 61)
+    (expr_list:REG_DEAD (reg:DI 61)
+        (expr_list:REG_DEAD (reg/v:DI 59 [ v ])
             (expr_list:REG_UNUSED (reg:CC 17 flags)
                 (nil)))))

Then in the .regmove RTL dump something changes:

 (insn:HI 12 11 16 (parallel [
-            (set (reg/v:DI 59 [ v ])
-                (and:DI (reg/v:DI 59 [ v ])
-                    (reg:DI 61)))
+            (set (reg:DI 61)
+                (and:DI (reg:DI 61)
+                    (reg/v:DI 59 [ v ])))
             (clobber (reg:CC 17 flags))
         ]) 297 {*anddi_1_rex64} (insn_list:REG_DEP_TRUE 11 (nil))
-    (expr_list:REG_DEAD (reg:DI 61)
+    (expr_list:REG_DEAD (reg/v:DI 59 [ v ])
         (expr_list:REG_UNUSED (reg:CC 17 flags)
             (nil))))

This small difference eventually leads to a different choice of register allocation. The choice that GCC 4.2 makes is superior because it makes the move to the result a dead instruction.
The .greg RTL dump shows this:

-(insn:HI 12 11 16 0 (parallel [
-            (set (reg/v:DI 5 di [orig:59 v ] [59])
-                (and:DI (reg/v:DI 5 di [orig:59 v ] [59])
-                    (reg:DI 0 ax [61])))
+(insn:HI 12 11 16 2 (parallel [
+            (set (reg:DI 0 ax [61])
+                (and:DI (reg:DI 0 ax [61])
+                    (reg/v:DI 5 di [orig:59 v ] [59])))
             (clobber (reg:CC 17 flags))
         ]) 297 {*anddi_1_rex64} (insn_list:REG_DEP_TRUE 11 (nil))
     (nil))

-(insn:HI 19 16 25 0 (set (reg/i:DI 0 ax [ <result> ])
-        (reg/v:DI 5 di [orig:59 v ] [59])) 81 {*movdi_1_rex64}
-    (insn_list:REG_DEP_TRUE 12 (nil))
-    (nil))
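To experiment with the operand order seen in the two .final_cleanup dumps, a small test file along the following lines could be used. This is only a sketch: the names foo_and_neg, foo_neg_and and check_same_result are hypothetical, and it is an assumption, not something established in this report, that the source-level operand order survives into the tree dumps of any particular release. The two functions are semantically identical because & is commutative; only the generated code may differ.

```c
#include <assert.h>

/* Operand order as in the GCC 4.0/4.1 .final_cleanup dump: v & -v. */
long foo_and_neg(long v)
{
    return v & -v;
}

/* Operand order as in the GCC 4.2 .final_cleanup dump: -v & v. */
long foo_neg_and(long v)
{
    return -v & v;
}

/* Hypothetical sanity check: both orders must give the same value.
   The range avoids LONG_MIN, for which -v is undefined in ISO C. */
void check_same_result(void)
{
    long v;

    for (v = -4096; v <= 4096; v++)
        assert(foo_and_neg(v) == foo_neg_and(v));
}
```

Compiling this with -O3 -c and running objdump -d, as in the original report, shows whether a given compiler emits the extra mov for either variant.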
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Fixed since 4.2.0, wontfix on older branches.