register int *reg __asm__("%edi");
int test () { return *--reg <= 0; }

With -O2 -fomit-frame-pointer, 4.0,

        movl    %edi, %eax
        leal    -4(%edi), %edi
        movl    -4(%eax), %eax
        testl   %eax, %eax

With 3.4,

        subl    $4, %edi
        cmpl    $0, (%edi)

The problem appears to begin at the tree level, with extra temporaries:

  reg.0 = reg;
  reg.2 = reg.0 - 4B;
  reg = reg.2;
  return *reg.2 <= 0;

We do consider hard register variables not is_gimple_reg, due to needing to V_MAY_DEF them at call sites. It would be nice if we could eliminate these temporaries during TER, or something.
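For readers without an i386 toolchain, the tree-level shape can be modelled in portable C. This is only a sketch: `reg` here is an ordinary global pointer rather than a hard register variable, and the names `test_direct`, `test_with_temps` and `forms_agree` are made up for illustration. It shows that the temporaries do not change the result, so removing them is purely a code-quality matter.

```c
#include <assert.h>

/* Stand-in for the hard register variable.  Assumption: an ordinary
   global here, whereas the real test case pins it to %edi.  */
int *reg;

/* The source-level form: pre-decrement, then load and compare.  */
int test_direct(void)
{
    return *--reg <= 0;
}

/* The same computation spelled with the temporaries from the tree dump.  */
int test_with_temps(void)
{
    int *reg_0 = reg;        /* reg.0 = reg;                            */
    int *reg_2 = reg_0 - 1;  /* reg.2 = reg.0 - 4B; (assuming 4-byte int) */
    reg = reg_2;             /* reg = reg.2;                            */
    return *reg_2 <= 0;      /* return *reg.2 <= 0;                     */
}

/* Run both forms from the same starting pointer and report whether
   they return the same value; also check they leave reg identical.  */
int forms_agree(int *start)
{
    reg = start;
    int a = test_direct();
    int *after_a = reg;

    reg = start;
    int b = test_with_temps();

    assert(after_a == reg);
    return a == b;
}
```

Calling `forms_agree` with a pointer one past an element of any `int` array compares the two spellings on that element.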
Confirmed.
Leaving this as P2.
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
It is slightly different now:

        leal    -4(%edi), %eax
        movl    %eax, %edi
        movl    (%eax), %eax
        testl   %eax, %eax

But still the same issue.
In the greg dump we have this RTL:

(insn:HI 10 8 11 2 (parallel [
            (set (reg:SI 58 [ D.1540 ])
                (plus:SI (reg/v:SI 5 di [ reg ])
                    (const_int -4 [0xfffffffffffffffc])))
            (clobber (reg:CC 17 flags))
        ]) 208 {*addsi_1} (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:SI 5 di [ reg ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

(insn:HI 11 10 12 2 (set (reg/v:SI 5 di [ reg ])
        (reg:SI 58 [ D.1540 ])) 40 {*movsi_1} (insn_list:REG_DEP_TRUE 10 (nil))
    (nil))

(insn:HI 12 11 13 2 (set (reg:CCNO 17 flags)
        (compare:CCNO (mem:SI (reg:SI 58 [ D.1540 ]) [3 S4 A32])
            (const_int 0 [0x0]))) 3 {*cmpsi_ccno_1} (nil)
    (expr_list:REG_DEAD (reg:SI 58 [ D.1540 ])
        (nil)))

reg 5 and pseudoreg 58 can share the same hard register (i.e. 58 renumbers to 5) but GCC concludes that the two regs conflict.
Created attachment 13149
proposed patch for 4.3

This patch removes one of the temporary copies. With this minor tuning of one of TER's heuristics, the tree optimizers produce:

  reg.27 = reg - 4B;
  reg = reg.27;
  return *reg.27 <= 0;

Getting rid of the remaining middle copy is slightly trickier, because it involves a VDEF. On mainline, this produces the (I think) desired assembly:

        subl    $4, %edi
        xorl    %eax, %eax
        cmpl    $0, -4(%edi)
        setle   %al
        ret
Created attachment 13150
proposed patch for 4.2

This is the same patch for the 4.2 compiler. Unfortunately, it's not quite good enough because the RTL optimizers still manage to do the wrong thing. In mainline, life recognizes that the register is dead in the copy:

(insn 7 5 8 2 (parallel [
            (set (reg:SI 58 [ reg.27 ])
                (plus:SI (reg/v:SI 5 di [ reg ])
                    (const_int -4 [0xfffffffc])))
            (clobber (reg:CC 17 flags))
        ]) 148 {*addsi_1} (nil)
    (expr_list:REG_DEAD (reg/v:SI 5 di [ reg ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

(insn 8 7 9 2 (set (reg/v:SI 5 di [ reg ])
        (reg:SI 58 [ reg.27 ])) 34 {*movsi_1} (insn_list:REG_DEP_TRUE 7 (nil))
    (expr_list:REG_DEAD (reg:SI 58 [ reg.27 ])
        (nil)))

and combine turns it into:

(insn 8 7 9 2 (parallel [
            (set (reg/v:SI 5 di [ reg ])
                (plus:SI (reg/v:SI 5 di [ reg ])
                    (const_int -4 [0xfffffffc])))
            (clobber (reg:CC 17 flags))
        ]) 148 {*addsi_1} (nil)

On the 4.2 branch, we don't seem to get it right, and neither combine nor, I suppose, anyone else manages to merge the 2 insns, so we end up with the same assembly. Unless someone sees something in the RTL optimizers that can be tweaked to figure this out, there isn't much point in applying this to 4.2. I'm not planning to look into the RTL side myself, but I will see if there is anything else TER can do to get rid of this situation in 4.2.
Actually, mainline isn't working either. A closer examination shows that the generated code has an extra offset of -4 in the compare that shouldn't be there. This patch triggers a bug in the RTL fwprop pass.
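To make the miscompilation concrete, here is a small portable C model of the two instruction sequences. It is a sketch only: `reg` is an ordinary global rather than %edi, a 4-byte `int` is assumed, and `test_expected` and `test_miscompiled` are illustrative names, not anything in GCC. The second function mirrors `subl $4, %edi` followed by `cmpl $0, -4(%edi)`, which tests the word one below the one the source asks for.

```c
/* Stand-in for the global register variable (assumption: plain global).  */
int *reg;

/* What the source asks for: decrement reg, then test the word it
   now points at.  */
int test_expected(void)
{
    reg = reg - 1;        /* subl $4, %edi   */
    return reg[0] <= 0;   /* cmpl $0, (%edi) */
}

/* Model of the bad sequence: the decrement is applied, and the compare
   still carries the -4 displacement, so the wrong word is tested.  */
int test_miscompiled(void)
{
    reg = reg - 1;        /* subl $4, %edi     */
    return reg[-1] <= 0;  /* cmpl $0, -4(%edi) */
}
```

With an array such as `{-7, 9, 0}` and `reg` pointing at the last element, the first function inspects the 9 while the second inspects the -7, so the two disagree.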
Unfortunately, if I fix the fwprop bug (which is actually caused by wrong df information), I get again

        leal    -4(%edi), %eax
        movl    %eax, %edi
        movl    (%eax), %eax
        testl   %eax, %eax

The df bug is fixed by:

Index: ../../base-gcc-src/gcc/df-scan.c
===================================================================
--- ../../base-gcc-src/gcc/df-scan.c (revision 122624)
+++ ../../base-gcc-src/gcc/df-scan.c (working copy)
@@ -1833,6 +1833,13 @@ df_record_entry_block_defs (struct dataf
 #endif
     }
 
+  /* Mark all global registers as being defined at the entry of the
+     function since values set by our caller should not be treated as
+     uninitialized.  */
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (global_regs[i])
+      bitmap_set_bit (df->entry_block_defs, i);
+
   /* Once the prologue has been generated, all of these registers
      should just show up in the first regular block.  */
   if (HAVE_prologue && epilogue_completed)
In this message:

http://gcc.gnu.org/ml/gcc/2007-03/msg00249.html

Andrew MacLeod indicates that this will be difficult to fix in pre-4.3 compilers.
So, on the mainline we now generate wrong-code?!
Maybe a wrong-code bug and it is "minor" and P2? Someone please update the status of this report :-)
This still fails on the mainline.
Actually we get:

        subl    $4, %edi
        subl    $12, %esp
        xorl    %eax, %eax
        cmpl    $0, -4(%edi)
        setle   %al
        addl    $12, %esp

So this is fixed for the trunk.
Closing 4.1 branch.
Closing 4.2 branch, fixed in 4.3.