Bug 21596 - [4.2 Regression] extra temporaries when using global register variables
Summary: [4.2 Regression] extra temporaries when using global register variables
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 minor
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2005-05-15 23:44 UTC by Richard Henderson
Modified: 2009-03-30 15:44 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.3.0
Known to fail: 4.0.4 4.2.5
Last reconfirmed: 2007-11-19 09:08:47


Attachments
proposed patch for 4.3 (747 bytes, patch)
2007-03-05 22:43 UTC, Andrew Macleod
Details | Diff
proposed patch for 4.2 (839 bytes, patch)
2007-03-05 22:52 UTC, Andrew Macleod
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Henderson 2005-05-15 23:44:53 UTC
register int *reg __asm__("%edi");
int test () { return *--reg <= 0; }

With -O2 -fomit-frame-pointer, 4.0,

        movl    %edi, %eax
        leal    -4(%edi), %edi
        movl    -4(%eax), %eax
        testl   %eax, %eax

With 3.4,

        subl    $4, %edi
        cmpl    $0, (%edi)

The problem appears to begin at the tree level, with extra temporaries:

  reg.0 = reg;
  reg.2 = reg.0 - 4B;
  reg = reg.2;
  return *reg.2 <= 0;

We do consider hard register variables not is_gimple_reg, due to needing
to V_MAY_DEF them at call sites.  It would be nice if we could eliminate
these temporaries during TER, or something.
Comment 1 Andrew Pinski 2005-05-15 23:57:37 UTC
Confirmed.
Comment 2 Mark Mitchell 2005-10-31 03:36:15 UTC
Leaving this as P2.
Comment 3 Mark Mitchell 2006-02-24 00:25:55 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 4 Mark Mitchell 2006-05-25 02:32:59 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 5 Andrew Pinski 2006-07-05 09:27:30 UTC
It is slightly different now:
        leal    -4(%edi), %eax
        movl    %eax, %edi
        movl    (%eax), %eax
        testl   %eax, %eax
But still the same issue.
Comment 6 Steven Bosscher 2006-07-22 11:08:23 UTC
In the greg dump we have this RTL:

(insn:HI 10 8 11 2 (parallel [
            (set (reg:SI 58 [ D.1540 ])
                (plus:SI (reg/v:SI 5 di [ reg ])
                    (const_int -4 [0xfffffffffffffffc])))
            (clobber (reg:CC 17 flags))
        ]) 208 {*addsi_1} (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:SI 5 di [ reg ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

(insn:HI 11 10 12 2 (set (reg/v:SI 5 di [ reg ])
        (reg:SI 58 [ D.1540 ])) 40 {*movsi_1} (insn_list:REG_DEP_TRUE 10 (nil))
    (nil))

(insn:HI 12 11 13 2 (set (reg:CCNO 17 flags)
        (compare:CCNO (mem:SI (reg:SI 58 [ D.1540 ]) [3 S4 A32])
            (const_int 0 [0x0]))) 3 {*cmpsi_ccno_1} (nil)
    (expr_list:REG_DEAD (reg:SI 58 [ D.1540 ])
        (nil)))

reg 5 and pseudoreg 58 can share the same hard register (i.e. 58 renumbers to 5) but GCC concludes that the two regs conflict.
Comment 7 Andrew Macleod 2007-03-05 22:43:31 UTC
Created attachment 13149 [details]
proposed patch for 4.3

This patch removes one of the temporary copies.  With this minor tuning of one of TERs heuristics, the tree optimizers produce:
  reg.27 = reg - 4B;
  reg = reg.27;
  return *reg.27 <= 0;

Getting rid of the remaining middle copy is slightly little tricker, because it involves a VDEF.

On mainline, this produces the (I think) desired assembly:
        subl    $4, %edi
        xorl    %eax, %eax
        cmpl    $0, -4(%edi)
        setle   %al
        ret
Comment 8 Andrew Macleod 2007-03-05 22:52:19 UTC
Created attachment 13150 [details]
proposed patch for 4.2

This is the same patch for the 4.2 compiler.  Unfortunately, its not quite good enough because the rtl optimizers still manage to do the wrong thing.  

In mainline, life recognizes that the register is dead in the copy:
(insn 7 5 8 2 (parallel [
            (set (reg:SI 58 [ reg.27 ])                 
               (plus:SI (reg/v:SI 5 di [ reg ])
                    (const_int -4 [0xfffffffc])))
            (clobber (reg:CC 17 flags))
        ]) 148 {*addsi_1} (nil)
    (expr_list:REG_DEAD (reg/v:SI 5 di [ reg ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)             (nil))))

(insn 8 7 9 2 (set (reg/v:SI 5 di [ reg ])
        (reg:SI 58 [ reg.27 ])) 34 {*movsi_1} (insn_list:REG_DEP_TRUE 7 (nil))
    (expr_list:REG_DEAD (reg:SI 58 [ reg.27 ])
        (nil)))

and combine turns it into:

(insn 8 7 9 2 (parallel [
            (set (reg/v:SI 5 di [ reg ])
                (plus:SI (reg/v:SI 5 di [ reg ])
                    (const_int -4 [0xfffffffc])))
            (clobber (reg:CC 17 flags))
        ]) 148 {*addsi_1} (nil)


on the 4.2 branch, we don't seem to get it right and combine, nor anyone else I suppose, manages to get merge the 2 insns.   so we end up aith the same assembly. 

Unless someone sees something in the RTL optimizers that can be tweaked that can figure this out, there isn't much point in applying this to 4.2.

Im not planning to look into the RTL side myself, but I will see if there is anything else TER can do to get rid of this situation in 4.2.
Comment 9 Andrew Macleod 2007-03-06 15:01:58 UTC
actually, mainline isn't working either.  A closer examination shows the code generated has an extra offset of -4 in the compare that shouldn't be there. This patch triggers a bug in rtl's fwprop pass. 
Comment 10 Paolo Bonzini 2007-03-07 08:22:51 UTC
Unfortunately, if I fix the fwprop bug (which is actually caused by wrong df information), I get again

        leal    -4(%edi), %eax
        movl    %eax, %edi
        movl    (%eax), %eax
        testl   %eax, %eax

The df bug is fixed by:

Index: ../../base-gcc-src/gcc/df-scan.c
===================================================================
--- ../../base-gcc-src/gcc/df-scan.c    (revision 122624)
+++ ../../base-gcc-src/gcc/df-scan.c    (working copy)
@@ -1833,6 +1833,13 @@ df_record_entry_block_defs (struct dataf
 #endif
     }
       
+  /* Mark all global registers as being defined at the entry of the
+     function since values set by our caller should not be treated as
+     uninitialized.  */
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (global_regs[i])
+      bitmap_set_bit (df->entry_block_defs, i);
+  
   /* Once the prologue has been generated, all of these registers
      should just show up in the first regular block.  */
   if (HAVE_prologue && epilogue_completed)
Comment 11 Mark Mitchell 2007-03-10 01:39:52 UTC
In this message:

http://gcc.gnu.org/ml/gcc/2007-03/msg00249.html

Andre Macleod indicates that this will be difficult to fix in pre-4.3 compilers.
Comment 12 Richard Biener 2007-08-23 16:14:50 UTC
So, on the mainline we now generate wrong-code?!
Comment 13 Steven Bosscher 2007-11-19 08:56:30 UTC
Maybe a wrong-code bug and it is "minor" and P2?  Someone please update the status of this report :-)
Comment 14 Andrew Pinski 2007-11-19 09:08:47 UTC
This still fails on the mainline.
Comment 15 Andrew Pinski 2007-11-19 09:35:38 UTC
Actually we get:
        subl    $4, %edi
        subl    $12, %esp
        xorl    %eax, %eax
        cmpl    $0, -4(%edi)
        setle   %al
        addl    $12, %esp

So this is fixed for the trunk.
Comment 16 Joseph S. Myers 2008-07-04 16:53:18 UTC
Closing 4.1 branch.
Comment 17 Joseph S. Myers 2009-03-30 15:44:49 UTC
Closing 4.2 branch, fixed in 4.3.