This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/36539] IRA doesn't account for earlyclobber asm conflicts

From: "law at redhat dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 29 Jan 2010 11:01:15 -0000
Subject: [Bug target/36539] IRA doesn't account for earlyclobber asm conflicts
References: <bug-36539-10175@http.gcc.gnu.org/bugzilla/>
Reply-to: gcc-bugzilla at gcc dot gnu dot org

------- Comment #9 from law at redhat dot com  2010-01-29 11:01 -------
At some point IRA's handling of earlyclobber constraints improved enough to get
the conflicts right in this PR.  Unfortunately, that is not sufficient to
generate good code.  

If we compile the testcase on i686-pc-linux-gnu with -O2 -fomit-frame-pointer,
we should be able to allocate hard regs for all the allocnos *and* do so
without generating any reloads.  Unfortunately, IRA makes poor register
selections which ultimately lead to reloading.

Pass 0 for finding pseudo/allocno costs

    a0 (r68,l0) best AREG, cover GENERAL_REGS
    a4 (r67,l0) best Q_REGS, cover GENERAL_REGS
    a3 (r66,l0) best GENERAL_REGS, cover GENERAL_REGS
    a1 (r65,l0) best GENERAL_REGS, cover GENERAL_REGS
    a5 (r64,l0) best GENERAL_REGS, cover GENERAL_REGS
    a2 (r63,l0) best GENERAL_REGS, cover GENERAL_REGS
    a7 (r59,l0) best GENERAL_REGS, cover GENERAL_REGS
    a6 (r58,l0) best GENERAL_REGS, cover GENERAL_REGS

  a0(r68,l0) costs: AREG:-1000,-1000 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0
DIREG:0,0 AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0
GENERAL_REGS:0,0 MEM:6000
  a1(r65,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:7000
  a2(r63,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:28000
  a3(r66,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:16000
  a4(r67,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:4000,4000
DIREG:4000,4000 AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:4000,4000
GENERAL_REGS:4000,4000 MEM:16000
  a5(r64,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:4000
  a6(r58,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:3000
  a7(r59,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:3000


Pass 1 for finding pseudo/allocno costs

    r68: preferred AREG, alternative GENERAL_REGS, cover GENERAL_REGS
    r67: preferred Q_REGS, alternative GENERAL_REGS, cover GENERAL_REGS
    r66: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r65: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r64: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r63: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r59: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r58: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS

  a0(r68,l0) costs: AREG:-1000,-1000 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0
DIREG:0,0 AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0
GENERAL_REGS:0,0 MEM:6000
  a1(r65,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:7000
  a2(r63,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:28000
  a3(r66,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:16000
  a4(r67,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:4000,4000
DIREG:4000,4000 AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:4000,4000
GENERAL_REGS:4000,4000 MEM:16000
  a5(r64,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:4000
  a6(r58,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:3000
  a7(r59,l0) costs: AREG:0,0 DREG:0,0 CREG:0,0 BREG:0,0 SIREG:0,0 DIREG:0,0
AD_REGS:0,0 CLOBBERED_REGS:0,0 Q_REGS:0,0 NON_Q_REGS:0,0 GENERAL_REGS:0,0
MEM:3000

That looks fairly reasonable.  Pseudo 68 is the return value, so assigning it
to ax is a win.  r67 wants Q_REGS, so assigning it to SI, DI, or any other
NON_Q register has a cost.


;; a0(r68,l0) conflicts:
;;     total conflict hard regs:
;;     conflict hard regs:
;; a1(r65,l0) conflicts: a3(r66,l0) a2(r63,l0) a4(r67,l0) a5(r64,l0)
;;     total conflict hard regs: 1 2
;;     conflict hard regs: 1 2
;; a2(r63,l0) conflicts: a1(r65,l0) a3(r66,l0) a4(r67,l0) a5(r64,l0) a6(r58,l0) a7(r59,l0)
;;     total conflict hard regs: 1 2
;;     conflict hard regs: 1 2
;; a3(r66,l0) conflicts: a1(r65,l0) a2(r63,l0) a4(r67,l0) a5(r64,l0) a6(r58,l0)
;;     total conflict hard regs: 1 2
;;     conflict hard regs: 1 2
;; a4(r67,l0) conflicts: a1(r65,l0) a3(r66,l0) a2(r63,l0) a5(r64,l0)
;;     total conflict hard regs: 1 2
;;     conflict hard regs: 1 2
;; a5(r64,l0) conflicts: a1(r65,l0) a3(r66,l0) a2(r63,l0) a4(r67,l0)
;;     total conflict hard regs: 1 2
;;     conflict hard regs: 1 2
;; a6(r58,l0) conflicts: a3(r66,l0) a2(r63,l0) a7(r59,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a7(r59,l0) conflicts: a2(r63,l0) a6(r58,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

  cp0:a3(r66)<->a7(r59)@1000:move
  cp1:a4(r67)<->a6(r58)@1000:move
  cp2:a0(r68)<->a1(r65)@125:shuffle


The r65/r68 copy will cause IRA to want to use ax for r65 and as a result will
increase the cost of ax for every pseudo which conflicts with r65 (r63, r64,
r66, r67).
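
To make that propagation concrete, here is a minimal sketch in C.  It is not
IRA's code: prefer_hard_reg is a hypothetical helper, the conflict table is
transcribed from the dump above, and the amount passed in is an assumption
(the 125 printed for cp2 is a convenient stand-in, not the value IRA actually
computes).

```c
#include <assert.h>

enum { N_ALLOCNOS = 8, N_HARD_REGS = 8, AX = 0 };

/* Allocno-to-allocno conflicts transcribed from the dump above;
   bit j of conflicts[i] is set when ai conflicts with aj.  */
static const unsigned conflicts[N_ALLOCNOS] = {
  /* a0 (r68) */ 0,
  /* a1 (r65) */ 1u << 2 | 1u << 3 | 1u << 4 | 1u << 5,
  /* a2 (r63) */ 1u << 1 | 1u << 3 | 1u << 4 | 1u << 5 | 1u << 6 | 1u << 7,
  /* a3 (r66) */ 1u << 1 | 1u << 2 | 1u << 4 | 1u << 5 | 1u << 6,
  /* a4 (r67) */ 1u << 1 | 1u << 2 | 1u << 3 | 1u << 5,
  /* a5 (r64) */ 1u << 1 | 1u << 2 | 1u << 3 | 1u << 4,
  /* a6 (r58) */ 1u << 2 | 1u << 3 | 1u << 7,
  /* a7 (r59) */ 1u << 2 | 1u << 6,
};

/* Hypothetical helper: make HARD_REG cheaper for allocno A by AMOUNT and
   dearer by the same AMOUNT for every allocno that conflicts with A.  */
static void
prefer_hard_reg (int cost[N_ALLOCNOS][N_HARD_REGS], int a, int hard_reg,
                 int amount)
{
  assert (a >= 0 && a < N_ALLOCNOS);
  assert (hard_reg >= 0 && hard_reg < N_HARD_REGS);

  cost[a][hard_reg] -= amount;
  for (int other = 0; other < N_ALLOCNOS; other++)
    if (conflicts[a] & (1u << other))
      cost[other][hard_reg] += amount;
}
```

Calling prefer_hard_reg (cost, 1, AX, 125) on a zeroed table lowers the ax
cost for a1 (r65) and raises it for a2, a3, a4 and a5 (r63, r66, r67, r64),
leaving a0, a6 and a7 untouched.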

So far, so good.  The problem is I don't see where we increase the cost of
Q_REGS for pseudos which conflict with r67.  So when we color:



      Pushing a7(r59,l0)
      Pushing a6(r58,l0)
      Pushing a0(r68,l0)
      Pushing a3(r66,l0)(potential spill: pri=2285, cost=16000)
      Pushing a4(r67,l0)(potential spill: pri=2666, cost=16000)
      Pushing a1(r65,l0)
      Pushing a5(r64,l0)
      Pushing a2(r63,l0)
      Popping a2(r63,l0)  -- assign reg 3
      Popping a5(r64,l0)  -- assign reg 4
      Popping a1(r65,l0)  -- assign reg 0
      Popping a4(r67,l0)  -- assign reg 5
      Popping a3(r66,l0)  -- assign reg 6
      Popping a0(r68,l0)  -- assign reg 0
      Popping a6(r58,l0)  -- assign reg 5
      Popping a7(r59,l0)  -- assign reg 6

We've assigned r63 to bx, so r67 can't be assigned to bx.  We ultimately
assign r67 to di.  That in turn causes r65 to get spilled during reload so
that r67 can be reloaded into a Q_REGS register, and ultimately we generate
crappy code.

It seems to me we ought to have code which increases the cost for Q_REGS for
pseudos conflicting with r67, but I can't seem to find it.  FWIW, if I manually
increase the cost of Q_REGS for the pseudos conflicting with r67 (r63, r64,
r65, r66), I get the following coloring:

      Pushing a7(r59,l0)
      Pushing a6(r58,l0)
      Pushing a0(r68,l0)
      Pushing a3(r66,l0)(potential spill: pri=2285, cost=16000)
      Pushing a4(r67,l0)(potential spill: pri=2666, cost=16000)
      Pushing a1(r65,l0)
      Pushing a5(r64,l0)
      Pushing a2(r63,l0)
      Popping a2(r63,l0)  -- assign reg 4
      Popping a5(r64,l0)  -- assign reg 5
      Popping a1(r65,l0)  -- assign reg 0
      Popping a4(r67,l0)  -- assign reg 3
      Popping a3(r66,l0)  -- assign reg 6
      Popping a0(r68,l0)  -- assign reg 0
      Popping a6(r58,l0)  -- assign reg 3
      Popping a7(r59,l0)  -- assign reg 6

Which is a perfect allocation requiring no reloads.  
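
The effect can be reproduced with a toy model.  The sketch below is not IRA's
coloring code; it only covers the five mutually conflicting allocnos (r63,
r64, r65, r67, r66, in pop order), picks the cheapest free register for each,
and breaks ties by lowest register number.  The function names, the 125 used
for the ax preference, and the q_bump parameter are all assumptions made for
illustration; the size of the manual increase is not stated above.

```c
#include <assert.h>
#include <limits.h>

/* i386 hard register numbers as used in the dump:
   0 ax, 1 dx, 2 cx, 3 bx, 4 si, 5 di, 6 bp, 7 sp.  */
enum { N_HARD_REGS = 8, N_CLIQUE = 5 };

/* Registers open to the clique: dx and cx are conflict hard regs
   ("1 2" in the dump) and sp is assumed fixed.  */
static const int usable[N_HARD_REGS] = { 1, 0, 0, 1, 1, 1, 1, 0 };

/* Color the mutually conflicting allocnos in pop order.  cost[i][r] is
   the cost of giving the i-th popped allocno hard reg r.  Each allocno
   takes the cheapest usable register not yet taken; ties go to the
   lowest register number.  */
static void
color_clique (int cost[N_CLIQUE][N_HARD_REGS], int assigned[N_CLIQUE])
{
  int taken[N_HARD_REGS] = { 0 };

  for (int i = 0; i < N_CLIQUE; i++)
    {
      int best = -1, best_cost = INT_MAX;

      for (int r = 0; r < N_HARD_REGS; r++)
        if (usable[r] && !taken[r] && cost[i][r] < best_cost)
          {
            best = r;
            best_cost = cost[i][r];
          }
      assert (best >= 0);  /* five allocnos, five usable registers */
      taken[best] = 1;
      assigned[i] = best;
    }
}

/* Fill COST for the pop order r63, r64, r65, r67, r66.  Q_BUMP is a
   hypothetical extra cost on Q_REGS (ax, dx, cx, bx) for the four
   pseudos that conflict with r67; 0 models the unmodified costs.  */
static void
init_costs (int cost[N_CLIQUE][N_HARD_REGS], int q_bump)
{
  enum { R63, R64, R65, R67, R66 };

  for (int i = 0; i < N_CLIQUE; i++)
    for (int r = 0; r < N_HARD_REGS; r++)
      cost[i][r] = 0;

  /* cp2 makes r65 favour ax and makes ax dearer for its conflicts;
     125 is an assumed amount, borrowed from the copy frequency.  */
  for (int i = 0; i < N_CLIQUE; i++)
    cost[i][0] += (i == R65) ? -125 : 125;

  /* r67 pays 4000 outside Q_REGS (si, di, bp here).  */
  for (int r = 4; r <= 6; r++)
    cost[R67][r] = 4000;

  /* The manual adjustment: Q_REGS cost more for r67's conflicts.  */
  for (int i = 0; i < N_CLIQUE; i++)
    if (i != R67)
      for (int r = 0; r <= 3; r++)
        cost[i][r] += q_bump;
}
```

With q_bump set to 0 the model assigns regs 3, 4, 0, 5, 6 in pop order, so r67
lands in di as in the first coloring.  With q_bump set to 100 it assigns
4, 5, 0, 3, 6, which puts r67 in bx and matches the second coloring for these
five allocnos.  The bump is deliberately kept below 125 here so that r65 still
ends up in ax in this model.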


-- 

law at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-01-29 11:01:15
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539