This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/53107] New: scheduling fail


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53107

             Bug #: 53107
           Summary: scheduling fail
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: mrs@gcc.gnu.org


When generating code for testsuite/gcc.c-torture/execute/ieee/pr50310.c, I
noticed that all the stores are pushed to the end where they can't execute
simultaneously with other instructions.  I have tons of free execution slots
around the stores, as the stores have to contend with a relatively narrow
off-chip data path to memory.  The code looks something like:

;;   --------------- forward dependences: ------------ 

;;   --- Region Dependences --- b 2 bb 0 
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;       18   392     2     2     7     1   cmpcc       : 35 19 
;;       19  1633     2     2     6     1   movcc       : 20 
;;       20    80     2     2     5     5   stm_4       : 
;;       35  1633     2     2     6     1   movcc       : 36 
;;       36    80     2     2     5     5   stm_4       : 
;;       50   388     2     2     7     1   cmpcc       : 67 51 
;;       51  1633     2     2     6     1   movcc       : 52 
;;       52    80     2     2     5     5   stm_4       : 
;;       67  1633     2     2     6     1   movcc       : 68 
;;       68    80     2     2     5     5   stm_4       : 
;;       82   389     2     2     7     1   cmpcc       : 99 83 
;;       83  1633     2     2     6     1   movcc       : 84 
;;       84    80     2     2     5     5   stm_4       : 
;;       99  1633     2     2     6     1   movcc       : 100 
;;      100    80     2     2     5     5   stm_4       : 
[ repeated 10 more times]

with a sequence of 16 of the 3-instruction block, as this is an -O3 compile.
Most of the costs associated with cmpcc and movcc would be free if they were
moved near the stm instructions.  The scheduling algorithm sorts and issues the
insns based upon prio, so all the 7s (cmpcc) go first, then all the 6s (movcc),
and all the stores (stm_4) go last.  This hurts, and the original
ordering would have produced faster code.  :-(  I don't know the best way to
fix this, as this is just a machine-independent part of the algorithm that
dates back to the original "this is how you schedule" paper.  It is incomplete
and is now overly simplistic for the types of cpus some people build.  The best
fix is one that refines the costs in some way.  For example, cmpcc, movcc,
stm_4 with the stm_4 staggered 1 group down, when run through the dfa, would
come to the conclusion that the cmpcc and movcc instructions are free.
Presently the priority field is a simple addition of the individual costs of
the insns, not taking into consideration that the dfa knows that simple
addition is a poor substitute.
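
To make the effect concrete, here is a rough sketch of the behaviour described above. It is a toy model and not the scheduler's actual code. It takes the first two groups from the dump (insn numbers, costs and forward dependences copied from it), computes each priority as the insn's own cost plus the largest priority among its successors, and then issues strictly by descending priority. The names INSNS, priority and schedule_by_priority are made up for illustration, and breaking ties by insn number is an assumption.

```python
# Toy model of the first two groups in the dump above.
# uid -> (reservation, cost, forward dependences), all copied from the dump.
INSNS = {
    18: ("cmpcc", 1, [35, 19]),
    19: ("movcc", 1, [20]),
    20: ("stm_4", 5, []),
    35: ("movcc", 1, [36]),
    36: ("stm_4", 5, []),
    50: ("cmpcc", 1, [67, 51]),
    51: ("movcc", 1, [52]),
    52: ("stm_4", 5, []),
    67: ("movcc", 1, [68]),
    68: ("stm_4", 5, []),
}


def priority(uid, memo=None):
    """Simple additive priority: own cost plus the largest successor priority."""
    if memo is None:
        memo = {}
    if uid not in memo:
        _, cost, succs = INSNS[uid]
        memo[uid] = cost + max((priority(s, memo) for s in succs), default=0)
    return memo[uid]


def schedule_by_priority():
    """Issue order when insns are sorted purely by descending priority.

    Ties are broken by insn number (an assumption made for this sketch).
    Priorities strictly decrease along every dependence edge here, so the
    sorted order also respects the dependences.
    """
    return sorted(INSNS, key=lambda uid: (-priority(uid), uid))


if __name__ == "__main__":
    for uid in schedule_by_priority():
        print(uid, INSNS[uid][0], "prio", priority(uid))
    # Issue order: 18 50 19 35 51 67 20 36 52 68
    # i.e. both cmpcc first, then the four movcc, then the four stm_4.
```

The computed priorities match the prio column of the dump (7, 6, 5), and the resulting order bunches all the stores at the end.  A priority that accounted for what the dfa knows about free slots around the stores would be needed to keep each cmpcc/movcc next to its stm_4.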

