This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/53107] New: scheduling fail
- From: "mrs at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 25 Apr 2012 01:19:27 +0000
- Subject: [Bug rtl-optimization/53107] New: scheduling fail
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53107
Bug #: 53107
Summary: scheduling fail
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: mrs@gcc.gnu.org
When generating code for testsuite/gcc.c-torture/execute/ieee/pr50310.c, I
noticed that all the stores are pushed to the end where they can't execute
simultaneously with other instructions. I have tons of free execution slots
around the stores, as the stores have to contend with a relatively narrow
off-chip data path to memory. The code looks something like:
;;   --------------- forward dependences: ------------
;;   --- Region Dependences --- b 2 bb 0
;;   insn  code  bb  dep  prio  cost  reservation
;;   ----  ----  --  ---  ----  ----  -----------
;;     18   392   2    2     7     1  cmpcc : 35 19
;;     19  1633   2    2     6     1  movcc : 20
;;     20    80   2    2     5     5  stm_4 :
;;     35  1633   2    2     6     1  movcc : 36
;;     36    80   2    2     5     5  stm_4 :
;;     50   388   2    2     7     1  cmpcc : 67 51
;;     51  1633   2    2     6     1  movcc : 52
;;     52    80   2    2     5     5  stm_4 :
;;     67  1633   2    2     6     1  movcc : 68
;;     68    80   2    2     5     5  stm_4 :
;;     82   389   2    2     7     1  cmpcc : 99 83
;;     83  1633   2    2     6     1  movcc : 84
;;     84    80   2    2     5     5  stm_4 :
;;     99  1633   2    2     6     1  movcc : 100
;;    100    80   2    2     5     5  stm_4 :
[ repeated 10 more times]
with a sequence of 16 of the 3-instruction blocks, as this is an -O3 compile.
Most of the costs associated with cmpcc and movcc would be free if they were
moved near the stm instructions. The scheduling algorithm sorts and issues the
insns by prio, so all the 7s (cmpcc) go first, then all the 6s (movcc), and
all the stores (stm_4) go last. This hurts, and the original
ordering would have produced faster code. :-( I don't know the best way to
fix this, as this is just a machine-independent part of the algorithm that
dates back to the original "this is how you schedule" paper. It is incomplete
and is now overly simplistic for the types of CPUs some people build. The best
fix is one that refines the costs in some way. For example, cmpcc, movcc,
stm_4 with the stm_4 staggered 1 group down, when run through the dfa, would
come to the conclusion that the cmpcc and movcc instructions are free.
Presently the priority field is a simple addition of the individual costs of
the insns, which ignores that the dfa knows simple addition is a poor
substitute for the real cost.
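
The effect described above can be reproduced with a small toy model. The sketch below is not GCC's scheduler code. It takes the first ten insns from the dump, computes prio as an insn's own cost plus the largest prio among the insns that depend on it (which reproduces the 7/6/5 values in the dump), and sorts by that value. The cycle count rests on a loudly assumed machine: in-order, single issue, with a store occupying the memory port for its full cost while other insns may still issue. The function names, the machine model, and the assumption that insn numbers reflect the original order are all illustrative, not taken from the report.

```python
# Toy model of the scheduler dump in the report; NOT GCC's scheduler code.
# insn number -> (reservation, cost, insns that depend on it)
INSNS = {
    18: ("cmpcc", 1, [35, 19]),
    19: ("movcc", 1, [20]),
    20: ("stm_4", 5, []),
    35: ("movcc", 1, [36]),
    36: ("stm_4", 5, []),
    50: ("cmpcc", 1, [67, 51]),
    51: ("movcc", 1, [52]),
    52: ("stm_4", 5, []),
    67: ("movcc", 1, [68]),
    68: ("stm_4", 5, []),
}


def priority(insn):
    """Own cost plus the largest priority among dependent insns."""
    _, cost, deps = INSNS[insn]
    return cost + max((priority(d) for d in deps), default=0)


def priority_order():
    """Issue order when insns are sorted by descending priority.

    sorted() is stable, so ties keep their original relative order.
    """
    return sorted(INSNS, key=lambda insn: -priority(insn))


def cycles(order, store="stm_4"):
    """Cycle count on an ASSUMED in-order, single-issue machine.

    Assumption: a store holds the memory port for `cost` cycles and the
    next store must wait for the port; other insns take one issue slot
    and may issue while a store is still in flight.
    """
    clock = 0      # cycle at which the next insn may issue
    port_free = 0  # first cycle at which the memory port is idle
    for insn in order:
        name, cost, _ = INSNS[insn]
        if name == store:
            start = max(clock, port_free)
            port_free = start + cost
            clock = start + 1
        else:
            clock += 1
    return max(clock, port_free)


original = list(INSNS)      # assumed: insn numbers reflect original order
by_prio = priority_order()

print([INSNS[i][0] for i in by_prio])
print(cycles(original), cycles(by_prio))
```

Under these assumptions the priority sort issues both cmpcc insns, then all four movcc insns, then all four stores, and the model reports 22 cycles for the original order against 26 for the sorted order. The four-cycle gap is the cmpcc/movcc work that could have been hidden behind stores already in flight.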