This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



More ARM scheduling weirdness



I was sitting here looking at more schedules for the ARM and found that
many instructions were being classified as "no unit".

This turns out to be rather sub-optimal when there are insns in the stream
with an issue delay of more than one cycle for the core unit.

Consider the following from arm.md:

(define_function_unit "core" 1 0 (eq_attr "type" "store2") 3 3)

(define_function_unit "core" 1 0 (eq_attr "type" "store3") 4 4)

(define_function_unit "core" 1 0 (eq_attr "type" "store4") 5 5)


If we issue an instruction which matches any of those attributes, we consider
the "core" unit 100% blocked for 3, 4 or 5 cycles respectively and will not
schedule any other instructions to the core unit during that time.

However, all those pesky ALU instructions are not handled by any of the 
define_function_units for "core".  So they are classified as "no unit".

Since "core" and "no unit" are different, the scheduler thinks it is profitable
to go ahead and issue ALU instructions to "no unit" while waiting on issue
resources for the "core" unit.  And (of course) they stall since they actually
use the "core" unit.
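
To make the mismatch concrete, here is a minimal sketch of the two views of
the same insn stream. It is a toy model in Python, not GCC code: the function
name `issue_clocks`, the insn names and the delays are all made up for
illustration, and it assumes a single-issue, in-order pipeline with no data
dependencies.

```python
# Toy model (not GCC code): one in-order, single-issue pipeline with a "core" unit.
# Each insn is (name, unit, issue_delay); names and delays are hypothetical.

def issue_clocks(insns, trust_no_unit):
    """Return {name: clock at which the insn issues}.

    trust_no_unit=True  -> scheduler's view: "no unit" insns never wait for core.
    trust_no_unit=False -> hardware's view: every insn occupies core.
    """
    core_free = 0  # first clock at which core accepts a new insn
    clock = 0      # earliest clock for the next insn (one issue per clock)
    issued = {}
    for name, unit, issue_delay in insns:
        uses_core = (unit == "core") or not trust_no_unit
        t = max(clock, core_free) if uses_core else clock
        issued[name] = t
        if uses_core:
            core_free = t + issue_delay
        clock = t + 1
    return issued

stream = [
    ("store3", "core", 4),     # blocks core for 4 clocks, like type store3
    ("alu_a", "no unit", 1),   # ALU insn with no function unit
    ("alu_b", "no unit", 1),
    ("load", "core", 1),
]

scheduler_view = issue_clocks(stream, trust_no_unit=True)
hardware_view = issue_clocks(stream, trust_no_unit=False)
print(scheduler_view)  # {'store3': 0, 'alu_a': 1, 'alu_b': 2, 'load': 4}
print(hardware_view)   # {'store3': 0, 'alu_a': 4, 'alu_b': 5, 'load': 6}
```

In this toy stream the scheduler expects the load to issue at clock 4, while
the hardware view puts it at clock 6, because the two ALU insns queue up
behind the store instead of running for free.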

You might ask what difference it makes, since any insn we try to issue
while "core" is busy will stall anyway.  The difference is that if the
scheduler didn't issue those ALU instructions into "no unit", they could be
used elsewhere to fill pipeline bubbles.

Maybe an example would help.

;;   ======================================================
;;   -- basic block 0 from 49 to 15 -- after reload
;;   ======================================================


;;      Ready list (t =  0):    4  11  75

;;      Ready list (t =  1):    4  11

;;      Ready list (t =  2):    37

;;      Ready list (t =  5):    11

;;      Ready list (t =  7):    13

;;      Ready list (t =  9):    41  14

;;      Ready list (t = 10):    41

;;      Ready list (t = 11):    15
;;      Ready list (final):


;;   ==================== scheduling visualization for block 0
;;   clock     core                               no-unit
;;   =====     ==============================     =======
;;   0         75   {[--sp]=unspec[lr] 2;}
;;   1         75   {[--sp]=unspec[lr] 2;}        4
;;   2         75   {[--sp]=unspec[lr] 2;}        37
;;       ..
;;   5         11   r3=`Ptr_Glob'
;;       ..
;;   7         13   r3=[r3]
;;       ..
;;   9         ------------------------------     14
;;   10        ------------------------------     41
;;   11        ------------------------------     15

[ Yes, this is our buddy dhrystone... ]

Note how we scheduled insns 4 & 37 into "no unit" while the core was blocked.
They're going to stall, even though the scheduler thinks they executed and
completed.  So insn 11 actually fires at clock 7, insn 13 at clock 9, etc.,
for a total time of 13 cycles.

But more importantly insns 4 & 37 could have been scheduled at clocks 6 and
8 respectively which would have caused the schedule to look like:

;;   0         75   {[--sp]=unspec[lr] 2;}
;;   1         75   {[--sp]=unspec[lr] 2;}
;;   2         75   {[--sp]=unspec[lr] 2;}
;;       ..
;;   5         11   r3=`Ptr_Glob'
;;   6         4
;;   7         13   r3=[r3]
;;   8         37
;;   9         14
;;   10        41
;;   11        15


Which should complete in 11 cycles instead of 13.

And that's precisely what I get if I add something like this to the md file:

(define_function_unit "core" 1 0
  (and (eq_attr "ldsched" "yes") (eq_attr "type" "!store2,store3,store4")) 1 1)

Thoughts?
jeff


