This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Lousy scheduling decisions owing to bad interaction between sched and greg :(



    Morning all!


[  This is a rather long post, so I'll summarize very quickly at the top:
1) I've got bad scheduling decisions being made, 2) I think I know why, 3)
Would anyone be so kind as to cast an eye over my define_function_unit
statement to sanity-check it for me, 4) Does an expander sound like the
right solution to the problem?  ]


===================================================
 1)  I've got bad scheduling decisions being made.
===================================================

  Well, I've run up against something confusing here.  I'm using an
old-style scheduler description, and it's failing very badly to avoid memory
access stalls.  I think I've understood _why_ it's going wrong, but I can't
quite see how it could ever hope to get it right in the situation.  Let me
explain.....

========================
 2) I think I know why.
========================

  The RTL in question looks like this:

------------------------<source.i.22.sched>------------------------
(insn 25 21 29 0 0x101f6b40 (set (mem/s:SI (const:SI (plus:SI
(symbol_ref/v:SI ("TxHeaderArray"))
                    (const_int 4 [0x4]))) [7 TxHeaderArray+4 S4 A32])
        (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])) 22 {movsi}
(insn_list 13 (insn_list 21 (insn_list:REG_DEP_ANTI 16 (nil))))
    (nil))

(insn 29 25 36 0 0x101f6b40 (set (mem/s:SI (symbol_ref/v:SI
("TxHeaderArray")) [7 TxHeaderArray+0 S4 A32])
        (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])) 22 {movsi}
(insn_list 13 (insn_list 21 (insn_list:REG_DEP_ANTI 16
(insn_list:REG_DEP_ANTI 25 (nil)))))
    (nil))
------------------------<source.i.22.sched>------------------------

  Now, I have reg->mem and mem->reg patterns in movsi, but there's no
mem->mem, so it's going to need a register.  I'm using a CPU that can issue
a load every cycle, but the result isn't available until the next cycle + 1;
if you attempt to use the result of a load the very next cycle you stall.
So I want it to generate code that looks like...


   load reg_1 from mem at TxHeaderArray+4
   load reg_2 from mem at TxHeaderArray
   store reg_1 into mem at 0xfa24
   store reg_2 into mem at 0xfa20

but what I'm generally getting is in fact...

   load reg_1 from mem at TxHeaderArray+4
   store reg_1 into mem at 0xfa24
   load reg_1 from mem at TxHeaderArray
   store reg_1 into mem at 0xfa20

and here's the chain of events that I think is causing it:

============================================================================
=============================
 3) Would anyone be so kind as to look over my define_function_unit
statement to sanity-check it for me?
============================================================================
=============================

--------------------------<from .md file>--------------------------
; Memory unit: results not available on cycle after command
; executes but 1 cycle after that.
(define_function_unit "memory" 1 0 (eq_attr "type" "load") 2 1)
--------------------------<from .md file>--------------------------

  IIUIC, this function unit definition accurately reflects the cpu's
behaviour:  one memory load unit, simultaneity of zero meaning you can issue
as many loads as you want without waiting for earlier ones to complete,
ready delay of 2 meaning that if the load is issued on cycle n the data can
be used on cycle n+2, and an issue-delay of 1 meaning a zero-cycle delay
before the next instruction can be issued.  All reasonably sane, I hope.

================================
 2) I think I know why, part 2.
================================

  Now, the sequence of events is I think like this:

------------------------<source.i.22.sched>------------------------
;;   ==================== scheduling visualization  
;;   clock     memory                             settest
no-unit 
;;   =====     ==============================
==============================     ======= 
;;   0         ------------------------------
------------------------------     10      
;;   1         ------------------------------
------------------------------     13      
;;   2         16   r70=[0xfa18]
------------------------------   
;;       .
;;   4         ------------------------------
------------------------------     18      
;;   5         ------------------------------
------------------------------     21      
;;   6         ------------------------------
------------------------------     25      
;;   7         ------------------------------
------------------------------     29      
------------------------<source.i.22.sched>------------------------

  - 1st scheduling pass doesn't recognize these insns as mem loads at all,
because, being m->m sets, they don't match any of the m->r alternatives in
movsi.  It does correctly identify insn 16 which _does_ match the m->r
alternative:

------------------------<source.i.22.sched>------------------------
(insn 16 13 18 0 0x101f6b40 (set (reg/v:SI 70)
        (mem/v:SI (const_int 64024 [0xfa18]) [7 S4 A32])) 22 {movsi}
(insn_list 13 (nil))
    (nil))
------------------------<source.i.22.sched>------------------------

  - Greg comes along, sees that there's no m->m pattern for movsi, and
decides a reload is needed.  Because there's no overlap in the life of the
operands and no aliasing between them, it decides to allocate the same hard
reg for both, then goes ahead and breaks the m->m moves into m->r and r->m
reloaded insns:

-------------------------<source.i.24.greg>------------------------
Reloads for insn # 25
Reload 0: reload_out (SI) = (mem/s:SI (const:SI (plus:SI (symbol_ref/v:SI
("TxHeaderArray"))
                                                            (const_int 4
[0x4]))) [7 TxHeaderArray+4 S4 A32])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (mem/s:SI (const:SI (plus:SI (symbol_ref/v:SI
("TxHeaderArray"))
                                                            (const_int 4
[0x4]))) [7 TxHeaderArray+4 S4 A32])
Reload 1: reload_in (SI) = (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
	reload_in_reg: (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])
	reload_reg_rtx: (reg:SI 4 r4)

Reloads for insn # 29
Reload 0: reload_out (SI) = (mem/s:SI (symbol_ref/v:SI ("TxHeaderArray")) [7
TxHeaderArray+0 S4 A32])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (mem/s:SI (symbol_ref/v:SI ("TxHeaderArray")) [7
TxHeaderArray+0 S4 A32])
Reload 1: reload_in (SI) = (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
	reload_in_reg: (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])
	reload_reg_rtx: (reg:SI 4 r4)
;; Register dispositions:
69 in 4  70 in 3  

;; Hard regs used:  3 4

......

(insn 38 21 25 0 0x0 (set (reg:SI 4 r4)
        (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])) 22 {movsi} (nil)
    (nil))

(insn 25 38 39 0 0x101f6b40 (set (mem/s:SI (const:SI (plus:SI
(symbol_ref/v:SI ("TxHeaderArray"))
                    (const_int 4 [0x4]))) [7 TxHeaderArray+4 S4 A32])
        (reg:SI 4 r4)) 22 {movsi} (insn_list 13 (insn_list 21
(insn_list:REG_DEP_ANTI 16 (nil))))
    (nil))

(insn 39 25 29 0 0x0 (set (reg:SI 4 r4)
        (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])) 22 {movsi} (nil)
    (nil))

(insn 29 39 36 0 0x101f6b40 (set (mem/s:SI (symbol_ref/v:SI
("TxHeaderArray")) [7 TxHeaderArray+0 S4 A32])
        (reg:SI 4 r4)) 22 {movsi} (insn_list 13 (insn_list 21
(insn_list:REG_DEP_ANTI 16 (insn_list:REG_DEP_ANTI 25 (nil)))))
    (nil))
-------------------------<source.i.24.greg>------------------------

  - 2nd scheduling pass this time *does* recognize these insns as mem loads,
but there's nothing it can do to schedule them around each other because
they're now sharing the same hard reg and it would get trashed (as reflected
by the REG_DEP_ANTI 25 note on insn 29, I take it).

------------------------<source.i.30.sched2>-----------------------
;;   ==================== scheduling visualization  
;;   clock     memory                             settest
no-unit 
;;   =====     ==============================
==============================     ======= 
;;   0         ------------------------------
------------------------------     10      
;;   1         ------------------------------
------------------------------     13      
;;   2         16   r3=[0xfa18]
------------------------------   
;;       .
;;   4         ------------------------------
------------------------------     18      
;;   5         ------------------------------
------------------------------     21      
;;   6         38   r4=[0xfa24]
------------------------------   
;;       .
;;   8         ------------------------------
------------------------------     25      
;;   9         39   r4=[0xfa20]
------------------------------   
;;       .
;;   11        ------------------------------
------------------------------     29      
------------------------<source.i.30.sched2>-----------------------

  Well, I can't quite see how to straighten this mess out.  Sched1 can't
deduce that there are going to be memory loads here because reload hasn't
happened yet and the patterns don't match a load alternative.  Greg can't
know that that these insns need scheduling apart from each other so he
chooses to save registers rather than extend data lifespan.  Sched2 can't do
anything about Greg's choice of registers so it's stuck with being obliged
to schedule the insns with a stall.

===================================================================
 4) Does an expander sound like the right solution to the problem? 
===================================================================

  So, do I have to write a movsi mem->mem expander to fix this?  I think it
might work if I break the mem->mem into a mem->reg and reg->mem at an
earlier stage so that the mem load operations can be recognized by sched1.
Does this sound likely to anybody?

    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]