This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Delay scheduling due to possible future multiple issue in VLIW

From: Maxim Kuvyrkov <maxim at kugelworks dot com>
To: Paulo Matos <pmatos at broadcom dot com>
Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
Date: Tue, 16 Jul 2013 16:02:25 +1200
Subject: Re: Delay scheduling due to possible future multiple issue in VLIW
References: <19EB96622A777C4AB91610E763265F4621FDDF at SJEXCHMB14 dot corp dot ad dot broadcom dot com> <19EB96622A777C4AB91610E763265F4622761A at SJEXCHMB14 dot corp dot ad dot broadcom dot com>

Paulo,

GCC schedule is not particularly designed for VLIW architectures, but it handles them reasonably well.  For the example of your code both schedules take same time to execute:

38: 0: r1 = e[r0]
40: 4: [r0] = r1
41: 5: r0 = r0+4
43: 5: p0 = r1!=0
44: 6: jump p0

and

38: 0: r1 = e[r0]
41: 1: r0 = r0+4
40: 4: [r0] = r1
43: 5: p0 = r1!=0
44: 6: jump p0

[It is true that the first schedule takes less space due to fortunate VLIW packing.]

You are correct that GCC scheduler is greedy and that it tries to issue instructions as soon as possible (i.e., it is better to issue something on the cycle, than nothing at all), which is a sensible strategy.  For small basic block the greedy algorithm may cause artifacts like the one you describe.

You could try increasing size of regions on which scheduler operates by switching your port to use scheb-ebb scheduler, which was originally developed for ia64.

Regards,

--
Maxim Kuvyrkov
KugelWorks



On 27/06/2013, at 8:35 PM, Paulo Matos wrote:

> Let me add to my own post saying that it seems that the problem is that the list scheduler is greedy in the sense that it will take an instruction from the ready list no matter what when waiting and trying to pair it with later on with another instruction might be more beneficial. In a sense it seems that the idea is that 'issuing instructions as soon as possible is better' which might be true for a single issue chip but a VLIW with multiple issue has to contend with other problems.
> 
> Any thoughts on this?
> 
> Paulo Matos
> 
> 
>> -----Original Message-----
>> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of Paulo
>> Matos
>> Sent: 26 June 2013 15:08
>> To: gcc@gcc.gnu.org
>> Subject: Delay scheduling due to possible future multiple issue in VLIW
>> 
>> Hello,
>> 
>> We have a port for a VLIW machine using gcc head 4.8 with an maximum issue of
>> 2 per clock cycle (sometimes only 1 due to machine constraints).
>> We are seeing the following situation in sched2:
>> 
>> ;;   --------------- forward dependences: ------------
>> 
>> ;;   --- Region Dependences --- b 3 bb 0
>> ;;      insn  code    bb   dep  prio  cost   reservation
>> ;;      ----  ----    --   ---  ----  ----   -----------
>> ;;       38  1395     3     0     6     4
>> (p0+long_imm+ldst0+lock0),nothing*3 : 44m 43 41 40
>> ;;       40   491     3     1     2     2   (p0+long_imm+ldst0+lock0),nothing
>> : 44m 41
>> ;;       41   536     3     2     1     1   (p0+no_stl2)|(p1+no_dual)   : 44
>> ;;       43  1340     3     1     2     1   (p0+no_stl2)|(p1+no_dual)   : 44m
>> ;;       44  1440     3     4     1     1   (p0+long_imm)       :
>> 
>> ;;              dependencies resolved: insn 38
>> ;;              tick updated: insn 38 into ready
>> ;;              dependencies resolved: insn 41
>> ;;              tick updated: insn 41 into ready
>> ;;      Advanced a state.
>> ;;              Ready list after queue_to_ready:    41:4  38:2
>> ;;              Ready list after ready_sort:    41:4  38:2
>> ;;      Ready list (t =   0):    41:4  38:2
>> ;;              Chosen insn : 38
>> ;;        0--> b  0: i  38r1=zxn([r0+`b'])
>> :(p0+long_imm+ldst0+lock0),nothing*3
>> ;;              dependencies resolved: insn 43
>> ;;              Ready-->Q: insn 43: queued for 4 cycles (change queue index).
>> ;;              tick updated: insn 43 into queue with cost=4
>> ;;              dependencies resolved: insn 40
>> ;;              Ready-->Q: insn 40: queued for 4 cycles (change queue index).
>> ;;              tick updated: insn 40 into queue with cost=4
>> ;;              Ready-->Q: insn 41: queued for 1 cycles (resource conflict).
>> ;;      Ready list (t =   0):
>> ;;      Advanced a state.
>> ;;              Q-->Ready: insn 41: moving to ready without stalls
>> ;;              Ready list after queue_to_ready:    41:4
>> ;;              Ready list after ready_sort:    41:4
>> ;;      Ready list (t =   1):    41:4
>> ;;              Chosen insn : 41
>> ;;        1--> b  0: i  41r0=r0+0x4
>> :(p0+no_stl2)|(p1+no_dual)
>> 
>> So, it is scheduling first insn 38 followed by 41.
>> The insn chain for bb3 before sched2 looks like:
>> (insn 38 36 40 3 (set (reg:DI 1 r1)
>>        (zero_extend:DI (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ]
>> [119])
>>                    (symbol_ref:SI ("b") [flags 0x80]  <var_decl
>> 0x2b9c011f75a0 b>)) [2 MEM[symbol: b, index: ivtmp.13_7, offset: 0B]+0 S4
>> A32]))) pr3115b.c:13 1395 {zero_extendsidi2}
>>     (nil))
>> (insn 40 38 41 3 (set (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ]
>> [119])
>>                (symbol_ref:SI ("a") [flags 0x80]  <var_decl 0x2b9c011f7500
>> a>)) [2 MEM[symbol: a, index: ivtmp.13_7, offset: 0B]+0 S4 A32])
>>        (reg:SI 1 r1 [orig:118 D.3048 ] [118])) pr3115b.c:13 491 {fp_movsi}
>>     (nil))
>> (insn 41 40 43 3 (set (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
>>        (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
>>            (const_int 4 [0x4]))) 536 {addsi3}
>>     (nil))
>> (insn 43 41 44 3 (set (reg:BI 64 p0 [122])
>>        (ne:BI (reg:SI 1 r1 [orig:118 D.3048 ] [118])
>>            (const_int 0 [0]))) pr3115b.c:13 1340 {cmp_simode}
>>     (expr_list:REG_DEAD (reg:SI 1 r1 [orig:118 D.3048 ] [118])
>>        (nil)))
>> (jump_insn 44 43 55 3 (set (pc)
>>        (if_then_else (ne (reg:BI 64 p0 [122])
>>                (const_int 0 [0]))
>>            (label_ref:SI 35)
>>            (pc))) pr3115b.c:13 1440 {cbranchbi4}
>>     (expr_list:REG_DEAD (reg:BI 64 p0 [122])
>>        (expr_list:REG_BR_PROB (const_int 9844 [0x2674])
>>            (expr_list:REG_PRED_WIDTH (const_int 4 [0x4])
>>                (nil))))
>> 
>> 
>> The problem with this is that GCC is scheduling insn 38, followed by 41, (a
>> patched) 40, 43 and 44.
>> However, if it had delayed scheduling 41, waited a clock cycle, issued 40
>> then it would be able to issue 38 paired with 43 in the same clock cycle and
>> then 44.
>> So, instead of generating the following insn chain:
>> 38
>> 41
>> patched 40
>> 43
>> 44
>> 
>> it would generate
>> 38
>> 40
>> 41 : 43
>> 44
>> 
>> Is there a way to instruct the scheduler to wait on an instruction on a given
>> clock cycle (even if that instruction is the only one on the ready list)
>> because it's possible that it can be paired with a later instruction in the
>> chain if issued simultaneously?
>> 
>> Cheers,
>> 
>> Paulo Matos
>> 
>

Follow-Ups:
- RE: Delay scheduling due to possible future multiple issue in VLIW
  - From: Paulo Matos

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]