This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Incorrect DFA scheduling of output dependency.
Steven, Nathan, et al.
> You're not making this easy because you haven't told anything about:
> 1) what target you're working on (apparently something that is not
>    in the FSF GCC tree);
> 2) what your DFA description looks like (did you tell the scheduler
>    that those two instructions are issued in parallel?); and
> 3) what version of GCC you are working with.
I'm working on a 16-bit DSP port of gcc, which hasn't been contributed
back to the mainline tree yet. The port is based on GCC 3.4.3.
The DFA scheduler describes a machine which has 3 execution slots, plus
an additional slot for a long immediate value. I've attached my DFA
description below. The DFA scheduler normally ensures that instructions
with data dependencies are placed in different cycles. Once the
scheduler has completed, the first instruction of each cycle is marked
by having its mode set to TImode. I have specialised versions of
asm_output_opcode and final_prescan_insn which detect the TImode
markers, and arrange for the assembly output to include the VLIW packing
information.
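As a rough illustration of the packing step described above, the following C sketch groups a scheduled instruction stream into packets. It is only a model under stated assumptions: `struct model_insn`, `emit_packets`, the `starts_cycle` flag (standing in for the TImode marker), the " | " separator and the mnemonics in the usage note are all invented for this example and are not taken from the port or from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a scheduled insn.  The start-of-cycle mark
   that the scheduler leaves on an insn is modelled as a plain flag.  */
struct model_insn {
  const char *text;   /* assembly text of the instruction */
  int starts_cycle;   /* non-zero if this insn begins a new cycle */
};

/* Write the instructions to BUF.  Instructions issued in the same cycle
   are joined with " | " (an invented separator); a newline separates
   packets.  Returns the number of packets written.  */
size_t emit_packets(const struct model_insn *insns, size_t n,
                    char *buf, size_t buf_size)
{
  size_t used = 0, packets = 0;

  assert(buf_size > 0);
  buf[0] = '\0';

  for (size_t i = 0; i < n; i++) {
    const char *sep;
    int w;

    /* The first insn always opens a packet, marked or not.  */
    if (i == 0)
      sep = "";
    else if (insns[i].starts_cycle)
      sep = "\n";
    else
      sep = " | ";

    if (i == 0 || insns[i].starts_cycle)
      packets++;

    w = snprintf(buf + used, buf_size - used, "%s%s", sep, insns[i].text);
    assert(w >= 0 && (size_t)w < buf_size - used);  /* no truncation */
    used += (size_t)w;
  }
  return packets;
}
```

For example, three instructions `mov R4,0`, `add R1,R2` and `lsl R5,R4,2`, with only the first and third marked as starting a cycle, would come out as two packets, the first containing two instructions.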
I have to use the machine dependent reorganisation phase to run the
scheduler, so that the last-jump-optimisation doesn't disturb the TI
mode labels applied to the first instruction in each clock cycle (as per
the IA64).
> But first look at the scheduler dumps (-dS and -dR) to see if the
> output dependency is there, of course...
I was wrong here. The instruction sequence is actually a data
(read-after-write) dependency, not an output dependency
(write-after-write). However, the relevant portion of the scheduler dump
is as follows:
(note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
        (const_int 0 [0x0])) 15 {movhi} (nil)
    (nil))

(note 150 64 133 2 NOTE_INSN_LOOP_END)

(insn 133 150 135 2 (set (reg:HI 5 R5 [33])
        (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI 64 (nil))
    (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))
        (nil)))
Does this state that insn 133 is anti-dependent on insn 64? An
anti-dependency is a write following a read, but in this sequence a read
follows a write. The anti-dependency first appears after the basic block
reordering pass has been run (which is immediately before the
instruction scheduling pass).
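To make the terminology concrete, here is a minimal C sketch that classifies the dependence between two instructions from the registers they read and write. The types and names (`struct reg_use`, `classify_dep`, `reg_bit`) are made up for this illustration and are not GCC's own data structures.

```c
#include <assert.h>

enum dep_kind { DEP_NONE, DEP_TRUE, DEP_ANTI, DEP_OUTPUT };

/* Register usage as bitmasks: bit N set means register RN is used.  */
struct reg_use {
  unsigned reads;
  unsigned writes;
};

/* Bitmask for register RN (hypothetical helper, 32 registers at most).  */
unsigned reg_bit(int n)
{
  assert(n >= 0 && n < 32);
  return 1u << n;
}

/* Classify the dependence of LATER on EARLIER, reporting the strongest
   kind that applies.  */
enum dep_kind classify_dep(struct reg_use earlier, struct reg_use later)
{
  if (earlier.writes & later.reads)
    return DEP_TRUE;     /* read after write */
  if (earlier.writes & later.writes)
    return DEP_OUTPUT;   /* write after write */
  if (earlier.reads & later.writes)
    return DEP_ANTI;     /* write after read */
  return DEP_NONE;
}
```

In this model, insn 64 writes R4 and insn 133 reads R4 and writes R5, so `classify_dep` reports a true dependence when insn 64 comes first. An anti-dependence would only be reported if the two were taken in the opposite order.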
If I modify TARGET_SCHED_ADJUST_COST to return 1 when an anti-dependency
is encountered, the two instructions are scheduled in different cycles
(and hence in different VLIW packets). On a VLIW machine, however, it is
legal for anti-dependent instructions to be scheduled in the same cycle,
so I can't use this method to fix the problem permanently.
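The effect of that workaround can be sketched with a small stand-alone model in C. This is not the real hook: `model_adjust_cost`, `may_share_cycle`, the `model_dep` enum and the `force_anti_split` flag are hypothetical names, and the assumption that an anti-dependence otherwise costs 0 cycles is inferred from the behaviour described above. In an actual port the dependence kind would come from the dependence link passed to the hook.

```c
#include <assert.h>

/* Hypothetical model of the dependence kinds the scheduler records.  */
enum model_dep { MODEL_DEP_TRUE, MODEL_DEP_ANTI, MODEL_DEP_OUTPUT };

/* Model of a cost-adjustment hook.  COST is the latency computed so far.
   With FORCE_ANTI_SPLIT set, anti-dependences are given a cost of 1,
   which is the workaround described in the text; otherwise they are
   assumed to cost 0.  */
int model_adjust_cost(enum model_dep kind, int cost, int force_anti_split)
{
  assert(cost >= 0);
  if (kind == MODEL_DEP_ANTI)
    return force_anti_split ? 1 : 0;
  return cost;
}

/* Two dependent instructions can share a cycle only if the dependence
   between them costs nothing.  */
int may_share_cycle(int cost)
{
  return cost == 0;
}
```

With the flag clear, an anti-dependent pair may share a packet; with it set, the pair is always split, which is safe but gives up packing that the hardware would allow.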
many thanks,
dan.
;;==============================================================================
;; Scheduling, including delay slot scheduling.
;;==============================================================================
(automata_option "v")
(automata_option "ndfa")
;; Define each VLIW slot as a CPU resource.
(define_attr "type"
"picoAlu,basicAlu,nonCcAlu,mem,branch,mul,mac,app,comms,unknown"
(const_string "unknown"))
;; Define whether an instruction uses a long constant.
(define_attr "longConstant"
"true,false" (const_string "false"))
;; Define three EU slots.
(define_query_cpu_unit "slot0,slot1,slot2")
;; Each instruction comes in two forms, with and without a long
;; constant. The long constant is treated as though it were also an
;; instruction. Thus, an instruction which uses slot0 will use slot0
;; plus one of the other slots for the constant. This mechanism
;; ensures that it is impossible for 3 instructions to be issued if
;; one of them has a long constant.
; Extended ALU - Slot 0
(define_insn_reservation "picoAluInsn" 1
(and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "false"))
"slot0")
(define_insn_reservation "picoAluInsnWithConst" 1
(and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "true"))
"(slot0+slot1)|(slot0+slot2)")
; Basic ALU - Slot 0 or 1
(define_insn_reservation "basicAluInsn" 1
(and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "false"))
"(slot0|slot1)")
(define_insn_reservation "basicAluInsnWithConst" 1
(and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "true"))
"(slot0+slot1) | (slot1+slot2) | (slot0+slot2)")
; ALU which must not set flags - Slot 1
(define_insn_reservation "nonCcAluInsn" 1
(and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "false"))
"slot1")
(define_insn_reservation "nonCcAluInsnWithConst" 1
(and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "true"))
"(slot1+slot0) | (slot1+slot2)")
; Memory - Slot 1
(define_insn_reservation "memInsn" 2
(and (eq_attr "type" "mem") (eq_attr "longConstant" "false"))
"slot1,nothing")
(define_insn_reservation "memInsnWithConst" 2
(and (eq_attr "type" "mem") (eq_attr "longConstant" "true"))
"slot1+(slot0|slot2),nothing")
; Multiply - Slot 2
(define_insn_reservation "mulInsn" 1
(and (eq_attr "type" "mul") (eq_attr "longConstant" "false"))
"slot2")
(define_insn_reservation "mulInsnWithConst" 1
(and (eq_attr "type" "mul") (eq_attr "longConstant" "true"))
"(slot2+slot0)|(slot2+slot1)")
; Branch - Slot 2
(define_insn_reservation "branchInsn" 1
(and (eq_attr "type" "branch") (eq_attr "longConstant" "false"))
"slot2")
(define_insn_reservation "branchInsnWithConst" 1
(and (eq_attr "type" "branch") (eq_attr "longConstant" "true"))
"(slot2+slot0)|(slot2+slot1)")
; Communications - Slot 1
(define_insn_reservation "commsInsn" 1
(eq_attr "type" "comms")
"slot1")
; Unknown instructions are assumed to take a single cycle, and use all
; slots. This enables them to actually output a sequence of
; instructions without any limitation.
(define_insn_reservation "unknownInsn" 1
(eq_attr "type" "unknown")
"(slot0+slot1+slot2)")
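
The slot reservations above can be checked by hand with a small model. The following C sketch is only an illustration of the first-cycle slot usage: it treats each reservation as a list of alternative slot masks and searches for an assignment in which no slot is used twice. The names (`struct reservation`, `can_issue_together`, `fits`) are invented for this example, and the second "nothing" cycle of the memory reservations is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT0 1u
#define SLOT1 2u
#define SLOT2 4u
#define MAX_ALTS 3

/* One reservation: a set of alternatives, each a mask of the slots
   that alternative occupies in the issue cycle.  */
struct reservation {
  size_t n_alts;
  unsigned alts[MAX_ALTS];
};

/* Try every alternative of the first insn that does not clash with the
   slots already BUSY, then recurse on the remaining insns.  */
static int fits(const struct reservation *insns, size_t n, unsigned busy)
{
  if (n == 0)
    return 1;
  for (size_t a = 0; a < insns[0].n_alts; a++) {
    unsigned mask = insns[0].alts[a];
    if ((busy & mask) == 0 && fits(insns + 1, n - 1, busy | mask))
      return 1;
  }
  return 0;
}

/* Return non-zero if all N instructions can be issued in one cycle.  */
int can_issue_together(const struct reservation *insns, size_t n)
{
  assert(insns != NULL || n == 0);
  return fits(insns, n, 0u);
}
```

In this model, a picoAlu, a basicAlu and a mul instruction without long constants fit in one cycle, whereas replacing the basicAlu instruction with its long-constant form (which needs two slots) leaves no free slot for the third instruction.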