This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Incorrect DFA scheduling of output dependency.


Steven, Nathan, et al.

> You're not making this easy because you haven't told us anything about:
> 1) what target you're working on (apparently something that is not
> in the FSF GCC tree);
> 2) what your DFA description looks like (did you tell the scheduler
> that those two instructions are issued in parallel?); and
> 3) what version of GCC you are working with.


I'm working on a 16-bit DSP port of GCC, which hasn't been contributed back to the mainline tree yet. The port is based on GCC 3.4.3.

The DFA description models a machine which has 3 execution slots, plus an additional slot for a long immediate value. I've attached the description below. The scheduler normally ensures that instructions with data dependencies are placed in different cycles. Once the scheduler has completed, the first instruction of each cycle is marked by giving it TImode. I have specialised versions of asm_output_opcode and final_prescan_insn which detect the TImode marks, and arrange for the assembly output to include the VLIW packing information.
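
To make the packing step concrete, here is a minimal, self-contained sketch of how a TImode mark could be turned into packet boundaries. It is only a model: `struct sched_insn`, its `starts_cycle` flag, and `assign_packets` are hypothetical names standing in for the real rtx chain and the port's own hooks, which are not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduled instruction.  starts_cycle
   models the TImode mark placed on the first instruction of a cycle.  */
struct sched_insn {
    int uid;
    int starts_cycle;
};

/* Store in packet_of[i] the index of the VLIW packet that instruction i
   belongs to, and return the number of packets.  A new packet begins at
   every marked instruction; the first instruction always opens one.  */
static size_t assign_packets(const struct sched_insn *insns, size_t n,
                             size_t *packet_of)
{
    size_t packets = 0;

    assert(n == 0 || (insns != NULL && packet_of != NULL));

    for (size_t i = 0; i < n; i++) {
        if (insns[i].starts_cycle || packets == 0)
            packets++;
        packet_of[i] = packets - 1;
    }
    return packets;
}
```

In this model an instruction without the mark joins the packet of the marked instruction before it, which is why a missing dependency between two instructions shows up as both landing in one packet.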

I have to run the scheduler from the machine dependent reorganisation phase, so that the last-jump-optimisation doesn't disturb the TImode marks applied to the first instruction in each clock cycle (as the IA64 port does).

> But first look at the scheduler dumps (-dS and -dR) to see if the
> output dependency is there, of course...

I was wrong here. The instruction sequence is actually a data (read-after-write) dependency, not an output dependency (write-after-write). However, the relevant portion of the scheduler dump is as follows:

(note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
       (const_int 0 [0x0])) 15 {movhi} (nil)
   (nil))

(note 150 64 133 2 NOTE_INSN_LOOP_END)

(insn 133 150 135 2 (set (reg:HI 5 R5 [33])
        (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI 64 (nil))
    (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))
        (nil)))


Does this state that insn 133 is anti-dependent on insn 64? An anti-dependency is a write following a read, but in this sequence a read follows a write. The anti-dependency first appears after the basic block reordering pass has been run (which is immediately before the instruction scheduling pass).
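
For reference, the three dependency kinds can be written down as a small self-contained model. This is a sketch only, not GCC's dependency analysis: `struct insn_regs`, `reg_bit` and `classify_dep` are hypothetical names, and each instruction is reduced to the sets of registers it reads and writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bitmasks of the registers an instruction
   reads and writes (bit N stands for register N).  */
struct insn_regs {
    uint32_t reads;
    uint32_t writes;
};

enum dep_kind {
    DEP_NONE   = 0,
    DEP_TRUE   = 1 << 0,  /* read after write   */
    DEP_ANTI   = 1 << 1,  /* write after read   */
    DEP_OUTPUT = 1 << 2   /* write after write  */
};

/* Bitmask for a single register number.  */
static uint32_t reg_bit(unsigned regno)
{
    assert(regno < 32);
    return (uint32_t)1 << regno;
}

/* Return the set of dependencies of LATER on EARLIER, where EARLIER
   comes first in program order.  */
static unsigned classify_dep(struct insn_regs earlier, struct insn_regs later)
{
    unsigned kinds = DEP_NONE;

    if (earlier.writes & later.reads)
        kinds |= DEP_TRUE;
    if (earlier.reads & later.writes)
        kinds |= DEP_ANTI;
    if (earlier.writes & later.writes)
        kinds |= DEP_OUTPUT;
    return kinds;
}
```

With insn 64 modelled as writing R4 and insn 133 as reading R4 and writing R5, `classify_dep(insn64, insn133)` gives `DEP_TRUE` in this model. Only with the two in the opposite order does the same pair give `DEP_ANTI`.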

If I modify TARGET_SCHED_ADJUST_COST to return 1 when an anti-dependency is encountered, this results in the two instructions being scheduled in different cycles (and hence, different VLIW packets). For a VLIW machine however, it is legal for anti-dependent instructions to be scheduled in the same cycle, so I can't use this method to permanently fix the problem.
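
The workaround amounts to the following logic, shown here as a standalone sketch rather than the port's actual hook. In GCC 3.4 the real hook is passed the dependence link, whose kind can be read with REG_NOTE_KIND. The `enum link_kind`, `adjust_cost_model` and the `split_anti` switch below are hypothetical and exist only to make the example compile on its own.

```c
#include <assert.h>

/* Hypothetical stand-in for the kind of a dependence link.  */
enum link_kind { LINK_TRUE, LINK_ANTI, LINK_OUTPUT };

/* Model of a cost adjustment.  COST is the latency the scheduler has
   computed so far; a result of 0 lets the two instructions share a
   cycle, while 1 or more pushes the dependent one to a later cycle.
   When SPLIT_ANTI is nonzero, anti-dependencies are forced to cost at
   least 1, which is the workaround described above.  */
static int adjust_cost_model(enum link_kind kind, int cost, int split_anti)
{
    assert(cost >= 0);

    if (kind == LINK_ANTI && split_anti)
        return cost > 1 ? cost : 1;
    return cost;
}

/* Nonzero if a dependence of this cost allows same-cycle issue.  */
static int same_cycle_allowed(int cost)
{
    return cost == 0;
}
```

With `split_anti` set, every anti-dependent pair is separated, including the pairs that the hardware could legally issue together, which is the drawback noted above.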

many thanks,

dan.

;;==============================================================================
;; Scheduling, including delay slot scheduling.
;;==============================================================================

(automata_option "v")
(automata_option "ndfa")

;; Define each VLIW slot as a CPU resource.

(define_attr "type"
 "picoAlu,basicAlu,nonCcAlu,mem,branch,mul,mac,app,comms,unknown"
 (const_string "unknown"))

;; Define whether an instruction uses a long constant.

(define_attr "longConstant"
 "true,false" (const_string "false"))

;; Define three EU slots.
(define_query_cpu_unit "slot0,slot1,slot2")

;; Each instruction comes in forms with and without long
;; constants. The long constant is treated as though it were also an
;; instruction. Thus, an instruction which used slot0, will use slot0
;; plus one of the other slots for the constant. This mechanism
;; ensures that it is impossible for 3 instructions to be issued, if
;; one of them has a long constant.

; Extended ALU - Slot 0
(define_insn_reservation "picoAluInsn" 1
 (and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "false"))
 "slot0")
(define_insn_reservation "picoAluInsnWithConst" 1
 (and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "true"))
 "(slot0+slot1)|(slot0+slot2)")

; Basic ALU - Slot 0 or 1
(define_insn_reservation "basicAluInsn" 1
 (and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "false"))
 "(slot0|slot1)")
(define_insn_reservation "basicAluInsnWithConst" 1
 (and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "true"))
 "(slot0+slot1) | (slot1+slot2) | (slot0+slot2)")

; ALU which must not set flags - Slot 1
(define_insn_reservation "nonCcAluInsn" 1
 (and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "false"))
 "slot1")
(define_insn_reservation "nonCcAluInsnWithConst" 1
 (and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "true"))
 "(slot1+slot0) | (slot1+slot2)")

; Memory - Slot 1
(define_insn_reservation "memInsn" 2
 (and (eq_attr "type" "mem") (eq_attr "longConstant" "false"))
 "slot1,nothing")
(define_insn_reservation "memInsnWithConst" 2
 (and (eq_attr "type" "mem") (eq_attr "longConstant" "true"))
 "slot1+(slot0|slot2),nothing")

; Multiply - Slot 2
(define_insn_reservation "mulInsn" 1
 (and (eq_attr "type" "mul") (eq_attr "longConstant" "false"))
 "slot2")
(define_insn_reservation "mulInsnWithConst" 1
 (and (eq_attr "type" "mul") (eq_attr "longConstant" "true"))
 "(slot2+slot0)|(slot2+slot1)")

; Branch - Slot 2
(define_insn_reservation "branchInsn" 1
 (and (eq_attr "type" "branch") (eq_attr "longConstant" "false"))
 "slot2")
(define_insn_reservation "branchInsnWithConst" 1
 (and (eq_attr "type" "branch") (eq_attr "longConstant" "true"))
 "(slot2+slot0)|(slot2+slot1)")

; Communications - Slot 1
(define_insn_reservation "commsInsn" 1
 (eq_attr "type" "comms")
 "slot1")

; Unknown instructions are assumed to take a single cycle, and use all
; slots. This enables them to actually output a sequence of
; instructions without any limitation.

(define_insn_reservation "unknownInsn" 1
 (eq_attr "type" "unknown")
 "(slot0+slot1+slot2)")

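As a rough check of the slot model above, the first-cycle reservations can be transcribed into a small standalone program that asks whether a group of instructions fits into one packet. This is only a sketch: the bitmask encoding, `struct reservation` and `can_issue_together` are hypothetical, and the backtracking search loosely imitates the way the "ndfa" automaton is free to pick any alternative of a reservation.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT0 = 1, SLOT1 = 2, SLOT2 = 4 };

#define MAX_ALTS 3

/* Alternatives for the first cycle of a reservation, each a mask of
   the slots it occupies.  */
struct reservation {
    int n_alts;
    unsigned alts[MAX_ALTS];
};

/* Transcribed from the define_insn_reservation entries above.  */
static const struct reservation PICO_ALU        = { 1, { SLOT0 } };
static const struct reservation BASIC_ALU       = { 2, { SLOT0, SLOT1 } };
static const struct reservation BASIC_ALU_CONST =
    { 3, { SLOT0 | SLOT1, SLOT1 | SLOT2, SLOT0 | SLOT2 } };
static const struct reservation MEM             = { 1, { SLOT1 } };
static const struct reservation MUL             = { 1, { SLOT2 } };

/* Return 1 if all N instructions can be given disjoint slots in one
   cycle, starting from the slots already marked in BUSY.  */
static int can_issue_together(const struct reservation *const *insns,
                              size_t n, unsigned busy)
{
    assert(n == 0 || insns != NULL);

    if (n == 0)
        return 1;
    for (int i = 0; i < insns[0]->n_alts; i++) {
        unsigned need = insns[0]->alts[i];
        if ((busy & need) == 0
            && can_issue_together(insns + 1, n - 1, busy | need))
            return 1;
    }
    return 0;
}
```

In this model a picoAlu, a mem and a mul instruction fit into one packet, whereas replacing the first with a basicAlu instruction carrying a long constant leaves only one free slot, so the three can no longer be issued together.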

