This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
On 12/01/2015 02:11 PM, Steve Ellcey wrote:
> With the current top-of-tree we now generate:
>
>         addiu   $4,$4,1
> $L8:
>         lbu     $3,-1($4)
>         addiu   $5,$5,1
>         beq     $3,$0,$L7
>         lbu     $2,-1($5)       # This is a branch delay slot
>         beq     $3,$2,$L8
>         addiu   $4,$4,1         # This is a branch delay slot
>         subu    $2,$3,$2        # Done only once now after exiting loop.
>
> The main problem with the new loop is that the beq comparing $2 and $3
> is right before the load of $2 so there can be a delay due to the time
> that the load takes.  The ideal code would probably be:

I'd start by looking at the code prior to reorg/delay slot scheduling. You may be running into the well-known issue that reorg knows nothing about latency/scheduling issues and happily picks whatever insn can safely fill the delay slot. In doing so, reorg may muck up the schedule badly.
If that's the case, you might test disallowing operations with > 1 cycle latency in delay slots and see how that affects a wider range of benchmarks.
Jeff
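
The load-use delay discussed above can be illustrated with a toy model. The sketch below is not GCC code and does not reflect how reorg is implemented. It assumes a hypothetical single-issue, in-order pipeline in which a load result is usable 2 cycles after issue and everything else after 1 cycle (these latencies are made up for illustration, not taken from any real MIPS core). The names count_stalls and allowed_in_delay_slot are likewise hypothetical. It counts the stall cycles in one pass through the quoted loop body and shows the kind of latency filter on delay-slot candidates suggested in the message.

```python
# Toy model only: the latencies are assumptions, not real MIPS numbers.
# Value = cycles after issue until the result can be used.
LATENCY = {"lbu": 2, "addiu": 1, "subu": 1, "beq": 1}

# One pass through the quoted loop body:
# (opcode, destination register or None, source registers)
LOOP = [
    ("lbu",   "$3", ["$4"]),
    ("addiu", "$5", ["$5"]),
    ("beq",   None, ["$3", "$0"]),
    ("lbu",   "$2", ["$5"]),        # delay slot of the first beq
    ("beq",   None, ["$3", "$2"]),
    ("addiu", "$4", ["$4"]),        # delay slot of the second beq
]

def count_stalls(insns, latency=LATENCY):
    """Count cycles lost waiting on source operands (in-order, single issue)."""
    ready = {}      # register -> first cycle at which its value is usable
    cycle = 0       # earliest cycle at which the next insn could issue
    stalls = 0
    for op, dest, srcs in insns:
        # An insn issues once all of its sources are available.
        start = max([cycle] + [ready.get(reg, 0) for reg in srcs])
        stalls += start - cycle
        if dest is not None:
            ready[dest] = start + latency[op]
        cycle = start + 1
    return stalls

def allowed_in_delay_slot(op, latency=LATENCY, max_latency=1):
    """Hypothetical filter: reject delay-slot candidates with latency > max_latency."""
    return latency[op] <= max_latency

print(count_stalls(LOOP))
print(allowed_in_delay_slot("lbu"), allowed_in_delay_slot("addiu"))
```

Under these assumed latencies, the only stall in the model comes from the second beq waiting on the lbu that was placed in the preceding delay slot, and the filter would reject that lbu as a delay-slot candidate while still accepting the addiu.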