This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Question about PR 48814 and ivopts and post-increment


On 12/01/2015 02:11 PM, Steve Ellcey wrote:

With the current top-of-tree we now generate:

	addiu	$4,$4,1
$L8:
	lbu	$3,-1($4)
	addiu	$5,$5,1
	beq	$3,$0,$L7
	lbu	$2,-1($5)  # This is a branch delay slot
	beq	$3,$2,$L8
	addiu	$4,$4,1    # This is a branch delay slot

	subu	$2,$3,$2   # Done only once now after exiting loop.

The main problem with the new loop is that the beq comparing $2 and $3
immediately follows the load of $2, so the branch can stall waiting for
the load to complete.  The ideal code would probably be:

I'd start by looking at the code prior to reorg (delay slot scheduling). You may be running into the well-known issue where reorg knows nothing about latency or scheduling and happily picks whatever insn can safely fill the delay slot. In doing so, reorg may muck up the schedule badly.

If that's the case, you might test disallowing operations with more than one cycle of latency in delay slots and see how that affects a wider range of benchmarks.
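The suggested experiment amounts to adding a latency filter to delay-slot filling. The sketch below is only an illustration of that policy, not GCC's reorg code. The function name `pick_delay_slot_filler` and the `ASSUMED_LATENCY` table are hypothetical. Candidates are assumed to have already passed the usual safety checks; the filter then rejects any whose result latency exceeds a threshold, and falls back to a nop (`None`) when nothing qualifies.

```python
from typing import Optional, Sequence

# Hypothetical per-opcode result latencies in cycles. A real port would take
# these from the target's pipeline description, not from a table like this.
ASSUMED_LATENCY = {"lbu": 2, "lw": 2, "addiu": 1, "subu": 1}


def pick_delay_slot_filler(candidates: Sequence[str],
                           max_latency: int = 1) -> Optional[str]:
    """Return the first candidate whose latency is <= max_latency.

    `candidates` are assembly strings already known to be safe to move into
    the slot; the first token of each is taken as the opcode. Opcodes with
    no known latency are rejected conservatively. None means "emit a nop".
    """
    for insn in candidates:
        opcode = insn.split()[0]
        latency = ASSUMED_LATENCY.get(opcode)
        if latency is not None and latency <= max_latency:
            return insn
    return None
```

For example, given `["lbu $2,-1($5)", "addiu $4,$4,1"]`, the filter skips the load and picks `addiu $4,$4,1`. Raising `max_latency` to 2 restores the unfiltered choice, which makes it easy to compare both policies across benchmarks.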

Jeff

