This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] teach emit_store_flag to use clz/ctz


On Sun, 6 May 2012, Andrew Pinski wrote:

> >  For the record: MIPS processors that implement CLZ/CLO (for some reason
> > CTZ/CTO haven't been added to the architecture, but these operations can
> > be cheaply transformed into CLZ/CLO) generally have a dedicated unit that
> > causes no pipeline stall for these instructions even in the simplest
> > pipeline designs like the M4K -- IOW they are issued at the usual one
> > instruction per pipeline clock rate.
> 
> Even on Octeon this is true.  Though Octeon has seq/sneq too so it
> does not matter in the end.

 Does Octeon's pipeline qualify as simple?  For some reason I thought 
it was a high-performance core.  The M4K is one of the smallest/simplest 
MIPS cores ever built.

 And actually all MIPS processors (back to 1985's MIPS I ISA) support 
two-instruction set-if-equal and set-if-not-equal sequences:

	xor	rd, rt, rs
	sltiu	rd, rd, 1

and:

	xor	rd, rt, rs
	sltu	rd, zero, rd

respectively, that may still be more beneficial than any possible 
alternatives, especially ones involving branches.
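
 To make the semantics of the two sequences above explicit, here is a 
rough C model of each.  This is only a sketch: the function names are made 
up for illustration, and 32-bit registers are assumed.

```c
#include <stdint.h>

/* Models "xor rd, rt, rs; sltiu rd, rd, 1": rd = (rs == rt). */
uint32_t set_if_equal(uint32_t rs, uint32_t rt)
{
    uint32_t rd = rt ^ rs;   /* xor: zero only when the inputs match */
    return rd < 1u;          /* sltiu rd, rd, 1: unsigned rd < 1 */
}

/* Models "xor rd, rt, rs; sltu rd, zero, rd": rd = (rs != rt). */
uint32_t set_if_not_equal(uint32_t rs, uint32_t rt)
{
    uint32_t rd = rt ^ rs;   /* xor: non-zero when the inputs differ */
    return 0u < rd;          /* sltu rd, zero, rd: unsigned 0 < rd */
}
```

Neither function contains a branch; each is one XOR followed by one 
unsigned comparison.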

> Note I originally was the one who proposed this optimization for
> PowerPC even before I saw what XLC did.  See PR 10588 (which I filed 9
> years ago)  and it seems we are about to fix it soon.

 For that -- set-if-zero and set-if-non-zero -- you can use the second 
instruction of each sequence above on its own (both are supported by all 
MIPS processors):

	sltiu	rd, rs, 1

and

	sltu	rd, zero, rs

However, GCC doesn't seem smart enough to use them well with your example.  
I'd expect something like:

	sltiu	$4, $4, 1
	sltiu	$2, $5, 1
	jr	$31
	 or	$2, $4, $2

however I get:

	beq	$4, $0, .L3
	 nop
	jr	$31
	 sltiu	$2, $5, 1
.L3:
	jr	$31
	 li	$2, 1

which is never faster and obviously not smaller either.  And there is 
really no need to skip the second comparison, as the short-circuit rules 
of logical OR would otherwise suggest -- both operands are already in 
registers and the comparison has no side effects.
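
 For illustration, the two shapes can be written out in C.  The source of 
the example is not quoted in this message, so the functions below are an 
assumption reconstructed from the assembly (a test for "either argument is 
zero"); the names are hypothetical.

```c
/* Short-circuit form: the language only requires evaluating b == 0
   when a != 0, which corresponds to the beq-based code. */
int either_zero_short_circuit(unsigned a, unsigned b)
{
    return a == 0 || b == 0;
}

/* Branch-free form: both comparisons are always evaluated and the
   results combined with a bitwise OR, which corresponds to the
   "sltiu; sltiu; or" sequence. */
int either_zero_branch_free(unsigned a, unsigned b)
{
    return (a == 0) | (b == 0);
}
```

Both functions return the same value for all inputs.  Whether a given 
compiler actually emits the branch-free sequence for either of them 
depends on its version, the target and the options used.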

 This pessimisation is avoided, however, for MIPS IV and more recent 
processors, which have move-if-non-zero (there the second comparison is 
always evaluated):

	sltiu	$5, $5, 1
	li	$2, 1
	jr	$31
	 movn	$2, $5, $4
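
 Read instruction by instruction, the MOVN sequence above behaves roughly 
like the following C model.  Again this is only a sketch with a made-up 
function name; $4 and $5 are taken to hold the two arguments, per the 
usual MIPS calling convention.

```c
#include <stdint.h>

uint32_t either_zero_movn(uint32_t a, uint32_t b)
{
    uint32_t t = b < 1u;   /* sltiu $5, $5, 1: t = (b == 0) */
    uint32_t v = 1u;       /* li $2, 1: result if a is zero */
    if (a != 0u)           /* movn $2, $5, $4: replace the result */
        v = t;             /* with t only when a is non-zero */
    return v;              /* jr $31 (movn sits in its delay slot) */
}
```

The "if" here stands for the conditional move, not for a branch: MOVN 
always executes and only the register write is conditional.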

Any chance of getting better code here with the fix you've mentioned?

  Maciej

