[RFC] try to generate FP and/or/xor instructions for SSE

Ross Ridge rridge@csclub.uwaterloo.ca
Thu Aug 23 16:21:00 GMT 2007


[Blah, accidentally sent this to the wrong list the first time.
Sorry about that.]

Richard Guenther writes:
>As I said - at least for AMD CPUs - it looks like you can freely
>interchange the ps|pd or integer variants of the bitwise and/or
>operations without a penalty.

An example in AMD's "Software Optimization Guide for AMD64 Processors"
suggests that you can't freely interchange them.  In the example it
gives for using XOR to negate a double-precision vector, it uses XORPD.
If PXOR, XORPS and XORPD were all interchangeable, it should have used
XORPS since it's a byte shorter than XORPD.
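For illustration only (this is not the code from AMD's guide), the
negation idiom can be written in C with the SSE2 intrinsic _mm_xor_pd,
which corresponds to XORPD: XOR each lane with -0.0, whose only set bit
is the sign bit.  The helper names negate_pd and lane_pd are hypothetical.
The intrinsic only expresses the intended domain; which encoding is
actually emitted is up to the compiler.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_xor_pd, _mm_set1_pd, ... */

/* Hypothetical helper: negate both doubles by flipping their sign bits.
 * -0.0 has only the sign bit set, so XOR with it toggles the sign and
 * leaves exponent and mantissa untouched.  _mm_xor_pd maps to XORPD,
 * but the compiler may choose another encoding. */
static __m128d negate_pd(__m128d v)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    return _mm_xor_pd(v, sign_mask);
}

/* Hypothetical helper for inspecting results: return lane i (0 or 1). */
static double lane_pd(__m128d v, int i)
{
    double out[2];
    _mm_storeu_pd(out, v);
    return out[i];
}
```

Note that _mm_set_pd takes its arguments high lane first, so
_mm_set_pd(-2.5, 1.0) puts 1.0 in lane 0; negating it gives -1.0 in
lane 0 and 2.5 in lane 1.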

The guide also says:

	When it is necessary to zero out an XMM register, use an
	instruction whose format matches the format required by the
	consumers of the zeroed register.

	...

	When an XMM register must be set to zero, using the appropriate
	instruction helps reduce the chance of any performance penalty
	later.

This advice differs from Intel's, which on Pentium 4 processors recommends
always using PXOR to clear XMM registers, as that instruction breaks
dependency chains, while the XORPS and XORPD instructions don't.
Only the newer Intel Core processors support breaking chains with all
three instructions.
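A minimal sketch of AMD's "match the consumer's format" advice, assuming
SSE2 intrinsics in C: each accumulator is zeroed with the intrinsic for
its own domain (_mm_setzero_pd for a double-precision add,
_mm_setzero_si128 for an integer add).  The function names sum_pd and
sum_epi32 are hypothetical, and whether a given _mm_setzero_* call
becomes PXOR, XORPS or XORPD is the compiler's choice, not something the
source code guarantees.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical: pairwise sum of doubles (n assumed even).  The zero
 * feeds ADDPD, so it is requested in the floating-point domain. */
static __m128d sum_pd(const double *a, int n)
{
    __m128d acc = _mm_setzero_pd();
    for (int i = 0; i + 1 < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));
    return acc;
}

/* Hypothetical: lane-wise sum of 32-bit ints (n assumed a multiple
 * of 4).  The zero feeds PADDD, so it is requested in the integer
 * domain. */
static __m128i sum_epi32(const int *a, int n)
{
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i + 3 < n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(a + i)));
    return acc;
}
```

For example, summing {1.0, 2.0, 3.0, 4.0} with sum_pd leaves 1.0 + 3.0
in lane 0, which _mm_cvtsd_f64 reads back as 4.0.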

					Ross Ridge
