This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: RFC: ARM 64-bit shifts in NEON

From: Andrew Stubbs <andrew_stubbs at mentor dot com>
To: Richard Earnshaw <rearnsha at arm dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Andrew Stubbs <ams at codesourcery dot com>
Date: Wed, 07 Dec 2011 14:03:55 +0000
Subject: Re: RFC: ARM 64-bit shifts in NEON
References: <4ED8C7B0.9000308@codesourcery.com> <4EDF6D4D.50003@arm.com>

On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote:

> So it looks like the code generated for core registers with Thumb-2 is
> pretty rubbish (no real surprise there).  To get the best code you need
> to make use of the fact that on ARM a register-specified LSL or LSR by a
> small negative amount gives zero: only the bottom byte of the shift
> register is used, so any amount from -1 down to -224 behaves like a
> shift of 32 or more.  This gives us sequences like:
>
> For ARM state it's something like this (untested):
>
>                                 @ shft < 32                      , shft >= 32
> __ashldi3_v3:
>     sub     r3, r2, #32         @ -ve                            , shft - 32
>     lsl     ah, ah, r2          @ ah << shft                     , 0
>     rsb     ip, r2, #32         @ 32 - shft                      , -ve
>     orr     ah, ah, al, lsl r3  @ ah << shft                     , al << (shft - 32)
>     orr     ah, ah, al, lsr ip  @ ah << shft | al >> (32 - shft) , al << (shft - 32)
>     lsl     al, al, r2          @ al << shft                     , 0
>
> For Thumb-2 (where there is no ORR with a register-shifted operand):
>
>                                 @ shft < 32                      , shft >= 32
>     lsls    ah, ah, r2          @ ah << shft                     , 0
>     sub     r3, r2, #32         @ -ve                            , shft - 32
>     lsl     ip, al, r3          @ 0                              , al << (shft - 32)
>     negs    r3, r3              @ 32 - shft                      , -ve
>     orr     ah, ah, ip          @ ah << shft                     , al << (shft - 32)
>     lsr     r3, al, r3          @ al >> (32 - shft)              , 0
>     orrs    ah, ah, r3          @ ah << shft | al >> (32 - shft) , al << (shft - 32)
>     lsls    al, al, r2          @ al << shft                     , 0
>
> Neither sequence needs the condition flags during execution (and indeed
> either is probably better than the code currently in lib1funcs.asm for
> a modern core).  The flag-setting forms in the Thumb-2 variant (lsls,
> negs, orrs) are there only to save code size, since they allow 16-bit
> encodings; that choice would normally be made by a late optimization
> pass.
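To check the claim that neither sequence reads the flags, both can be modelled in C with one statement per instruction. This is a sketch under stated assumptions: `ah`/`al` are taken as the high and low words of the operand, `r2` as a shift amount from 0 to 63, the helpers follow the ARM rule that a register-specified LSL/LSR uses only the bottom byte of the shift register, and every function name here is hypothetical rather than taken from lib1funcs.asm.

```c
#include <assert.h>
#include <stdint.h>

/* ARM register-specified LSL/LSR (assumed model): only the bottom byte of
   the shift register is used, and an amount of 32 or more gives zero, so
   any "negative" amount from -1 down to -224 also gives zero.  */
static uint32_t lsl_reg(uint32_t x, uint32_t rs)
{
    uint32_t n = rs & 0xffu;
    return n >= 32 ? 0 : x << n;
}

static uint32_t lsr_reg(uint32_t x, uint32_t rs)
{
    uint32_t n = rs & 0xffu;
    return n >= 32 ? 0 : x >> n;
}

/* ARM-state sequence, one statement per instruction.  Unsigned
   wrap-around stands in for the "-ve" intermediate values.  At
   shft == 32, ip is 0 rather than negative, so "al, lsr ip" yields al;
   that is harmless because "al, lsl r3" has already put al in ah.  */
static uint64_t ashldi3_arm(uint64_t v, uint32_t shft)
{
    assert(shft < 64);
    uint32_t al = (uint32_t)v, ah = (uint32_t)(v >> 32);
    uint32_t r3 = shft - 32;        /* sub  r3, r2, #32        */
    ah = lsl_reg(ah, shft);         /* lsl  ah, ah, r2         */
    uint32_t ip = 32 - shft;        /* rsb  ip, r2, #32        */
    ah |= lsl_reg(al, r3);          /* orr  ah, ah, al, lsl r3 */
    ah |= lsr_reg(al, ip);          /* orr  ah, ah, al, lsr ip */
    al = lsl_reg(al, shft);         /* lsl  al, al, r2         */
    return ((uint64_t)ah << 32) | al;
}

/* Thumb-2 sequence.  No statement reads a flag, so the flag-setting
   forms (lsls, negs, orrs) are modelled as their plain equivalents.  */
static uint64_t ashldi3_thumb2(uint64_t v, uint32_t shft)
{
    assert(shft < 64);
    uint32_t al = (uint32_t)v, ah = (uint32_t)(v >> 32);
    ah = lsl_reg(ah, shft);         /* lsls ah, ah, r2  */
    uint32_t r3 = shft - 32;        /* sub  r3, r2, #32 */
    uint32_t ip = lsl_reg(al, r3);  /* lsl  ip, al, r3  */
    r3 = 0u - r3;                   /* negs r3, r3      */
    ah |= ip;                       /* orr  ah, ah, ip  */
    r3 = lsr_reg(al, r3);           /* lsr  r3, al, r3  */
    ah |= r3;                       /* orrs ah, ah, r3  */
    al = lsl_reg(al, shft);         /* lsls al, al, r2  */
    return ((uint64_t)ah << 32) | al;
}

/* Number of shift amounts 0..63 for which either model disagrees with a
   plain 64-bit shift of v; 0 means both sequences are correct for v.  */
static int count_mismatches(uint64_t v)
{
    int bad = 0;
    for (uint32_t s = 0; s < 64; s++) {
        uint64_t want = v << s;
        if (ashldi3_arm(v, s) != want || ashldi3_thumb2(v, s) != want)
            bad++;
    }
    return bad;
}
```

Calling `count_mismatches(v)` for a 64-bit value `v` should return 0; since no model step depends on a flag, this also illustrates why the condition codes are not needed.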

OK, those are interesting, and I can look into making it happen, with or without NEON.

If we were to encode these in the machine description, wouldn't we need an unspec to stop 'clever things' happening to the negative shift? I'm not clear what those 'clever things' might be for a shift by register (value-range propagation, presumably), but I know the NEON shifts are encoded this way for safety.

> None of this directly helps with your NEON usage, but it does show that we
> really don't need to clobber the condition code register to get an
> efficient sequence.

Except in the case of a shift by one, where there is a two-instruction sequence that does clobber CC. Presumably that special case can be handled separately, right from expand.
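The shift-by-one case needs CC because the carry flag is what moves bit 31 of the low word into the high word. The C sketch below models that, assuming a pair along the lines of `movs al, al, lsl #1` followed by `adc ah, ah, ah` (the exact instructions GCC emits may differ); `ashldi3_by_one` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a flag-using 64-bit shift left by one, e.g.
       movs  al, al, lsl #1    @ C <- bit 31 of al, al <- al << 1
       adc   ah, ah, ah        @ ah <- ah + ah + C
   The carry flag is the only path from the low word to the high word,
   which is why this form has to clobber CC.  */
static uint64_t ashldi3_by_one(uint64_t v)
{
    uint32_t al = (uint32_t)v, ah = (uint32_t)(v >> 32);
    uint32_t carry = al >> 31;      /* the bit movs shifts out into C */
    al = al << 1;
    ah = ah + ah + carry;           /* adc: ah << 1, plus the carry in */
    return ((uint64_t)ah << 32) | al;
}
```

The model makes the dependency explicit: `carry` is produced by the first instruction and consumed by the second, so nothing that clobbers CC can be placed between them.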

Andrew

Follow-Ups:
- Re: RFC: ARM 64-bit shifts in NEON
  - From: Richard Earnshaw

References:
- RFC: ARM 64-bit shifts in NEON
  - From: Andrew Stubbs
- Re: RFC: ARM 64-bit shifts in NEON
  - From: Richard Earnshaw
