This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: RFC: ARM 64-bit shifts in NEON

From: Andrew Stubbs <ams at codesourcery dot com>
To: Richard Earnshaw <rearnsha at arm dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Wed, 07 Dec 2011 14:36:06 +0000
Subject: Re: RFC: ARM 64-bit shifts in NEON
References: <4ED8C7B0.9000308@codesourcery.com> <4EDF6D4D.50003@arm.com> <4EDF724B.5050804@mentor.com> <4EDF763B.1030409@arm.com>

On Wed 07 Dec 2011 14:20:43 GMT, Richard Earnshaw wrote:

Would it not require an unspec to prevent 'clever things' happening to
the negative shift, if we were to encode these in the machine
description? I'm not too clear on what these 'clever things' might be in
the case of shift-by-register (presumably value-range propagation is
one), but I know the NEON shifts are encoded this way for safety.


Given the way the shift patterns in the compiler are written today, quite possibly.  Though in the
general case of a non-constant shift the optimizer probably wouldn't be able to safely make any
assumptions that would break things.

I've noticed that the right-shift NEON insns have an "EQUIV" entry in the dumps, so I'm suspicious that even then we're not totally safe from "optimization".

I suspect that the shift patterns should really be changed to make the shift be by a QImode value;
this would then correctly describe the number of bits in the register that are really involved in
the shift.  Further, we could then say that, for core registers, the full value in that QI register
was used to determine the shift.  It would be quite a lot of churn to fix this though.

Yeah, I considered this, but I couldn't figure out how to make the details work, and anyway, I've seen the compiler trying to truncate things (unnecessarily) for that sort of thing, so I haven't actually tried it.

Maybe I'll have a go and see what happens. I suspect it'd need extra patterns to combine the truncate seamlessly, and to allow actual QImode input as well?

None of this directly helps with your neon usage, but it does show that we
really don't need to clobber the condition code register to get an
efficient sequence.


Except that it doesn't in the case of a shift by one where there is a
two-instruction sequence that clobbers CC. Presumably this special case
can be treated differently though, right from expand.
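As a rough illustration of why the shift-by-one case touches the condition codes, here is a minimal C model of a logical right shift by one across two 32-bit halves. It assumes the two-instruction sequence is a flag-setting shift of the high word followed by an RRX of the low word (the thread only names RRX; the exact instruction pair is an assumption). The type `halves64` and the function `lshr64_by1` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value held as two 32-bit halves. */
typedef struct { uint32_t hi; uint32_t lo; } halves64;

/* Logical right shift by exactly one bit.
   The bit shifted out of the high word travels through the carry flag,
   which is why this form clobbers CC. */
static halves64 lshr64_by1(halves64 v)
{
    uint32_t carry = v.hi & 1u;          /* assumed: MOVS hi, hi, LSR #1 (C = bit 0) */
    v.hi >>= 1;
    v.lo = (v.lo >> 1) | (carry << 31);  /* assumed: MOV lo, lo, RRX (C into bit 31) */
    return v;
}
```

The `carry` variable stands in for the C flag; a sequence that avoids it needs an extra instruction to move that bit explicitly.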


All of the sequences above can be simplified significantly if the shift amount is constant and I
think then, that with the exception of the special case you mention (which is only for shift right
by 1) you never need the condition codes and you never need more than 3 ARM instructions:

Actually, there are "1bit" patterns for all the shift types.

shifts < 32

LSL	AH, AH, #n
ORR	AH, AH, AL, LSR #(32 - n)
LSL	AL, AL, #n

shifts >= 32

LSL	AH, AL, #(n - 32)
MOV	AL, #0

In fact both of the above sequences are equally good for Thumb2.  If we lost the RRX tweak it
wouldn't be a major loss (we could even put it back as a peephole2 to handle the common case where
the condition code registers were known to be dead).

Yes, these are what the compiler currently generates. With my patch, *sh*di3 never fails to expand (not if TARGET_NEON is true, anyway), so the compiler no longer falls back to generating these sequences automatically; I have added splitters to do it manually.
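For reference, the two constant left-shift sequences quoted above can be modelled in C as follows. This is only a sketch of the arithmetic, not of the splitters in the patch; the type `pair64` and the functions `shl64_const`, `split64` and `join64` are hypothetical names, and the shift amount is assumed to be a constant in the range 1 to 63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type mirroring the AH (high) and AL (low) registers. */
typedef struct { uint32_t ah; uint32_t al; } pair64;

static pair64 split64(uint64_t x)
{
    pair64 v = { (uint32_t)(x >> 32), (uint32_t)x };
    return v;
}

static uint64_t join64(pair64 v)
{
    return ((uint64_t)v.ah << 32) | v.al;
}

/* 64-bit left shift by n, for 1 <= n <= 63 (n == 0 is excluded because
   the (32 - n) shift below would be out of range in C). */
static pair64 shl64_const(pair64 v, unsigned n)
{
    assert(n >= 1 && n <= 63);
    if (n < 32) {
        v.ah = v.ah << n;            /* LSL AH, AH, #n */
        v.ah |= v.al >> (32 - n);    /* ORR AH, AH, AL, LSR #(32 - n) */
        v.al = v.al << n;            /* LSL AL, AL, #n */
    } else {
        v.ah = v.al << (n - 32);     /* LSL AH, AL, #(n - 32) */
        v.al = 0;                    /* MOV AL, #0 */
    }
    return v;
}
```

Neither branch needs a carry or a comparison at run time, which matches the point that the constant case does not touch the condition codes.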

Andrew

