aarch64: Reimplement vshrn_n* intrinsics using builtins
This patch reimplements the vshrn_n* intrinsics to use RTL builtins.
These perform a narrowing right shift.
Although the intrinsic generates the half-width mode (e.g. V8HI ->
V8QI), the new pattern generates a full 128-bit mode (V8HI -> V16QI) by representing the
fill-with-zeroes semantics of the SHRN instruction. The narrower (V8QI) result is extracted with a
lowpart subreg.
I found this allows the RTL optimisers to do a better job at optimising
redundant moves away in frequently-occurring SHRN+SRHN2 pairs, like in:
uint8x16_t
foo (uint16x8_t in1, uint16x8_t in2)
{
uint8x8_t tmp = vshrn_n_u16 (in2, 7);
uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4);
return tmp2;
}