[PATCH][AArch64]Add vec_shr pattern for 64-bit vectors using ush{l,r}; enable tests.
Alan Lawrence
alan.lawrence@arm.com
Fri Nov 14 15:43:00 GMT 2014
Following recent vectorizer changes to reductions via shifts, AArch64 will now
reduce loops such as this one:
unsigned char in[8] = {1, 3, 5, 7, 9, 11, 13, 15};

int
main (unsigned char argc, char **argv)
{
  unsigned char prod = 1;
  /* Prevent constant propagation of the entire loop below.  */
  asm volatile ("" : : : "memory");
  for (unsigned char i = 0; i < 8; i++)
    prod *= in[i];
  if (prod != 17)
    __builtin_printf ("Failed %d\n", prod);
  return 0;
}
using an 'ext' instruction from aarch64_expand_vec_perm_const:
main:
	adrp	x0, .LANCHOR0
	movi	v2.2s, 0		<=== note reg used here
	ldr	d1, [x0, #:lo12:.LANCHOR0]
	ext	v0.8b, v1.8b, v2.8b, #4
	mul	v1.8b, v1.8b, v0.8b
	ext	v0.8b, v1.8b, v2.8b, #2
	mul	v0.8b, v1.8b, v0.8b
	ext	v2.8b, v0.8b, v2.8b, #1
	mul	v0.8b, v0.8b, v2.8b
	umov	w1, v0.b[0]
The 'ext' works for both 64-bit and 128-bit vectors; but for 64-bit vectors
we can do slightly better using ushr. This patch improves the above to:
main:
	adrp	x0, .LANCHOR0
	ldr	d0, [x0, #:lo12:.LANCHOR0]
	ushr	d1, d0, 32
	mul	v0.8b, v0.8b, v1.8b
	ushr	d1, d0, 16
	mul	v0.8b, v0.8b, v1.8b
	ushr	d1, d0, 8
	mul	v0.8b, v0.8b, v1.8b
	umov	w1, v0.b[0]
...
Tested with bootstrap + check-gcc on aarch64-none-linux-gnu.
Cross-testing of check-gcc on aarch64_be-none-elf in progress.
Ok if no regressions on big-endian?
Cheers,
--Alan
gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (vec_shr<mode>): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_whole_vector_shift): Add aarch64{,_be}.