[PATCH ARM] PR/61062 Fix arm_neon.h ZIP/UZP/TRN for bigendian
Alan Lawrence
alan.lawrence@arm.com
Wed May 14 13:52:00 GMT 2014
Hi,
Due to differences in how the ARM C Language Extensions and gcc's vector
extensions deal with indices within vectors, the __builtin_shuffle masks used to
implement the ZIP, UZP and TRN Neon Intrinsics in arm_neon.h are correct only
for little-endian. (The problem on bigendian has recently been revealed by new
tests in gcc.target/arm/simd/.)
This patch corrects the indices using "#ifdef __ARM_BIG_ENDIAN" throughout
arm_neon.h. I've run all the arm-specific tests (arm.exp acle.exp aapcs.exp
simd.exp neon.exp) on both arm-none-eabi and armeb-none-eabi with no
regressions; on armeb-none-eabi the patch fixes FAIL -> PASS for
simd/v{uzp,zip,trn}*_1.c.
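To illustrate why one mask cannot serve both endiannesses, here is a small
portable simulation rather than code from the patch. It emulates the documented
two-input __builtin_shuffle semantics with a plain loop and ASSUMES a simplified
model in which gcc's element i corresponds to architectural lane i on
little-endian and to lane N-1-i on big-endian. The helper names (shuffle2,
lane_to_elem, zip_lo_ok) and the big-endian mask below are hypothetical: the
mask is derived from that model for the low half of an 8-lane ZIP, not copied
from the attachment.

```c
#include <string.h>

#define N 8  /* lanes in a 64-bit vector of bytes */

/* Emulates gcc's two-input __builtin_shuffle: element i of the result is
   element mask[i] of the 2N-element concatenation {a, b}. */
static void shuffle2(const unsigned char *a, const unsigned char *b,
                     const unsigned char *mask, unsigned char *out)
{
    for (int i = 0; i < N; i++) {
        unsigned idx = mask[i] % (2 * N);
        out[i] = idx < N ? a[idx] : b[idx - N];
    }
}

/* ASSUMED model (illustration only): gcc element index for a given
   architectural lane; reversed within the vector on big-endian. */
static int lane_to_elem(int lane, int big_endian)
{
    return big_endian ? N - 1 - lane : lane;
}

/* Convert between architectural lane order and gcc element order. */
static void from_lanes(const unsigned char *lanes, int big_endian,
                       unsigned char *elems)
{
    for (int l = 0; l < N; l++)
        elems[lane_to_elem(l, big_endian)] = lanes[l];
}

static void to_lanes(const unsigned char *elems, int big_endian,
                     unsigned char *lanes)
{
    for (int l = 0; l < N; l++)
        lanes[l] = elems[lane_to_elem(l, big_endian)];
}

/* Little-endian mask for the low half of ZIP: a0,b0,a1,b1,a2,b2,a3,b3. */
static const unsigned char mask_le[N] = { 0, 8, 1, 9, 2, 10, 3, 11 };
/* Hypothetical big-endian mask, derived from the assumed model above. */
static const unsigned char mask_be[N] = { 12, 4, 13, 5, 14, 6, 15, 7 };

/* Returns 1 if `mask` yields the low half of ZIP when the result is read
   back in architectural lane order, 0 otherwise. */
static int zip_lo_ok(const unsigned char *mask, int big_endian)
{
    const unsigned char a_lanes[N] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    const unsigned char b_lanes[N] = { 8, 9, 10, 11, 12, 13, 14, 15 };
    const unsigned char want[N]    = { 0, 8, 1, 9, 2, 10, 3, 11 };
    unsigned char a[N], b[N], r[N], r_lanes[N];

    from_lanes(a_lanes, big_endian, a);
    from_lanes(b_lanes, big_endian, b);
    shuffle2(a, b, mask, r);
    to_lanes(r, big_endian, r_lanes);
    return memcmp(r_lanes, want, N) == 0;
}
```

Under this model, zip_lo_ok(mask_le, 0) and zip_lo_ok(mask_be, 1) both return 1,
whereas zip_lo_ok(mask_le, 1) returns 0, which is the kind of mismatch an
"#ifdef __ARM_BIG_ENDIAN" around the mask is meant to remove.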
Note the patch also modifies gcc.target/arm/pr48252.c. A bit of diving into the
history of this test reveals:
* The test was first written in the days when the arm_neon.h implementation
used builtins such as __builtin_neon_vzipv8qi (which were thus correct for
bigendian).
* In SVN rev 189294, the ZIP intrinsics were rewritten to use __builtin_shuffle
(with little-endian masks); this broke pr48252.c on bigendian, but the breakage
was not detected until...
* SVN rev 191200, in which pr48252.c was modified to expect different
results according to endianness - that is, "fixing" the test to match the broken
implementation. (I have verified that this updated test failed on the original
__builtin_neon_vzipv8qi implementation.)
The fix to pr48252.c here largely reverts the test to its original form, although
it keeps the (correct, proper) use of vld1.
gcc/ChangeLog:
* config/arm/arm_neon.h (vtrn_s8, vtrn_s16, vtrn_u8, vtrn_u16, vtrn_p8,
vtrn_p16, vtrn_s32, vtrn_f32, vtrn_u32, vtrnq_s8, vtrnq_s16, vtrnq_s32,
vtrnq_f32, vtrnq_u8, vtrnq_u16, vtrnq_u32, vtrnq_p8, vtrnq_p16, vzip_s8,
vzip_s16, vzip_u8, vzip_u16, vzip_p8, vzip_p16, vzip_s32, vzip_f32,
vzip_u32, vzipq_s8, vzipq_s16, vzipq_s32, vzipq_f32, vzipq_u8,
vzipq_u16, vzipq_u32, vzipq_p8, vzipq_p16, vuzp_s8, vuzp_s16, vuzp_s32,
vuzp_f32, vuzp_u8, vuzp_u16, vuzp_u32, vuzp_p8, vuzp_p16, vuzpq_s8,
vuzpq_s16, vuzpq_s32, vuzpq_f32, vuzpq_u8, vuzpq_u16, vuzpq_u32,
vuzpq_p8, vuzpq_p16): Correct mask for bigendian.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr48252.c (main): Expect same result as endian-neutral.
Cheers, Alan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm_neon_bigendian.patch
Type: text/x-patch
Size: 44892 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20140514/5244abd2/attachment.bin>