[PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian

Kyrill Tkachov kyrylo.tkachov@foss.arm.com
Mon Feb 1 17:14:00 GMT 2016


On 16/12/15 17:44, Charles Baylis wrote:
> Hi

Hi Charles,
sorry for the delay on this one.

>
> This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
> and VZIP on armeb-* targets. It also fixes the definition of the
> vuzpq_* and vzipq_*  NEON intrinsics which use incorrect lane
> specifiers in the use of __builtin_shuffle().
>
> The problem with arm_neon.h can be seen by temporarily altering
> arm_expand_vec_perm_const_1() to unconditionally return false. If this
> is done, the vuzp/vzip tests in the advsimd execution tests will fail.
> With these patches, this is no longer the case.
>
> The problem is caused by the weird mapping of architectural lane order
> to gcc lane order in big endian. For 64 bit vectors, the order is
> simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
> where the lane ordering is reversed inside those. This is due to the
> memory ordering defined by the EABI. There is a large comment in
> gcc/config/arm.c above output_move_neon() which describes this in more
> detail.
>
> The arm_evpc_neon_vuzp() and  arm_evpc_neon_vzip() functions do not
> allow for this lane order, instead treating the lane order as simply
> reversed in 128 bit vectors. These patches fix this. I have included a
> test case for vuzp, but I don't have one for vzip.
>
> Tested with make check on arm-unknown-linux-gnueabihf with no regressions
> Tested with make check on armeb-unknown-linux-gnueabihf. Some
> gcc.dg/vect tests fail due to no longer being vectorized. I haven't
> analysed these, but it is expected since vuzp is not usable for the
> shuffle patterns for which it was previously used. There are also a
> few new PASSes.

Indeed I see the new passes on armeb-none-eabi.
However, the new FAILs that I see are ICEs, not just vectorisation failures,
so they need to be looked at.

The ICEs that I see are:
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)

The backtrace looks like:
0x81c9eb expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
         $SRC/gcc/expr.c:9239
0x8044cc expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
         $SRC/gcc/expr.c:9562
0x80a851 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
         $SRC/gcc/expr.c:7947
0x811bf0 store_expr_with_bounds(tree_node*, rtx_def*, int, bool, bool, tree_node*)
         $SRC/gcc/expr.c:5406
0x814a7f expand_assignment(tree_node*, tree_node*, bool)
         $SRC/gcc/expr.c:5175
0x709da5 expand_gimple_stmt_1
         $SRC/gcc/cfgexpand.c:3606
0x709da5 expand_gimple_stmt
         $SRC/gcc/cfgexpand.c:3702
0x70c3a6 expand_gimple_basic_block
         $SRC/gcc/cfgexpand.c:5708
0x70fd58 execute
         $SRC/gcc/cfgexpand.c:6323
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

Seems that the code in expr.c asserts that expand_vec_perm returned a non-NULL result.

I'll look at the patches in more detail, but in the meantime I notice that there are some
GNU style issues that should be resolved, like starting comments with a capital letter,
two spaces after full stop, two spaces between full stop and close comment, as well as some
lines over 80 characters. The check_GNU_style.sh script in the contrib/ directory can help
catch some (if not all) of these.

Also, can you please send any follow-up versions of the two patches as separate emails,
so that we can more easily keep track of what's comment goes to which patch.

Thanks,
Kyrill

>
> Patch 1 (vuzp):
>
> gcc/ChangeLog:
>
> 2015-12-15  Charles Baylis  <charles.baylis@linaro.org>
>
>          * config/arm/arm.c (arm_neon_endian_lane_map): New function.
>          (arm_neon_vector_pair_endian_lane_map): New function.
>          (arm_evpc_neon_vuzp): Allow for big endian lane order.
>          * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
>          endian.
>          (vuzpq_s16): Likewise.
>          (vuzpq_s32): Likewise.
>          (vuzpq_f32): Likewise.
>          (vuzpq_u8): Likewise.
>          (vuzpq_u16): Likewise.
>          (vuzpq_u32): Likewise.
>          (vuzpq_p8): Likewise.
>          (vuzpq_p16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2015-12-15  Charles Baylis  <charles.baylis@linaro.org>
>
>          * gcc.c-torture/execute/pr68532.c: New test.
>
>
> Patch 2 (vzip)
>
> gcc/ChangeLog:
>
> 2015-12-15  Charles Baylis  <charles.baylis@linaro.org>
>
>          * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
>          order.
>          * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
>          endian.
>          (vzipq_s16): Likewise.
>          (vzipq_s32): Likewise.
>          (vzipq_f32): Likewise.
>          (vzipq_u8): Likewise.
>          (vzipq_u16): Likewise.
>          (vzipq_u32): Likewise.
>          (vzipq_p8): Likewise.
>          (vzipq_p16): Likewise.



More information about the Gcc-patches mailing list