This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, ARM, RFC] Fix vect.exp failures for NEON in big-endian mode


On Wed, Feb 27, 2013 at 6:29 PM, Julian Brown <julian@codesourcery.com> wrote:
> Hi,
>
> Several new (ish?) autovectorizer features have apparently caused NEON
> support for same to regress quite heavily in big-endian mode. This
> patch is an attempt to fix things up, but is not without problems --
> maybe someone will have a suggestion as to how we should proceed.
>
> The problem (as ever) is that the ARM backend must lie to the
> middle-end about the layout of NEON vectors in big-endian mode (due to
> ABI requirements, VFP compatibility, and the middle-end semantics of
> vector indices being equivalent to those of an array with the same type
> of elements when stored in memory).

Why not simply give up?  Thus, make autovectorization unsupported for
ARM big-endian targets?

Do I understand correctly that the "only" issue is memory vs. register
element ordering?  Thus a fixup could be as simple as extra shuffles
inserted after vector memory loads and before vector memory stores?
(with the hope of RTL optimizers optimizing those)?

Any "lies" are of course bad and you'll pay for them later.

Richard.

> A few years ago when the vectorizer
> was relatively less sophisticated, the ordering of vector elements
> could be ignored to some extent by disabling certain instruction
> patterns used by the vectorizer in big-endian mode which were sensitive
> to the ordering of elements: in fact this is still the strategy we're
> using, but it is clearly becoming less and less tenable as time
> progresses. Quad-word registers (being composed of two double-word
> registers, loaded/stored the "wrong way round" in big-endian mode)
> arguably cause more problems than double-word registers.
>
> So, the idea behind the attached patch was supposed to be to limit the
> autovectorizer to using double-word registers only, and to disable a
> few additional (or newly-used by the vectorizer) patterns in big-endian
> mode. That, plus several testsuite tweaks, gets us down to zero
> failures for vect.exp, which is good.
>
> The problem is that at the same time quite a large set of neon.exp tests
> regress (vzip/vuzp/vtrn): one of the new patterns which is
> disabled because it causes trouble (i.e. execution failures) for the
> vectorizer is vec_perm_const<mode>. However __builtin_shuffle (which
> uses that pattern) is used for arm_neon.h now -- so disabling it means
> that the proper instructions aren't generated for intrinsics any more in
> big-endian mode.
>
> I think we have a problem here. The vectorizer also tries to use
> __builtin_shuffle (for scatter/gather operations, when lane
> loading/storing ops aren't available), but does not understand the
> "special tweaks" that arm_evpc_neon_{vuzp,vzip,vtrn} does to try to
> hide the true element ordering of vectors from the middle-end. So, I'm
> left wondering:
>
>  * Given our funky element ordering in BE mode, are the
>    __builtin_shuffle lists in arm_neon.h actually an accurate
>    representation of what the given intrinsic should do? (The fallback
>    code might or might not do the same thing, I'm not sure.)
>
>  * The vectorizer tries to use VEC_PERM_EXPR (equivalent to
>    __builtin_shuffle) with e.g. pairs of doubleword registers loaded
>    from adjacent memory locations. Are the semantics required for this
>    (again, with our funky element ordering) even the same as those
>    required for the intrinsics? Including quad-word registers for the
>    latter? (My suspicion is "no", in which case there's a fundamental
>    incompatibility here that needs to be resolved somehow.)
>
> Anyway: the tl;dr is "fixing NEON vect tests breaks intrinsics". Any
> ideas for what to do about that? (FAOD, I don't think I'm in a position
> to do the kind of middle-end surgery required to fix the problem
> "properly" at this point :-p).
>
> (It's arguably more important for the vectorizer to not generate bad
> code than it is for intrinsics to work properly, in which case: OK to
> apply? Tested cross to ARM EABI with configury modifications to build
> LE/BE multilibs.)
>
> Thanks,
>
> Julian
>
> ChangeLog
>
>     gcc/
>     * config/arm/arm.c (arm_array_mode_supported_p): No array modes for
>     big-endian NEON.
>     (arm_preferred_simd_mode): Always prefer 64-bit modes for
>     big-endian NEON.
>     (arm_autovectorize_vector_sizes): Use 8-byte vectors only for NEON.
>     (arm_vectorize_vec_perm_const_ok): No permutations are OK in
>     big-endian mode.
>     * config/arm/neon.md (vec_load_lanes<mode><mode>): Disable in
>     big-endian mode.
>     (vec_store_lanes<mode><mode>, vec_load_lanesti<mode>)
>     (vec_load_lanesoi<mode>, vec_store_lanesti<mode>)
>     (vec_store_lanesoi<mode>, vec_load_lanesei<mode>)
>     (vec_load_lanesci<mode>, vec_store_lanesei<mode>)
>     (vec_store_lanesci<mode>, vec_load_lanesxi<mode>)
>     (vec_store_lanesxi<mode>): Likewise.
>     (vec_widen_<US>shiftl_lo_<mode>, vec_widen_<US>shiftl_hi_<mode>)
>     (vec_widen_<US>mult_hi_<mode>, vec_widen_<US>mult_lo_<mode>):
>     Likewise.
>
>     gcc/testsuite/
>     * gcc.dg/vect/slp-cond-3.c: XFAIL for !vect_unpack.
>     * gcc.dg/vect/slp-cond-4.c: Likewise.
>     * gcc.dg/vect/vect-1.c: Likewise.
>     * gcc.dg/vect/vect-1-big-array.c: Likewise.
>     * gcc.dg/vect/vect-35.c: Likewise.
>     * gcc.dg/vect/vect-35-big-array.c: Likewise.
>     * gcc.dg/vect/bb-slp-11.c: Likewise.
>     * gcc.dg/vect/bb-slp-26.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-3-big-array.c: XFAIL
>     for !vect_element_align.
>     * gcc.dg/vect/vect-over-widen-1.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-2.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-3.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
>     * gcc.dg/vect/pr43430-2.c: Likewise.
>     * gcc.dg/vect/vect-widen-shift-u16.c: XFAIL for !vect_widen_shift
>     && !vect_unpack.
>     * gcc.dg/vect/vect-widen-shift-s8.c: Likewise.
>     * gcc.dg/vect/vect-widen-shift-u8.c: Likewise.
>     * gcc.dg/vect/vect-widen-shift-s16.c: Likewise.
>     * gcc.dg/vect/vect-93.c: Only run if !vect_intfloat_cvt.
>     * gcc.dg/vect/vect-intfloat-conversion-4a.c: Only run if
>     vect_unpack.
>     * gcc.dg/vect/vect-intfloat-conversion-4b.c: Likewise.
>     * lib/target-supports.exp (check_effective_target_vect_perm): Only
>     enable for NEON little-endian.
>     (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>     (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>     (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>     (check_effective_target_vect_widen_shift): Likewise.
>     (check_effective_target_vect_extract_even_odd): Likewise.
>     (check_effective_target_vect_interleave): Likewise.
>     (check_effective_target_vect_stridedN): Likewise.
>     (check_effective_target_vect_multiple_sizes): Likewise.
>     (check_effective_target_vect64): Enable for any NEON.
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]