This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ARM] Misaligned access support for ARM Neon
- From: Richard Earnshaw <Richard dot Earnshaw at buzzard dot freeserve dot co dot uk>
- To: Julian Brown <julian at codesourcery dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Richard Earnshaw <rearnsha at arm dot com>, Paul Brook <paul at codesourcery dot com>, Ira Rosen <IRAR at il dot ibm dot com>
- Date: Wed, 22 Sep 2010 23:54:38 +0100
- Subject: Re: [PATCH, ARM] Misaligned access support for ARM Neon
- References: <20091117171931.053faec2@rex.config> <1259593368.6000.40.camel@e200601-lin.cambridge.arm.com> <20091218172204.3c0eab5b@rex.config> <200912211220.12265.paul@codesourcery.com> <20100518013108.5514d986@rex.config> <20100604135031.3d77a087@rex.config> <1275668777.6827.83.camel@e102346-lin.cambridge.arm.com> <20100607200848.0fe3ab59@rex.config> <20100803173200.33d079bf@rex.config>
On 03/08/10 17:32, Julian Brown wrote:
> On Mon, 7 Jun 2010 20:08:48 +0100
> Julian Brown <julian@codesourcery.com> wrote:
>
>>>>> This is a new version of the patch, which adds movmisalign
>>>>> patterns for little-endian NEON, and uses a new (since the last
>>>>> version of the patch was posted) target hook
>>>>> (TARGET_SUPPORT_VECTOR_MISALIGNMENT) to describe the alignments
>>>>> supported by NEON.
>
> The previously-posted version of this patch no longer works on current
> mainline, so here's a new version which does.
>
> Backing up to the start of the problem, since it's been a while -- this
> patch adds several things to NEON support in the ARM backend:
>
> 1. Implementations of the movmisalign pattern for loading and storing
> vectors which are not naturally aligned.
>
> 2. Constraint/operand printing tweaks to disallow pre-decrement for
> addresses used by the above, and allow printing of alignment specifiers
> for same.
>
> 3. Implementations of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT and
> TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE, to tell the middle-end
> which alignments are supported for vector loads/stores by the hardware.
>
> 4. Testsuite tweaks to specify that certain tests only require vectors
> to be aligned to the natural alignment of their elements, but not
> necessarily less than that. Also tweaks to force some tests to use the
> -mvectorize-with-neon-quad option.
>
>> [...] there's still an assumption that elements from increasing memory
>> locations go in increasing lane numbers (which is only true in
>> little-endian mode for NEON at present), but I don't think this patch
>> makes things any worse. Fixing big-endian mode is another problem for
>> another day :-).
>
> This still holds, but as previously discussed, probably should not be a
> sticking point for getting this patch applied.
>
> There remains a small amount of noise in testsuite results with this
> patch, i.e.:
>
> PASS -> FAIL: mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/ve
> ct/vect-72.c scan-tree-dump-times vect "Alignment of access forced using peeling
> " 0
>
> This fails because a loop containing both an unaligned load and an unaligned store is unpeeled, making the load aligned. It seems to be a valid thing to do, so I'm not sure why it's a failure.
>
> New FAIL: mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorized 1 loops" 1
> New FAIL: mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc scan-tree-dump-times vect "vectorizing stmts using SLP" 1
>
> These were analysed in:
>
> http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01351.html
>
> New FAIL: mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/vect/vect-outer-4c.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
>
> and this in:
>
> http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01328.html
>
> Also several tests transition from XPASS to PASS.
>
> Tested with cross to ARM Linux (-mthumb -march=armv7-a -mfpu=neon
> -mfloat-abi=softfp), gcc/g++/libstdc++. OK to apply?
>
> ChangeLog
>
> gcc/
> * expr.c (expand_assignment): Add assertion to prevent emitting null rtx for
> movmisalign pattern.
> (expand_expr_real_1): Likewise.
> * config/arm/arm.c (arm_builtin_support_vector_misalignment): New.
> (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above.
> (arm_vector_alignment_reachable): New.
> (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above.
> (neon_vector_mem_operand): Disallow PRE_DEC for misaligned loads.
> (arm_print_operand): Include alignment qualifier in %A.
> * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant.
> (movmisalign<mode>): New expander.
> (movmisalign<mode>_neon_store, movmisalign<mode>_neon_load): New
> insn patterns.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-42.c: Use vect_element_align instead of
> vect_hw_misalign.
> * gcc.dg/vect/vect-60.c: Likewise.
> * gcc.dg/vect/vect-56.c: Likewise.
> * gcc.dg/vect/vect-93.c: Likewise.
> * gcc.dg/vect/no-scevccp-outer-8.c: Likewise.
> * gcc.dg/vect/vect-95.c: Likewise.
> * gcc.dg/vect/vect-96.c: Likewise.
> * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when available.
> * gcc.dg/vect/slp-25.c: Likewise.
> * gcc.dg/vect/slp-3.c: Likewise.
> * gcc.dg/vect/vect-multitypes-1.c: Likewise.
> * gcc.dg/vect/no-vfa-pr29145.c: Likewise.
> * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use vect_element_align.
> * gcc.dg/vect/vect-109.c: Likewise.
> * gcc.dg/vect/vect-peel-1.c: Likewise.
> * gcc.dg/vect/vect-peel-2.c: Likewise.
> * lib/target-supports.exp
> (check_effective_target_arm_vect_no_misalign): New.
> (check_effective_target_vect_no_align): Use above.
> (check_effective_target_vect_element_align): New.
> (add_options_for_quad_vectors): New.
I've spent a long time pondering this patch and I'm still not entirely
happy that forcing the vectorizer to pretend these operations are
unaligned is the correct way to specify this, but I must admit that I
can't see a reasonable alternative at the moment that isn't
significantly less pleasant in some other respect. So this is OK apart
from:
+ if (align_bits != 0)
+ asm_fprintf (stream, ", :%d", align_bits);
The comma is incorrect in the alignment syntax. The correct form is
[Rn:align]. That is, the ':' is a direct replacement for '@' in the
strict UAL form.
I hope I'm not going to regret this... :-)
R.