This is the mail archive of the gcc-patches
mailing list for the GCC project.
Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Henderson <rth at redhat dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 30 Oct 2013 11:55:44 +0100
- Subject: Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
- References: <20131030094713 dot GC27813 at tucnak dot zalov dot cz>
On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek <jakub at redhat dot com> wrote:
> Yesterday I noticed that for AVX, which allows unaligned operands in
> AVX arithmetic instructions, we still don't combine unaligned loads with the
> AVX arithmetic instructions. So say for -O2 -mavx -ftree-vectorize
This is actually PR 47754, which fell below the radar for some reason...
> we have:
> vmovdqu (%rsi,%rax), %xmm0
> vpmulld %xmm1, %xmm0, %xmm0
> vmovups %xmm0, (%rdi,%rax)
> in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> *mov<mode>_internal patterns (and various others) use misaligned_operand
> to see if they should emit vmovaps or vmovups (etc.). So, as suggested by
> Richard on IRC, it isn't necessary either to allow UNSPEC_LOADU in memory
> operands of all the various non-move AVX instructions for TARGET_AVX or to
> add extra patterns to help combine; this patch instead just uses the
> *mov<mode>_internal in that case (assuming an initially misaligned_operand
> doesn't become !misaligned_operand through RTL optimizations). Additionally
No worries here. We will generate movdqa, and it is definitely a GCC
bug if RTL optimizations change a misaligned operand to an aligned one.
> the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
> loads, which usually means combine will fail, by doing the load into a
> temporary pseudo in that case and then doing a pseudo to pseudo move with
> gen_lowpart on the rhs (which will be merged soon after into following
Is this similar to PR 44141? There were similar problems with V4SFmode
subregs, so combine was not able to merge the load into the arithmetic insn.
> I'll bootstrap/regtest this on x86_64-linux and i686-linux; unfortunately my
> bootstrap/regtest server isn't AVX capable.
I can bootstrap the patch later today on IvyBridge with
--with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.
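
For readers following the thread, a loop of roughly the following shape is the kind of code under discussion. This is only an illustrative sketch: the function name `scale`, the constant 7, and the signature are hypothetical and are not the testcase from the original mail. Because the pointers carry no alignment guarantee, a vectorizer has to treat the vector loads and stores as potentially unaligned, which is the situation where a separate unaligned load followed by an arithmetic instruction (as in the quoted vmovdqu/vpmulld sequence) may appear instead of a single instruction with a memory operand.

```c
#include <stddef.h>

/* Hypothetical example, not the testcase from the original mail.
   Neither pointer is known to be 16-byte aligned, so a vectorized
   version of this loop must use unaligned vector accesses.  */
void scale(int *restrict dst, const int *restrict src, size_t n)
{
    /* Element-wise integer multiply by a constant; a candidate for
       vectorization into a packed 32-bit multiply.  */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 7;
}
```

Compiling such a file with something like `gcc -O2 -mavx -ftree-vectorize -S` and reading the resulting assembly is one way to check whether the load has been folded into the arithmetic instruction; the exact output depends on the compiler version.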