This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Henderson <rth at redhat dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 30 Oct 2013 11:55:44 +0100
- Subject: Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
- References: <20131030094713 dot GC27813 at tucnak dot zalov dot cz>
On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Yesterday I've noticed that for AVX which allows unaligned operands in
> AVX arithmetics instructions we still don't combine unaligned loads with the
> AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize
This is actually PR 47754, which fell below the radar for some reason...
> we have:
> vmovdqu (%rsi,%rax), %xmm0
> vpmulld %xmm1, %xmm0, %xmm0
> vmovups %xmm0, (%rdi,%rax)
> in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> *mov<mode>_internal patterns (and various others) use misaligned_operand
> to see if they should emit vmovaps or vmovups (etc.), so as suggested by
> Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory
> operands of all the various non-move AVX instructions for TARGET_AVX, or
> add extra patterns to help combine, this patch instead just uses the
> *mov<mode>_internal in that case (assuming initially misaligned_operand
> doesn't become !misaligned_operand through RTL optimizations). Additionally
No worries here. We will generate movdqa, and it is definitely a GCC
bug if RTL optimizations change a misaligned operand into an aligned one.
> the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
> loads, which usually means combine will fail, by doing the load into a
> temporary pseudo in that case and then doing a pseudo to pseudo move with
> gen_lowpart on the rhs (which will be merged soon after into following
> instructions).
Is this similar to PR 44141? There were similar problems with V4SFmode
subregs, so combine was not able to merge the load into the arithmetic insn.
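For reference, a loop of roughly the following shape is the kind of code
under discussion. This is only a hypothetical stand-in, not the testcase
from the original mail; the name scale_ints and its signature are made up
for illustration. Since neither pointer is known to be 16-byte aligned, the
vectorizer has to use unaligned accesses, and with -O2 -mavx -ftree-vectorize
one would expect a separate unaligned load feeding the multiply, as in the
assembly quoted above, rather than a memory operand folded into vpmulld.

```c
#include <stddef.h>

/* Hypothetical example: multiply n ints from src by k and store them in dst.
   Nothing is known about the alignment of either pointer, so a vectorized
   version of this loop needs unaligned vector loads and stores.  */
void
scale_ints (int *restrict dst, const int *restrict src, size_t n, int k)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] * k;
}
```

Calling it with an offset pointer such as src + 1 exercises the misaligned
case at run time.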
> I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
> bootstrap/regtest server isn't AVX capable.
I can bootstrap the patch later today on IvyBridge with
--with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.
Uros.