This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads

From: Uros Bizjak <ubizjak at gmail dot com>
To: Jakub Jelinek <jakub at redhat dot com>
Cc: Richard Henderson <rth at redhat dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Wed, 30 Oct 2013 11:55:44 +0100
Subject: Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
Authentication-results: sourceware.org; auth=none
References: <20131030094713 dot GC27813 at tucnak dot zalov dot cz>

On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:

> Yesterday I've noticed that for AVX which allows unaligned operands in
> AVX arithmetics instructions we still don't combine unaligned loads with the
> AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize

This is actually PR 47754 that fell below radar for some reason...

> we have:
>         vmovdqu (%rsi,%rax), %xmm0
>         vpmulld %xmm1, %xmm0, %xmm0
>         vmovups %xmm0, (%rdi,%rax)
> in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> *mov<mode>_internal patterns (and various others) use misaligned_operand
> to see if they should emit vmovaps or vmovups (etc.), so as suggested by
> Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory
> operands of all the various non-move AVX instructions for TARGET_AVX, or
> add extra patterns to help combine, this patch instead just uses the
> *mov<mode>_internal in that case (assuming initially misaligned_operand
> doesn't become !misaligned_operand through RTL optimizations).  Additionally

No worries here. We will generate movdqa, and it is definitely a gcc
bug if RTL optimizations change misaligned operand to aligned.

> the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
> loads, which usually means combine will fail, by doing the load into a
> temporary pseudo in that case and then doing a pseudo to pseudo move with
> gen_lowpart on the rhs (which will be merged soon after into following
> instructions).

Is this similar to PR44141? There were similar problems with V4SFmode
subregs, so combine was not able to merge load to the arithemtic insn.

> I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
> bootstrap/regtest server isn't AVX capable.

I can bootstrap the patch later today on IvyBridge with
--with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

Uros.

Follow-Ups:
- Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
  - From: Jakub Jelinek

References:
- [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
  - From: Jakub Jelinek

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]