This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads


On Wed, Oct 30, 2013 at 10:53:58AM +0100, Ondřej Bílka wrote:
> > Yesterday I noticed that for AVX, which allows unaligned memory operands
> > in arithmetic instructions, we still don't combine unaligned loads with
> > the AVX arithmetic instructions.  So, for -O2 -mavx -ftree-vectorize,
> > void
> > f1 (int *__restrict e, int *__restrict f)
> > {
> >   int i;
> >   for (i = 0; i < 1024; i++)
> >     e[i] = f[i] * 7;
> > }
> > 
> > void
> > f2 (int *__restrict e, int *__restrict f)
> > {
> >   int i;
> >   for (i = 0; i < 1024; i++)
> >     e[i] = f[i];
> > }
> > we have:
> >         vmovdqu (%rsi,%rax), %xmm0
> >         vpmulld %xmm1, %xmm0, %xmm0
> >         vmovups %xmm0, (%rdi,%rax)
> > in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> > *mov<mode>_internal patterns (and various others) use misaligned_operand
> > to see if they should emit vmovaps or vmovups (etc.), so as suggested by
> 
> That is intentional.  On pre-Haswell architectures splitting the load is
> faster than a single unaligned 32-byte load.

But the above is a 16-byte unaligned load.  Furthermore, GCC supports
-mavx256-split-unaligned-load and can emit 32-byte loads either as a
single unaligned 32-byte load, or as two 16-byte unaligned loads merged
together.  The patch affects only the cases where we were already
emitting 16-byte or 32-byte unaligned loads rather than split loads.
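To illustrate, with the load folded into the arithmetic instruction the
f1 inner loop above would become something like this (a sketch, assuming
%xmm1 still holds the broadcast constant 7 as in the dump above):

        vpmulld (%rsi,%rax), %xmm1, %xmm0       # misaligned 16-byte load folded
        vmovups %xmm0, (%rdi,%rax)

And for reference, with -mavx256-split-unaligned-load an unaligned
32-byte load is emitted roughly as:

        vmovups (%rsi,%rax), %xmm0                      # low 16 bytes
        vinsertf128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0   # high 16 bytes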

	Jakub

