[RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads

Wed Oct 30 11:54:00 GMT 2013

On Wed, Oct 30, 2013 at 11:55:44AM +0100, Uros Bizjak wrote:
> On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> > Yesterday I noticed that for AVX, which allows unaligned operands in
> > AVX arithmetic instructions, we still don't combine unaligned loads with the
> > AVX arithmetic instructions.  So say for -O2 -mavx -ftree-vectorize
> 
> This is actually PR 47754 that fell below radar for some reason...

Apparently yes.
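As a hypothetical illustration of the kind of code at stake (not the example from the original report, which is elided above), the loop below is a typical candidate for -O2 -mavx -ftree-vectorize. Nothing is known about the alignment of b and c, so the vectorizer may emit unaligned vector loads that feed a vector add. VEX-encoded arithmetic instructions such as vaddps accept unaligned memory operands, so one of those loads could in principle be folded into the add; whether a particular GCC version actually does so is exactly what this patch is about.

```c
#include <stddef.h>

/* Hypothetical example: b and c carry no alignment guarantee, so the
   vectorizer may load them with unaligned vector loads and then add.
   With AVX, an unaligned load could be used directly as the memory
   operand of the vector add instead of a separate load insn.  */
void add_arrays(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}
```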

> > the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
> > loads, which usually means combine will fail, by doing the load into a
> > temporary pseudo in that case and then doing a pseudo to pseudo move with
> > gen_lowpart on the rhs (which will be merged soon after into following
> > instructions).
> 
> Is this similar to PR44141? There were similar problems with V4SFmode
> subregs, so combine was not able to merge the load into the arithmetic insn.

From the vectorization work last year I remember many cases where
subregs (even equal-sized ones) on the LHS of instructions prevented combine or
other RTL optimizations from doing their job.  I believe I've changed some
easy places that did that completely unnecessarily, but I certainly have not
gone through all the code to look for other places where this is done.

Perhaps we could hack up a checking pass that, after expansion, walks the
whole IL and complains about same-sized subregs on the LHS of insns, and then
run make check with it for a couple of ISAs (e.g. -msse2, -msse4, -mavx, -mavx2)?
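A minimal sketch of what such a checking pass would look for, written against a toy model rather than GCC's real RTL API: the struct types and the check_same_size_lhs_subregs function are hypothetical stand-ins. In GCC terms, the pattern being flagged is something like (set (subreg:V8SF (reg:V32QI ...) 0) ...), where the subreg's mode and the inner register's mode have the same size (32 bytes here).

```c
#include <stdio.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical and are NOT GCC's
   rtx / machine_mode API.  */
struct toy_operand {
  int is_subreg;        /* nonzero if the operand is (subreg:OUTER (reg:INNER ...)) */
  unsigned outer_size;  /* size in bytes of the subreg's own mode */
  unsigned inner_size;  /* size in bytes of the inner register's mode */
};

struct toy_insn {
  int uid;                  /* insn id, used only for reporting */
  struct toy_operand dest;  /* the insn's LHS (SET_DEST in GCC terms) */
};

/* Walk every insn and report those whose destination is a subreg of the
   same size as its inner register.  Returns the number of insns flagged.  */
size_t check_same_size_lhs_subregs(const struct toy_insn *insns, size_t n)
{
  size_t flagged = 0;
  for (size_t i = 0; i < n; i++)
    {
      const struct toy_operand *d = &insns[i].dest;
      if (d->is_subreg && d->outer_size == d->inner_size)
        {
          fprintf(stderr, "insn %d: same-sized subreg on LHS\n", insns[i].uid);
          flagged++;
        }
    }
  return flagged;
}
```

A real pass would iterate over the insn stream after expansion and inspect SET_DEST of each single_set; the toy version only captures the size comparison that decides whether to complain.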

> > I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
> > bootstrap/regtest server isn't AVX capable.
> 
> I can bootstrap the patch later today on IvyBridge with
> --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

That would be greatly appreciated, thanks.

	Jakub