This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH i386][google]With -mtune=core2, avoid generating the slow unaligned vector load/store (issue5488054)
- From: Richard Henderson <rth at redhat dot com>
- To: Sriraman Tallam <tmsriram at google dot com>
- Cc: reply at codereview dot appspotmail dot com, davidxl at google dot com, gcc-patches at gcc dot gnu dot org
- Date: Tue, 13 Dec 2011 09:58:29 -0800
- Subject: Re: [PATCH i386][google]With -mtune=core2, avoid generating the slow unaligned vector load/store (issue5488054)
- References: <20111213020557.EA8F1B21AC@azwildcat.mtv.corp.google.com>
On 12/12/2011 06:05 PM, Sriraman Tallam wrote:
> On core2, an unaligned vector load/store using movdqu is a very slow operation.
> Experiments show it is six times slower than movdqa (aligned), irrespective of
> whether the accessed address actually happens to be aligned at run time.
> On Corei7 there is no performance difference between the two, and on AMD
> processors movdqu is only about 10% slower.
>
> This patch disables vectorization of loops that would require these slow
> unaligned memory loads/stores on core2.
What happens if you temporarily disable
      /* ??? Similar to above, only less clear because of quote
	 typeless stores unquote.  */
      if (TARGET_SSE2 && !TARGET_SSE_TYPELESS_STORES
	  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
	{
	  op0 = gen_lowpart (V16QImode, op0);
	  op1 = gen_lowpart (V16QImode, op1);
	  emit_insn (gen_sse2_movdqu (op0, op1));
	  return;
	}
so that the unaligned store happens via movlps + movhps?
r~