This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH i386][google]With -mtune=core2, avoid generating the slow unaligned vector load/store (issue5488054)
- From: Sriraman Tallam <tmsriram at google dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: reply at codereview dot appspotmail dot com, davidxl at google dot com, gcc-patches at gcc dot gnu dot org
- Date: Tue, 13 Dec 2011 10:26:38 -0800
- Subject: Re: [PATCH i386][google]With -mtune=core2, avoid generating the slow unaligned vector load/store (issue5488054)
- References: <20111213020557.EA8F1B21AC@azwildcat.mtv.corp.google.com> <4EE79245.4000903@redhat.com>
On Tue, Dec 13, 2011 at 9:58 AM, Richard Henderson <rth@redhat.com> wrote:
> On 12/12/2011 06:05 PM, Sriraman Tallam wrote:
>> On core2, unaligned vector load/store using movdqu is a very slow operation.
>> Experiments show it is six times slower than movdqa (aligned) and this is
>> irrespective of whether the resulting data happens to be aligned or not.
>> For Corei7, there is no performance difference between the two and on AMDs,
>> movdqu is only about 10% slower.
>>
>> This patch does not vectorize loops that need to generate the slow unaligned
>> memory load/stores on core2.
>
> What happens if you temporarily disable
>
>      /* ??? Similar to above, only less clear because of quote
>         typeless stores unquote.  */
>      if (TARGET_SSE2 && !TARGET_SSE_TYPELESS_STORES
>          && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
>        {
>          op0 = gen_lowpart (V16QImode, op0);
>          op1 = gen_lowpart (V16QImode, op1);
>          emit_insn (gen_sse2_movdqu (op0, op1));
>          return;
>        }
>
> so that the unaligned store happens via movlps + movhps?
Cool, this works for stores! It generates the movlps + movhps pair. I
also have to make a similar change to another call to gen_sse2_movdqu
for loads. Would it be ok to do this only when tune=core2?
Thanks,
-Sri.
>
> r~