This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: On the x86_64, does one have to zero a vector register before filling it completely ?

From: Richard Guenther <richard dot guenther at gmail dot com>
To: tprince at computer dot org
Cc: Toon Moene <toon at moene dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>, gcc mailing list <gcc at gcc dot gnu dot org>
Date: Sat, 28 Nov 2009 17:56:54 +0100
Subject: Re: On the x86_64, does one have to zero a vector register before filling it completely ?
References: <4B1107A6.6010205@moene.org> <6dc9ffc80911280452j62ad033bsace217a49c8406f0@mail.gmail.com> <4B1129CE.3070604@moene.org> <4B114142.8080001@aol.com> <84fc9c000911280806m9f7b2afi2cbf8a2ef99dc048@mail.gmail.com> <4B115069.90406@aol.com>

On Sat, Nov 28, 2009 at 5:31 PM, Tim Prince <n8tm@aol.com> wrote:
> Richard Guenther wrote:
>>
>> On Sat, Nov 28, 2009 at 4:26 PM, Tim Prince <n8tm@aol.com> wrote:
>>>
>>> Toon Moene wrote:
>>>>
>>>> H.J. Lu wrote:
>>>>>
>>>>> On Sat, Nov 28, 2009 at 3:21 AM, Toon Moene <toon@moene.org> wrote:
>>>>>>
>>>>>> L.S.,
>>>>>>
>>>>>> Due to the discussion on register allocation, I went back to a hobby of
>>>>>> mine: studying the assembly output of the compiler.
>>>>>>
>>>>>> For this Fortran subroutine (note: unless told otherwise, the Fortran
>>>>>> front end treats reals as 32-bit floating-point numbers):
>>>>>>
>>>>>>       subroutine sum(a, b, c, n)
>>>>>>       integer i, n
>>>>>>       real a(n), b(n), c(n)
>>>>>>       do i = 1, n
>>>>>>          c(i) = a(i) + b(i)
>>>>>>       enddo
>>>>>>       end
>>>>>>
>>>>>> with -O3 -S (GCC: (GNU) 4.5.0 20091123), I get this (vectorized) loop:
>>>>>>
>>>>>>         xorps   %xmm2, %xmm2
>>>>>>         ....
>>>>>> .L6:
>>>>>>         movaps  %xmm2, %xmm0
>>>>>>         movaps  %xmm2, %xmm1
>>>>>>         movlps  (%r9,%rax), %xmm0
>>>>>>         movlps  (%r8,%rax), %xmm1
>>>>>>         movhps  8(%r9,%rax), %xmm0
>>>>>>         movhps  8(%r8,%rax), %xmm1
>>>>>>         incl    %ecx
>>>>>>         addps   %xmm1, %xmm0
>>>>>>         movaps  %xmm0, 0(%rbp,%rax)
>>>>>>         addq    $16, %rax
>>>>>>         cmpl    %ebx, %ecx
>>>>>>         jb      .L6
>>>>>>
>>>>>> I'm not a master of x86_64 assembly, but it strongly looks like %xmm{0,1}
>>>>>> have to be zeroed (%xmm2 is set to zero by xor'ing it with itself) before
>>>>>> they are completely filled by the mov{l,h}ps instructions?
>>>>>>
>>>>> I think it is used to avoid a partial SSE register stall.
>>>>>
>>>>>
>>>> You mean there's no movaps (%r9,%rax), %xmm0 instruction (and, mutatis
>>>> mutandis, one for %xmm1) to copy 4*32 bits to the register?
>>>>
>>> If you want those, you must request them with -mtune=barcelona.
>>
>> Which would then get you movups (%r9,%rax), %xmm0 (an unaligned move).
>> Generic tuning prefers the split moves; AMD Fam10 and above handle
>> unaligned moves just fine.
>
> Correct, the movaps would have been used if alignment were recognized.
> The newer CPUs achieve full performance with movups.
> Do you consider Core i7/Nehalem to be included in "AMD Fam10 and above"?

I'd have to consult the optimization manuals for those, but H.J. may know
offhand.

Richard.
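
The two load strategies discussed above can be sketched in C with SSE intrinsics. This is only an illustration under stated assumptions, not the Fortran from the thread and not a statement of what any particular GCC version emits: the helper names (load_split, load_unaligned, sum_split, sum_unaligned) are hypothetical, n is assumed to be a multiple of 4, and the compiler remains free to pick different instructions than the ones named in the comments.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 or x86_64 target */

/* Split load (hypothetical helper): start from a zeroed register, then fill
   the low and high 64-bit halves separately. This mirrors the
   xorps / movlps / movhps pattern quoted in the thread. */
static __m128 load_split(const float *p)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)p);        /* low two floats  */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));  /* high two floats */
    return v;
}

/* Single unaligned 128-bit load (hypothetical helper), the intrinsic
   counterpart of movups. */
static __m128 load_unaligned(const float *p)
{
    return _mm_loadu_ps(p);
}

/* c[i] = a[i] + b[i], four floats at a time, using split loads. */
static void sum_split(const float *a, const float *b, float *c, int n)
{
    assert(n % 4 == 0);  /* assumption: no scalar remainder loop */
    for (int i = 0; i < n; i += 4)
        _mm_storeu_ps(c + i, _mm_add_ps(load_split(a + i), load_split(b + i)));
}

/* Same computation using single unaligned loads. */
static void sum_unaligned(const float *a, const float *b, float *c, int n)
{
    assert(n % 4 == 0);  /* assumption: no scalar remainder loop */
    for (int i = 0; i < n; i += 4)
        _mm_storeu_ps(c + i,
                      _mm_add_ps(load_unaligned(a + i), load_unaligned(b + i)));
}
```

Both variants compute the same result; they differ only in how the four floats reach the register. Compiling each with -O3 -S under different -mtune settings is one way to compare the generated loads.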

References:
- On the x86_64, does one have to zero a vector register before filling it completely ?
  - From: Toon Moene
- Re: On the x86_64, does one have to zero a vector register before filling it completely ?
  - From: H.J. Lu
- Re: On the x86_64, does one have to zero a vector register before filling it completely ?
  - From: Toon Moene
- Re: On the x86_64, does one have to zero a vector register before filling it completely ?
  - From: Tim Prince
- Re: On the x86_64, does one have to zero a vector register before filling it completely ?
  - From: Richard Guenther
- Re: On the x86_64, does one have to zero a vector register before filling it completely ?
  - From: Tim Prince
