This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
On Sat, Nov 28, 2009 at 4:26 PM, Tim Prince <n8tm@aol.com> wrote:
Toon Moene wrote:
H.J. Lu wrote:
If you want those, you must request them with -mtune=barcelona.
On Sat, Nov 28, 2009 at 3:21 AM, Toon Moene <toon@moene.org> wrote:
L.S.,
I think it is used to avoid partial SSE register stall.
Due to the discussion on register allocation, I went back to a hobby of mine: Studying the assembly output of the compiler.
For this Fortran subroutine (note: unless otherwise told to the Fortran front end, reals are 32 bit floating point numbers):
      subroutine sum(a, b, c, n)
      integer i, n
      real a(n), b(n), c(n)
      do i = 1, n
         c(i) = a(i) + b(i)
      enddo
      end
with -O3 -S (GCC: (GNU) 4.5.0 20091123), I get this (vectorized) loop:
        xorps   %xmm2, %xmm2
        ....
.L6:
        movaps  %xmm2, %xmm0
        movaps  %xmm2, %xmm1
        movlps  (%r9,%rax), %xmm0
        movlps  (%r8,%rax), %xmm1
        movhps  8(%r9,%rax), %xmm0
        movhps  8(%r8,%rax), %xmm1
        incl    %ecx
        addps   %xmm1, %xmm0
        movaps  %xmm0, 0(%rbp,%rax)
        addq    $16, %rax
        cmpl    %ebx, %ecx
        jb      .L6
I'm not a master of x86_64 assembly, but this strongly looks like %xmm{0,1} have to be zeroed (%xmm2 is set to zero by xor'ing it with itself) before they are completely filled by the mov{l,h}ps instructions?
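For readers who find intrinsics easier to follow than assembly, here is a rough C sketch of the load sequence in question. It is an illustration only, not compiler output: the function names (load4_split, load4_full, loads_agree) are made up for this example, and the intrinsics are the ones that correspond to xorps/movlps/movhps and movups; an optimizing compiler is free to pick different instructions. It suggests that the zeroing does not affect the loaded value, since both halves of the register are overwritten anyway.

```c
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Load four floats from p in two 64-bit halves, mirroring the
   movaps/movlps/movhps sequence: start from a zeroed register,
   then overwrite its low and its high half. */
static void load4_split(const float *p, float out[4])
{
#if defined(__SSE__)
    __m128 v = _mm_setzero_ps();                  /* zeroed register   */
    v = _mm_loadl_pi(v, (const __m64 *)p);        /* low two floats    */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));  /* high two floats   */
    _mm_storeu_ps(out, v);
#else
    /* Scalar fallback for targets without SSE (same result). */
    memset(out, 0, 4 * sizeof(float));
    memcpy(out, p, 2 * sizeof(float));
    memcpy(out + 2, p + 2, 2 * sizeof(float));
#endif
}

/* Load the same four floats with a single unaligned 128-bit load. */
static void load4_full(const float *p, float out[4])
{
#if defined(__SSE__)
    _mm_storeu_ps(out, _mm_loadu_ps(p));
#else
    memcpy(out, p, 4 * sizeof(float));
#endif
}

/* Returns 1 when both load strategies yield the same four floats. */
int loads_agree(const float *p)
{
    float s[4], f[4];
    load4_split(p, s);
    load4_full(p, f);
    return memcmp(s, f, sizeof s) == 0;
}
```

Calling loads_agree on any readable run of four floats, aligned or not, should return 1; the two strategies differ in how the register is filled, not in what ends up in it.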
You mean there's no movaps (%r9,%rax), %xmm0 (and mutatis mutandis for %xmm1) instruction (to copy 4*32 bits to the register) ?
Which would then get you movups (%r9,%rax), %xmm0 (unaligned move). Generic tuning prefers the split moves; AMD Fam10 and above handle unaligned moves just fine.
Correct, the movaps would have been used if alignment were recognized. The newer CPUs achieve full performance with movups. Do you consider Core i7/Nehalem as included in "AMD Fam10 and above"?
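As a point of comparison, the loop with one unaligned load per operand can be sketched in C with SSE intrinsics. This is a hand-written illustration, not what GCC emits under any particular -mtune setting; the name sum_unaligned is invented here, and the store is also done unaligned to keep the sketch safe for arbitrary pointers (the loop quoted above stores with movaps).

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* c[i] = a[i] + b[i] for i in [0, n), four floats per iteration
   where SSE is available, with a scalar loop for the remainder. */
void sum_unaligned(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* single unaligned load */
        __m128 vb = _mm_loadu_ps(b + i);   /* single unaligned load */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    /* Tail elements (or everything, when SSE is unavailable). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Whether this form or the split movlps/movhps form is faster depends on the CPU, which is the point of the tuning discussion in this thread.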