This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Induction variable elimination, was: Re: On the x86_64, does one have to zero a vector register before filling it completely ?
- From: Toon Moene <toon at moene dot org>
- To: gcc mailing list <gcc at gcc dot gnu dot org>
- Date: Sun, 29 Nov 2009 15:01:47 +0100
- Subject: Induction variable elimination, was: Re: On the x86_64, does one have to zero a vector register before filling it completely ?
- References: <4B1107A6.6010205@moene.org> <4B113B18.5070304@aol.com> <4B11740B.2020702@moene.org> <4B1177E7.3090906@moene.org>
Toon Moene wrote:
I wrote:
OK, so it is an alignment issue (with -mtune=barcelona):
.L6:
	movups	0(%rbp,%rax), %xmm0
	movups	(%rbx,%rax), %xmm1
	incl	%ecx
	addps	%xmm1, %xmm0
	movaps	%xmm0, (%r8,%rax)
	addq	$16, %rax
	cmpl	%r10d, %ecx
	jb	.L6
Once this problem is solved (or at least once we know how it could be
solved), we go on to the next one: the extraneous induction variable %ecx.
There are two ways to deal with it:
1. Eliminate it by expressing it in terms of the other induction
variable, which counts in the same direction (upwards, in steps of 16),
and remember that induction variable's (%rax) limit.
Just for completeness: gcc *does* know how to do this; it just doesn't
do it when vectorizing.
This is what I get when compiling with -O2 -S:
.L3:
	movss	(%rdi,%rax), %xmm0
	addss	(%rsi,%rax), %xmm0
	movss	%xmm0, (%rdx,%rax)
	addq	$4, %rax
	cmpq	%rcx, %rax
	jne	.L3
Note how %rax remains as the sole induction variable.
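
To make the difference between the two listings concrete, here is a rough C
sketch of the transformation. It is only an illustration, not the source that
produced the assembly above: the function names, the nblocks parameter and the
inner loop standing in for one 4-float addps are all assumptions of mine. Note
also that the offset here counts array elements, whereas %rax in the listings
counts bytes.

```c
#include <stddef.h>

/* Hypothetical names throughout; a sketch, not the original test case. */

/* Two induction variables, as in the vectorized .L6 loop:
   i plays the role of %ecx (block counter), off that of %rax (offset). */
void add_two_ivs(const float *a, const float *b, float *c, unsigned nblocks)
{
    size_t off = 0;
    for (unsigned i = 0; i < nblocks; i++) {
        for (int k = 0; k < 4; k++)      /* stands in for one addps */
            c[off + k] = a[off + k] + b[off + k];
        off += 4;                        /* second induction variable */
    }
}

/* After eliminating i: the trip count is folded into a precomputed limit
   for off, so off is the only induction variable left (as in .L3). */
void add_one_iv(const float *a, const float *b, float *c, unsigned nblocks)
{
    const size_t limit = (size_t)nblocks * 4;
    for (size_t off = 0; off != limit; off += 4)
        for (int k = 0; k < 4; k++)      /* stands in for one addps */
            c[off + k] = a[off + k] + b[off + k];
}
```

Both functions compute the same result; the second one just trades the
increment-and-compare on the counter for a single compare against the limit,
which is what the -O2 scalar loop does with %rcx.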
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html