This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/22497] New: A register is wasted in simple vectorised loops
- From: "uros at kss-loka dot si" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 15 Jul 2005 10:01:31 -0000
- Subject: [Bug target/22497] New: A register is wasted in simple vectorised loops
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
Hello!
Consider this simple testcase:
#define N 16

short ia[N];
short ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
short ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};

int main ()
{
  int i;

  for (i = 0; i < N; i++)
    ia[i] = ib[i] + ic[i];

  return 0;
}
The loop in this testcase is compiled with 'gcc -O2 -ftree-vectorize -msse2'
into:
.L2:
        movdqa  ib(%eax), %xmm0
        paddw   ic(%eax), %xmm0
        incl    %edx
        movdqa  %xmm0, ia(%eax)
        addl    $16, %eax
        cmpl    $2, %edx
        jne     .L2
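There is no (,%reg,16) SIB addressing mode available on i386 (the scale factor
can only be 1, 2, 4 or 8), and it looks to me as if the loop optimizer falls
back to the simplest addressing mode in this case.
Unfortunately, the %edx register is wasted in the generated loop: it serves only
as an iteration counter, although %eax could drive the exit test as well.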
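To make the two induction variables easier to see, here is a rough C-level model of the generated loop. It is only a sketch, not compiler output: the helper add_vec() and the function run_two_counters() are names made up for illustration, and add_vec() merely stands in for the movdqa/paddw/movdqa sequence on one 16-byte vector.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 16
#define VEC_BYTES 16            /* one XMM register holds 8 shorts */

short ia[N];
short ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
short ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};

/* Hypothetical helper: adds the 8 shorts found at byte offset `off`
   in ib and ic and stores the sums at the same offset in ia.  */
static void add_vec (size_t off)
{
  short a[8], b[8];
  int k;

  assert (off + VEC_BYTES <= sizeof ia);
  memcpy (a, (char *) ib + off, VEC_BYTES);
  memcpy (b, (char *) ic + off, VEC_BYTES);
  for (k = 0; k < 8; k++)
    a[k] = (short) (a[k] + b[k]);
  memcpy ((char *) ia + off, a, VEC_BYTES);
}

/* Model of the loop as currently generated: `eax` is the byte offset
   used for addressing, `edx` exists only for the exit test.
   Returns the final value of the iteration counter.  */
unsigned run_two_counters (void)
{
  unsigned eax = 0;
  unsigned edx = 0;

  do
    {
      add_vec (eax);
      edx++;
      eax += VEC_BYTES;
    }
  while (edx != 2);

  return edx;
}
```

Both variables advance in lockstep (eax is always 16 * edx), which is why one of them is redundant.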
Better code would be:
.L2:
        movdqa  ib(,%eax,8), %xmm0
        paddw   ic(,%eax,8), %xmm0
        movdqa  %xmm0, ia(,%eax,8)
        addl    $2, %eax
        cmpl    $4, %eax
        jne     .L2
or with the simplest addressing scheme:
.L2:
        movdqa  ib(%eax), %xmm0
        paddw   ic(%eax), %xmm0
        movdqa  %xmm0, ia(%eax)
        addl    $16, %eax
        cmpl    $32, %eax
        jne     .L2
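The two proposed loops can be modelled in C in the same spirit, with a single variable used for both addressing and the exit test. Again this is just a sketch under stated assumptions: add_vec(), run_scaled_index() and run_byte_offset() are hypothetical names, and add_vec() stands in for the three SSE2 instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 16
#define VEC_BYTES 16            /* one XMM register holds 8 shorts */

short ia[N];
short ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
short ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};

/* Hypothetical helper: adds the 8 shorts found at byte offset `off`
   in ib and ic and stores the sums at the same offset in ia.  */
static void add_vec (size_t off)
{
  short a[8], b[8];
  int k;

  assert (off + VEC_BYTES <= sizeof ia);
  memcpy (a, (char *) ib + off, VEC_BYTES);
  memcpy (b, (char *) ic + off, VEC_BYTES);
  for (k = 0; k < 8; k++)
    a[k] = (short) (a[k] + b[k]);
  memcpy ((char *) ia + off, a, VEC_BYTES);
}

/* Model of the (,%eax,8) variant: the index steps by 2 and is scaled
   by 8 in the address, so each step covers 16 bytes.
   Returns the final value of the index.  */
unsigned run_scaled_index (void)
{
  unsigned eax = 0;

  do
    {
      add_vec ((size_t) eax * 8);
      eax += 2;
    }
  while (eax != 4);

  return eax;
}

/* Model of the plain (%eax) variant: the byte offset itself is
   compared against the total size of the array (32 bytes).
   Returns the final value of the offset.  */
unsigned run_byte_offset (void)
{
  unsigned eax = 0;

  do
    {
      add_vec (eax);
      eax += VEC_BYTES;
    }
  while (eax != N * sizeof (short));

  return eax;
}
```

Both functions visit byte offsets 0 and 16 and then stop, so they cover the same 32 bytes as the two-register loop while keeping only one loop variable live.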
Uros.
--
Summary: A register is wasted in simple vectorised loops
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22497