This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/36127] New: bad choice of loop IVs above -Os on x86
- From: "astrange at ithinksw dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 5 May 2008 02:11:27 -0000
- Subject: [Bug tree-optimization/36127] New: bad choice of loop IVs above -Os on x86
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
> /usr/local/gcc44/bin/gcc -v
[..]
gcc version 4.4.0 20080503 (experimental) (GCC)
> gcc -O3 -mfpmath=sse -fno-pic -fno-tree-vectorize -S himenoBMTxps.c
With -O2/-O3, the inner loop of jacobi() in this program ends up containing a
lot of this, reloading a base address from the stack before almost every load:
movss _p-4(%edi,%edx,4), %xmm0
movl -96(%ebp), %edi
subss _p-4(%edi,%edx,4), %xmm0
movl -108(%ebp), %edi
subss _p-4(%edi,%edx,4), %xmm0
movl -92(%ebp), %edi
addss _p-4(%edi,%edx,4), %xmm0
movl -124(%ebp), %edi
At -O1 or -Os, it instead produces:
movss 34056(%eax), %xmm0
subss 33024(%eax), %xmm0
subss -33024(%eax), %xmm0
addss -34056(%eax), %xmm0
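which is much better: one base register with constant displacements and no
reloads from the stack. On a Core 2 the benchmark reports being 40% faster at -Os.
IIRC this isn't a problem on x86-64, but with IRA at -O3 it was much worse again.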
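For illustration, here is a minimal C sketch of the two addressing shapes. It is not taken from himenoBMTxps.c. It assumes the four loads come from a cross-difference term of the form p[i+1][j+1][k] - p[i+1][j-1][k] - p[i-1][j+1][k] + p[i-1][j-1][k], and it assumes a 65x65x129 float array, which is only inferred from the displacements above (33024 = 4*(65*129 - 129), 34056 = 4*(65*129 + 129)). The names idx, cross_indexed and cross_pointer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed dimensions, inferred from the byte offsets in the -Os output. */
enum { MIMAX = 65, MJMAX = 65, MKMAX = 129 };
/* Element strides for one step in i (PLANE) and one step in j (ROW). */
enum { PLANE = MJMAX * MKMAX, ROW = MKMAX };

/* Flat storage so that the constant-offset indexing below stays in bounds
   of a single array object. */
static float p[MIMAX * MJMAX * MKMAX];

/* Flat index of element (i, j, k). */
static size_t idx(int i, int j, int k)
{
    return ((size_t)i * MJMAX + (size_t)j) * MKMAX + (size_t)k;
}

/* Indexed form: each term carries its own full address computation,
   roughly the shape that ends up with one base per load at -O2/-O3. */
static float cross_indexed(int i, int j, int k)
{
    assert(i >= 1 && i < MIMAX - 1 && j >= 1 && j < MJMAX - 1);
    return p[idx(i + 1, j + 1, k)] - p[idx(i + 1, j - 1, k)]
         - p[idx(i - 1, j + 1, k)] + p[idx(i - 1, j - 1, k)];
}

/* Single-base form: one pointer to the centre element plus four
   compile-time constant displacements, like the -O1/-Os output. */
static float cross_pointer(int i, int j, int k)
{
    assert(i >= 1 && i < MIMAX - 1 && j >= 1 && j < MJMAX - 1);
    const float *c = &p[idx(i, j, k)];
    return c[PLANE + ROW] - c[PLANE - ROW]
         - c[-PLANE + ROW] + c[-PLANE - ROW];
}
```

Both functions compute the same value; the second just makes explicit the single induction variable that the -O1/-Os code uses. Whether writing the source this way actually changes what GCC's induction variable selection picks at -O3 is not something this sketch demonstrates.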
--
Summary: bad choice of loop IVs above -Os on x86
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127