[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
rguenth at gcc dot gnu dot org
gcc-bugzilla@gcc.gnu.org
Fri Jul 3 11:06:00 GMT 2009
------- Comment #19 from rguenth at gcc dot gnu dot org 2009-07-03 11:05 -------
In fact, in this case we have the C equivalent
  int i;
  long j = (long)(i - 1);
vs.
  long j = (long)i - 1;
which I believe are equivalent if overflow is undefined (or i - 1 does not
wrap).
It is just that fold evidently considers (long)i - 1 to be more expensive
than (long)(i - 1) and thus does not transform the latter into the former.
(It cannot transform (long)i - 1 into (long)(i - 1) either: even if
(long)i - 1 does not overflow, there is no guarantee that i - 1 does not.)
We should be able to do the (long)(i - 1) -> (long)i - 1 transformation
during SCEV analysis, though.
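To illustrate the point about wrapping, here is a minimal sketch of the two
forms. It is not taken from the testcase or from the patch. The helper names
are hypothetical, and it assumes an LP64 target (32-bit int, 64-bit long, as on
AMD64 GNU/Linux). The narrow subtraction is done in unsigned arithmetic so
that the demo itself has no undefined behaviour; the conversion back to int is
implementation-defined and reduces modulo 2^32 with GCC.

```c
#include <limits.h>

/* (long)(i - 1): subtract in 32-bit int, then widen.
   Wrapping is modelled with unsigned arithmetic (assumption: the
   implementation-defined unsigned -> int conversion wraps, as with GCC). */
static long sub_then_widen(int i)
{
    return (long)(int)((unsigned int)i - 1u);
}

/* (long)i - 1: widen first, then subtract in 64-bit long.
   Assumes long is wider than int (LP64), so this cannot overflow. */
static long widen_then_sub(int i)
{
    return (long)i - 1;
}

/* Returns 1 if both forms give the same value for this i. */
static int forms_agree(int i)
{
    return sub_then_widen(i) == widen_then_sub(i);
}
```

Under these assumptions forms_agree() returns 1 for every i except INT_MIN,
which is the one value where i - 1 wraps. That is why rewriting
(long)(i - 1) as (long)i - 1 is fine when signed overflow is undefined, while
the reverse direction would introduce an overflow that the widened form does
not have.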
I have a patch which results in (-O3 -ffast-math -funroll-loops)
.L6:
        mulss (%rcx), %xmm0
        movss (%rdx), %xmm5
        movss 4(%rdx), %xmm4
        addl $4, %ebp
        subss %xmm0, %xmm5
        movss 8(%rdx), %xmm0
        mulss (%rsi), %xmm5
        movss %xmm5, (%rdx)
        mulss 4(%rcx), %xmm5
        subss %xmm5, %xmm4
        mulss 4(%rsi), %xmm4
        movss %xmm4, 4(%rdx)
        movss 8(%rcx), %xmm3
        mulss %xmm4, %xmm3
        subss %xmm3, %xmm0
        mulss 8(%rsi), %xmm0
        movss %xmm0, 8(%rdx)
        movss 12(%rcx), %xmm2
        addq $16, %rcx
        mulss %xmm0, %xmm2
        movss 12(%rdx), %xmm0
        subss %xmm2, %xmm0
        mulss 12(%rsi), %xmm0
        addq $16, %rsi
        movss %xmm0, 12(%rdx)
        addq $16, %rdx
        cmpl %r8d, %ebp
        jne .L6
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-04-21 07:11:35         |2009-07-03 11:05:43
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163