This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop
- From: "eyalroz at technion dot ac.il" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 20 Jan 2018 10:00:49 +0000
- Subject: [Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951
Bug ID: 83951
Summary: [missed optimization] difference calculation for
floats vs ints in a loop
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---
Consider the following code:
template <typename T>
int foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}
template int foo<int>(int* __restrict__ a);
template int foo<float>(float* __restrict__ a);
(This is based on example 7.26 in Agner Fog's Optimizing Software in C++; but
the use of C++ here is immaterial).
The int version compiles, with -O2, into:
foo(int*):
        xor     eax, eax
.L2:
        mov     DWORD PTR [rdi], eax
        add     eax, 2
        add     rdi, 4
        cmp     eax, 200
        jne     .L2
        rep ret
One would expect the float version to compile into something similar, except
that the stored value would be kept in a floating-point register rather than in
eax, initialized to 0 and incremented by 2.0f on each iteration. Instead, we get:
int foo<float>(float*):
        xor     eax, eax
.L6:
        pxor    xmm0, xmm0
        add     rdi, 4
        cvtsi2ss xmm0, eax
        add     eax, 2
        movss   DWORD PTR [rdi-4], xmm0
        cmp     eax, 200
        jne     .L6
        rep ret
which clears xmm0 and converts the integer counter to float with cvtsi2ss on
every iteration, and so seems likely to be slower.
Checked here: https://godbolt.org/g/RVBNyY
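
For illustration, the transformation asked for above could be written by hand
roughly as follows. This is only a sketch: the names foo_reduced, foo_reference
and check_equivalence are hypothetical and do not appear in the report. For this
particular loop the two versions store identical values, since every even
integer up to 200 is exactly representable in a float and adding 2.0f to it is
exact. That does not hold for floating-point induction variables in general.

```cpp
#include <cassert>

// Hypothetical hand-optimized variant (not from the report): the stored value
// lives in a float accumulator instead of being converted from 2*i each time.
void foo_reduced(float* __restrict__ a)
{
    float val = 0.0f;
    for (int i = 0; i < 100; i++) {
        a[i] = val;
        val += 2.0f;  // exact here: even integers up to 200 fit in a float
    }
}

// Reference variant with the same loop body as the report: one int-to-float
// conversion per iteration.
void foo_reference(float* __restrict__ a)
{
    for (int i = 0; i < 100; i++) {
        a[i] = 2 * i;
    }
}

// Checks that both variants store the same 100 values.
void check_equivalence()
{
    float x[100], y[100];
    foo_reduced(x);
    foo_reference(y);
    for (int i = 0; i < 100; i++) {
        assert(x[i] == y[i]);
    }
}
```

Whether the accumulator form is actually faster depends on the target and the
surrounding code, so it would need to be measured.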