This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop

From: "eyalroz at technion dot ac.il" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Sat, 20 Jan 2018 10:00:49 +0000
Subject: [Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop
Auto-submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951

            Bug ID: 83951
           Summary: [missed optimization] difference calculation for
                    floats vs ints in a loop
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Consider the following code:

template <typename T>
int foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

template int foo<int>(int* __restrict__ a);
template int foo<float>(float* __restrict__ a);

(This is based on example 7.26 in Agner Fog's Optimizing Software in C++, but
the use of C++ here is immaterial.)

The int version compiles, with -O2, into:

foo(int*):
        xor     eax, eax
.L2:
        mov     DWORD PTR [rdi], eax
        add     eax, 2
        add     rdi, 4
        cmp     eax, 200
        jne     .L2
        rep ret

One would expect the float version to compile into something similar, except
that the stored value would be kept in a floating-point register rather than in
eax, initialized to 0 and incremented by float 2.0 with each iteration. Instead,
we get:

int foo<float>(float*):
        xor     eax, eax
.L6:
        pxor    xmm0, xmm0
        add     rdi, 4
        cvtsi2ss        xmm0, eax
        add     eax, 2
        movss   DWORD PTR [rdi-4], xmm0
        cmp     eax, 200
        jne     .L6
        rep ret

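which performs an int-to-float conversion (cvtsi2ss, preceded by a pxor that
zeroes xmm0) in every iteration, and seems to be much slower.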

Checked here: https://godbolt.org/g/RVBNyY
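
For illustration, the transformation described above can be written out by
hand. The following is only a sketch: the function names fill_by_conversion
and fill_by_accumulation and the length parameter n are hypothetical and do
not appear in the original test case, and whether the second form is actually
faster on a given machine has not been measured here.

```cpp
// Form used in the report: the integer 2 * i is converted to float
// on every iteration.
void fill_by_conversion(float* __restrict__ a, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = static_cast<float>(2 * i);
    }
}

// Hand-written variant: the value lives in a float accumulator that is
// initialized to 0 and incremented by 2.0f, so the loop body contains no
// int-to-float conversion. (Hypothetical name, for illustration only.)
void fill_by_accumulation(float* __restrict__ a, int n)
{
    float val = 0.0f;
    for (int i = 0; i < n; i++) {
        a[i] = val;
        val += 2.0f;
    }
}
```

For n = 100 both functions store the same values: every even integer from 0
to 198 is exactly representable as a float (all integers up to 2^24 are), so
each addition of 2.0f is exact and no rounding difference can appear.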

Follow-Ups:
- [Bug other/83951] [missed optimization] difference calculation for floats vs ints in a loop
  - From: eyalroz at technion dot ac.il
- [Bug rtl-optimization/83951] [missed optimization] difference calculation for floats vs ints in a loop
  - From: eyalroz at technion dot ac.il
- [Bug rtl-optimization/83951] [missed optimization] difference calculation for floats vs ints in a loop
  - From: pinskia at gcc dot gnu.org
- [Bug rtl-optimization/83951] [missed optimization] difference calculation for floats vs ints in a loop
  - From: glisse at gcc dot gnu.org
