Bug 48510 - Does not vectorize loops involving casts from floating point to unsigned integer types
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.5.2
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2011-04-08 07:00 UTC by jeremysalwen
Modified: 2021-08-24 23:06 UTC

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-08-24 00:00:00


Description jeremysalwen 2011-04-08 07:00:43 UTC
The following code vectorizes with the command line options:

-march=native -mtune=native -ftree-vectorizer-verbose=12 -O3 -std=c99 -ffast-math -funsafe-math-optimizations -lm  main.c

#include <stdio.h>
#include <math.h>
int main() {
  double g[1000];
  for(int i=0; i<1000; i++) {
    g[i]=2*(g[i]);
  }
  for(int i=0; i<1000; i++) {
   printf("%f\n",g[i]);
  }
}

but the following code does not with the same options:


#include <stdio.h>
#include <math.h>
int main() {
  double g[1000];
  for(int i=0; i<1000; i++) {
    g[i]=2*((unsigned long)g[i]);
  }
  for(int i=0; i<1000; i++) {
   printf("%f\n",g[i]);
  }
}

If I understand correctly, there are SSE instructions for casting doubles to long integers on the platform I'm on (Intel Atom) which GCC could use. (Or perhaps there could be a benefit to vectorizing other parts of the loop, even if the cast does not use SIMD instructions.)
Comment 1 Richard Biener 2011-04-08 10:02:15 UTC
There are no SSE instructions for conversions to unsigned. For scalar code we work around this
by biasing the input/output, but there is no vectorized version for this
available (yet).
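A minimal portable C sketch of the scalar biasing idea, assuming a 64-bit `uint64_t` result and that only a signed double to 64-bit integer conversion is available. The helper name `double_to_u64_biased` is hypothetical and this is not GCC's actual expansion: inputs below 2^63 are converted directly, while larger ones are shifted down by 2^63 before the signed conversion and get the top bit set again afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: double -> uint64_t using only the signed
   double -> int64_t conversion, biasing large inputs into range. */
static uint64_t double_to_u64_biased(double x)
{
    const double two63 = 9223372036854775808.0; /* 2^63, exactly representable */

    /* C only defines the unsigned conversion for -1 < x < 2^64. */
    assert(x > -1.0 && x < 18446744073709551616.0);

    if (x < two63)
        return (uint64_t)(int64_t)x; /* already within the signed range */

    /* For x in [2^63, 2^64) the subtraction is exact and the result lies
       in [0, 2^63), so the signed conversion is well defined.  XOR-ing
       bit 63 back in adds the 2^63 bias to the output. */
    return (uint64_t)(int64_t)(x - two63) ^ UINT64_C(0x8000000000000000);
}
```

For example, `double_to_u64_biased(1.5)` gives 1, and `double_to_u64_biased(18446744073709549568.0)` (the largest double below 2^64) gives 18446744073709549568, the same as a plain `(uint64_t)` cast. The bias is only applied to inputs that are already integral, which is why this split form avoids the truncation mismatch a blanket bias would cause.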
Comment 2 jeremysalwen 2011-04-08 17:43:25 UTC
As you mentioned, unsigned conversions to int and long int both fail to vectorize. Signed conversions to long int also fail, while signed conversions to int succeed.

I should note that the computer I ran these last tests on is a Phenom II.

Apologies if I'm mistaken, but I believe the intrinsic

__builtin_ia32_cvttsd2si64

should do the conversion to a signed long on my platform.
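To compare the four cases above side by side, the loop body can be isolated in one function per target type. This is a hypothetical C sketch (the function names are made up, and `long` is assumed to be 64 bits as on x86_64). Compiling it with `-O3` and, on newer GCC releases, `-fopt-info-vec-missed` reports which loops were not vectorized; the outcome depends on the GCC version and the `-march` setting.

```c
#include <assert.h>
#include <stddef.h>

/* Each kernel mirrors the reporter's loop g[i] = 2*((T)g[i]) for one
   target type T, so vectorizer diagnostics can be compared per case.
   The asserts sit outside the loops so they do not affect them. */
void scale_via_int(double *g, size_t n)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        g[i] = 2 * ((int)g[i]);
}

void scale_via_uint(double *g, size_t n)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        g[i] = 2 * ((unsigned int)g[i]);
}

void scale_via_long(double *g, size_t n)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        g[i] = 2 * ((long)g[i]);
}

void scale_via_ulong(double *g, size_t n)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        g[i] = 2 * ((unsigned long)g[i]);
}
```

All four compute the same values for small non-negative inputs; only the intermediate integer type, and therefore the conversion instruction the vectorizer needs, differs.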
Comment 3 Richard Biener 2012-07-13 08:42:23 UTC
Link to vectorizer missed-optimization meta-bug.
Comment 4 Richard Biener 2012-07-19 10:57:07 UTC
We can now vectorize the conversion to unsigned int. As far as I can see, it is not
possible to directly convert from double to unsigned long: only cvttpd2dq exists,
which is a signed conversion (to 32-bit integers). Biasing the input value
from [0, ULONG_MAX] to [LONG_MIN, LONG_MAX] is not possible because of
the different truncation behavior for signed/unsigned values.
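The truncation problem shows up in a scaled-down analogue: pretend the only conversion available is double to signed 8-bit, and try to obtain an unsigned 8-bit result by biasing the input by 128 (the counterpart of mapping [0, ULONG_MAX] onto [LONG_MIN, LONG_MAX]). This is a hypothetical illustration, not code from GCC. Because the signed conversion truncates toward zero, a fractional input that becomes negative after biasing is rounded up instead of down.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit analogue of biasing [0, UINT8_MAX] onto
   [INT8_MIN, INT8_MAX] before a signed, truncating conversion. */
static uint8_t biased_u8(double x)
{
    assert(x > -1.0 && x < 256.0);   /* range where (uint8_t)x is defined */
    int8_t s = (int8_t)(x - 128.0);  /* truncates toward zero */
    return (uint8_t)(s + 128);       /* undo the bias */
}
```

Here `biased_u8(1.5)` returns 2 while `(uint8_t)1.5` is 1. Inputs of 128 or more, and integral inputs, still come out right, since their biased value is either non-negative or has no fractional part to truncate.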

As a side-note, C specifies that float -> unsigned integer truncation
only has defined behavior for inputs in the range (-1, Utype_MAX+1).
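The range condition from the side note can be written as a small predicate, shown here for `unsigned long`. The names `ulong_conversion_defined` and `to_ulong_checked` are hypothetical. The sketch assumes a 32- or 64-bit `unsigned long` and the default round-to-nearest mode, under which `(double)ULONG_MAX + 1.0` evaluates to exactly ULONG_MAX + 1 (2^32 or 2^64).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when (unsigned long)x has defined behavior: the value truncated
   toward zero must be representable, i.e. -1 < x < ULONG_MAX + 1.
   NaN fails both comparisons and is rejected. */
static bool ulong_conversion_defined(double x)
{
    return x > -1.0 && x < (double)ULONG_MAX + 1.0;
}

/* Conversion guarded by the predicate above. */
static unsigned long to_ulong_checked(double x)
{
    assert(ulong_conversion_defined(x));
    return (unsigned long)x;
}
```

Note that the lower bound is exclusive: -0.5 converts to 0, but -1.0 is already outside the defined range.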

ICC seems to use some clever range-dependent operations, finally mixing
three cases together with bitwise operations.
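A simplified, two-case illustration of the range-dependent idea follows; it is not ICC's (or GCC's) actual sequence. The helper names are hypothetical and a 64-bit `unsigned long` is assumed. The branch from the scalar workaround is replaced by a compare whose result selects the bias arithmetically and is then folded back in with an XOR. A vectorizer could only use this pattern if the target has a packed signed double to 64-bit integer conversion, which on x86 means AVX-512DQ `vcvttpd2qq`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branchless double -> uint64_t conversion built from a
   compare, a subtract, a signed conversion and an XOR. */
static uint64_t double_to_u64_branchless(double x)
{
    const double two63 = 9223372036854775808.0;  /* 2^63 */
    uint64_t big = (uint64_t)(x >= two63);        /* 1 if x needs the bias, else 0 */
    double y = x - (double)big * two63;           /* exact; y < 2^63 for valid x */
    return (uint64_t)(int64_t)y ^ (big << 63);    /* restore the top bit */
}

/* The reporter's loop with the unsigned conversion spelled out;
   2 * u wraps modulo 2^64 just like 2 * (unsigned long)g[i]. */
void scale_via_ulong_branchless(double *g, size_t n)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        g[i] = (double)(2 * double_to_u64_branchless(g[i]));
}
```

The multiply by `big` keeps the subtraction exact in both cases (it subtracts either 0.0 or 2^63), so every input in the defined range (-1, 2^64) reaches the signed conversion already inside [0, 2^63) or (-1, 0).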