This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/49483] New: unable to vectorize code equivalent to "scalbnf"
- From: "vincenzo.innocente at cern dot ch" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 21 Jun 2011 09:04:31 +0000
- Subject: [Bug tree-optimization/49483] New: unable to vectorize code equivalent to "scalbnf"
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49483
Summary: unable to vectorize code equivalent to "scalbnf"
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: vincenzo.innocente@cern.ch
I'm trying to write simplified versions of trigonometric and transcendental
functions that gcc can auto-vectorize.
At the moment I'm blocked by the vectorization of "scalbnf".
I'm using code equivalent to the one in glibc,
sysdeps/ieee754/flt-32/s_scalbnf.c
and
math/math_private.h,
which in my C++ version reads
cat vldexpf.cc
inline float i2f(int x) {
  union { float f; int i; } tmp;
  tmp.i=x;
  return tmp.f;
}

inline float vect_ldexpf(float x, int n) {
  n = (n+0x7f)<<23;
  return x * i2f(n);
}

float __attribute__ ((aligned(16))) a[1024];
float __attribute__ ((aligned(16))) b[1024];
float __attribute__ ((aligned(16))) c[1024];
void tV() {
  for (int i=0; i!=1024; ++i) {
    float z = a[i];
    int n = b[i];
    c[i] = vect_ldexpf(z,n);
  }
}
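As a sanity check of the exponent trick above, here is a minimal sketch that compares it against std::ldexp. It is not the reporter's code: the names bits_to_float, fast_ldexpf and matches_ldexp are hypothetical, and std::memcpy stands in for the union. It assumes a 32-bit IEEE float and only covers exponents that give a normal power of two.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a 32-bit integer as a float.
// std::memcpy is used here in place of the union from the report.
inline float bits_to_float(std::int32_t x) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits");
    float f;
    std::memcpy(&f, &x, sizeof f);
    return f;
}

// Same arithmetic as vect_ldexpf in the report: build 2^n by placing the
// biased exponent (n + 127) in bits 23..30, then multiply.
// Only meaningful for -126 <= n <= 127; overflow, subnormal scale factors
// and NaN are not handled.
inline float fast_ldexpf(float x, int n) {
    return x * bits_to_float((n + 0x7f) << 23);
}

// Returns true if fast_ldexpf agrees exactly with std::ldexp for every
// exponent in the supported range.
inline bool matches_ldexp(float x) {
    for (int n = -126; n <= 127; ++n) {
        if (fast_ldexpf(x, n) != std::ldexp(x, n)) return false;
    }
    return true;
}
```

Calling matches_ldexp with a value such as 1.0f or 1.5f exercises the whole exponent range without leaving the normal floats, so the comparison can be exact.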
Compiling it produces
c++ -Ofast -c vldexpf.cc -msse4.2 -ftree-vectorizer-verbose=7
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_get_data_access_cost: inside_cost = 1, outside_cost = 0.
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
vldexpf.cc:16: note: vect_model_store_cost: aligned.
vldexpf.cc:16: note: vect_get_data_access_cost: inside_cost = 3, outside_cost = 0.
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
vldexpf.cc:16: note: vect_model_load_cost: aligned.
vldexpf.cc:16: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
vldexpf.cc:16: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
vldexpf.cc:16: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
vldexpf.cc:16: note: not vectorized: relevant stmt not supported: D.2243_14 = VIEW_CONVERT_EXPR<float>(n_13);
vldexpf.cc:15: note: vectorized 0 loops in function.
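For reference, the loop can be written by hand with the GCC/Clang vector extensions, where a cast between same-sized vector types reinterprets the bits. This is roughly the operation the vectorizer would need in place of the rejected statement. The sketch below is illustrative only: ldexp_by_hand is a hypothetical name, __builtin_convertvector needs GCC 9 or later (so it is not available in the 4.7.0 snapshot used here), count is assumed to be a multiple of 4, and every b[i] is assumed to truncate to a value in -126..127.

```cpp
#include <cstddef>
#include <cstring>

// GCC/Clang vector extensions: four 32-bit lanes per 16-byte vector.
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

// Hypothetical hand-written counterpart of the loop in tV().
// Assumes count is a multiple of 4 and each b[i] truncates to -126..127.
inline void ldexp_by_hand(const float* a, const float* b, float* c,
                          std::size_t count) {
    for (std::size_t i = 0; i < count; i += 4) {
        v4sf x, e;
        std::memcpy(&x, a + i, sizeof x);   // load four mantissa inputs
        std::memcpy(&e, b + i, sizeof e);   // load four exponents stored as float

        v4si n = __builtin_convertvector(e, v4si); // float -> int per lane, truncating
        n = (n + 0x7f) << 23;                      // biased exponent into bits 23..30
        v4sf scale = (v4sf)n;                      // bit reinterpretation, no value conversion

        v4sf r = x * scale;                        // x * 2^n in each lane
        std::memcpy(c + i, &r, sizeof r);          // store four results
    }
}
```

The std::memcpy loads and stores avoid any alignment or aliasing assumptions; with the 16-byte aligned arrays from the test case a compiler is free to turn them into aligned vector moves.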
I'm using
c++ -v
Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin10.7.0/4.7.0/lto-wrapper
Target: x86_64-apple-darwin10.7.0
Configured with: ./configure --enable-languages=c,c++,fortran --enable-lto
--with-build-config=bootstrap-lto CFLAGS='-O2 -ftree-vectorize -fPIC'
CXXFLAGS='-O2 -fPIC -ftree-vectorize -fvisibility-inlines-hidden'
Thread model: posix
gcc version 4.7.0 20110528 (experimental) (GCC)