This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/57954] New: AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
- From: "vincenzo.innocente at cern dot ch" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 22 Jul 2013 14:48:50 +0000
- Subject: [Bug target/57954] New: AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
Bug ID: 57954
Summary: AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following benchmark, performance without vectorization is poor compared
with expectations.
I found that this is due to a register not being zeroed before it is used.
c++ -O2 -S polyAVX.cpp -mavx
as -v --64 -o polyAVX.o polyAVX.s
GNU assembler version 2.23.1 (x86_64-redhat-linux-gnu) using BFD version (GNU Binutils) 2.23.1
c++ -O2 polyAVX.o -march=corei7-avx ; time ./a.out
53896530759
15.418u 0.000s 0:15.43 99.8% 0+0k 0+0io 1pf+0w
patch polyAVX.s
49a50
> vxorps %xmm0,%xmm0,%xmm0
patching file polyAVX.s
as -v --64 -o polyAVX.o polyAVX.s
GNU assembler version 2.23.1 (x86_64-redhat-linux-gnu) using BFD version (GNU Binutils) 2.23.1
c++ -O2 polyAVX.o -march=corei7-avx ; time ./a.out
10340756863
2.958u 0.000s 0:02.96 99.6% 0+0k 0+0io 1pf+0w
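As a quick check on the two runs above, the ratio of the printed tick counts and the ratio of the user times can be computed directly. The snippet below is only a sketch of that arithmetic; the constant and function names are made up for illustration, and the figures are copied from the output shown above.

```cpp
// Figures copied from the two runs above (names are illustrative only).
constexpr double ticks_before = 53896530759.0;  // rdtscp ticks, unpatched
constexpr double ticks_after  = 10340756863.0;  // rdtscp ticks, with vxorps added
constexpr double user_before  = 15.418;         // user seconds, unpatched
constexpr double user_after   = 2.958;          // user seconds, with vxorps added

// Speedup expressed as "before / after".
constexpr double ratio(double before, double after) { return before / after; }

// Both measures agree on a speedup of roughly 5.2x.
static_assert(ratio(ticks_before, ticks_after) > 5.20 &&
              ratio(ticks_before, ticks_after) < 5.22, "tick ratio is about 5.2");
static_assert(ratio(user_before, user_after) > 5.20 &&
              ratio(user_before, user_after) < 5.22, "time ratio is about 5.2");
```

Both ratios come out at about 5.2, so the tick counter and the wall-clock measurement tell the same story.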
I am sure there are many other cases like this.
gcc version 4.9.0 20130718 (experimental) [trunk revision 201034] (GCC)
cat polyAVX.cpp
//template<typename T>
typedef float T;
inline T polyHorner(T y) {
  return T(0x2.p0) + y * (T(0x2.p0) + y * (T(0x1.p0) + y * (T(0x5.55523p-4) +
         y * (T(0x1.5554dcp-4) + y * (T(0x4.48f41p-8) + y * T(0xb.6ad4p-12)))))) ;
}
#include <x86intrin.h>
#include <iostream>
volatile unsigned long long rdtsc() {
  unsigned int taux=0;
  return __rdtscp(&taux);
}
int main() {
  long long t=0;
  bool ret=true;
  float s =0;
  for (int k=0; k!=100; ++k) {
    float c = 1.f/10000000.f;
    t -=rdtsc();
    for (int i=1; i<10000001; ++i) s+= polyHorner((float(i)+float(k))*c);
    t +=rdtsc();
  }
  ret &= s!=0;
  std::cout << t <<std::endl;
  return ret ? 0 : -1;
}
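For anyone who wants to experiment at the source level instead of patching the assembly, here is a minimal sketch. It is not part of the original test case and not a confirmed fix. It assumes the slowdown comes from the int-to-float conversion writing only the low 32 bits of its destination register, so that the result has to be merged with whatever that register held before. The helper name `int_to_float_zeroed` is hypothetical; the intrinsics themselves (`_mm_setzero_ps`, `_mm_cvtsi32_ss`, `_mm_cvtss_f32`) are the standard SSE ones. Whether a given compiler actually emits the `vxorps` for this must be checked in the generated assembly.

```cpp
#include <immintrin.h>

// Hypothetical helper (not from the original report): convert an int to a
// float while handing the conversion an explicitly zeroed vector, so the
// result is merged into known-zero contents rather than a stale register.
static inline float int_to_float_zeroed(int i) {
    const __m128 zero = _mm_setzero_ps();      // all four lanes set to 0.0f
    const __m128 v = _mm_cvtsi32_ss(zero, i);  // low lane = float(i), upper lanes copied from zero
    return _mm_cvtss_f32(v);                   // extract the low lane as a scalar
}
```

In the benchmark above, one could try replacing `float(i)` and `float(k)` with calls to this helper and compare the timings; the numerical result should be unchanged, since the conversion itself is the same.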