This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt
- From: "vincenzo.innocente at cern dot ch" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 22 Oct 2012 06:44:17 +0000
- Subject: [Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016
Bug #: 55016
Summary: request for specific builtins for rcp and rsqrt
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: vincenzo.innocente@cern.ch
There are cases where the use of approximate rcp and rsqrt suffice.
I wonder if it would be possible to introduce specific "generic" builtins for
"rcp" and "rsqrt" that produce the proper instruction depending on the target
architecture (see,avx etc) and eventually generate vector instruction in a loop
at the moment anything like this is target specific, inefficient and does not
vectorize!
#include <x86intrin.h>
float v0[1024];
float v1[1024];
inline
float rsqrtf( float x ) {
return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}
void v() {
for(int i=0; i!=1024; ++i)
v0[i] = rsqrtf(v1[i]);
}