This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

             Bug #: 55016
           Summary: request for specific builtins for rcp and rsqrt
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


There are cases where the use of approximate rcp and rsqrt suffices.

I wonder if it would be possible to introduce specific "generic" builtins for
"rcp" and "rsqrt" that produce the proper instruction depending on the target
architecture (SSE, AVX, etc.) and that end up as vector instructions when used
in a loop.

At the moment, anything like this is target-specific, inefficient, and does not
vectorize, as in this example:

#include <x86intrin.h>
float v0[1024];
float v1[1024];
inline
float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}
void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}

