This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
Re: inlining problems
- From: Thomas Heller <thomas dot heller1 at gmx dot de>
- To: gcc-help at gcc dot gnu dot org
- Date: Mon, 12 Oct 2009 19:55:28 +0200
- Subject: Re: inlining problems
- References: <200910121916.48276.thom.heller@gmail.com> <mcrskdov7q5.fsf@dhcp-172-17-9-151.mtv.corp.google.com>
On Monday 12 October 2009 19:32:18 Ian Lance Taylor wrote:
> Thomas Heller <thomas.heller1@gmx.de> writes:
> > I ran into a little issue when trying to force inlining with
> > __attribute__(( always_inline )). The reason why I am trying to force
> > the compiler to inline my code is simple: I want to implement handwritten
> > optimizations using SSE intrinsics. However, it seems that gcc is not
> > willing to inline that code anymore. That is why I came up with the idea
> > of trying to force gcc.
> > There is a problem now. I get tons of error messages that gcc is not able
> > to inline that function.
> > Example error message:
> > sorry, unimplemented: inlining failed in call to ‘const
> > pe::Vector3<typename pe::MathTrait<T1, T2, true>::AddType>
> > pe::operator+(const pe::Vector3<Type>&, const pe::Vector3<T2>&) [with T1
> > = double, T2 = double]’: function not inlinable
> >
> > I was trying to put that in a little testcase. However, it seems that I
> > can't reproduce that error with a small code base.
> > Any ideas? Need more information?
>
> This question is not appropriate for the gcc@gcc.gnu.org mailing
> list. It would be appropriate for gcc-help@gcc.gnu.org. Please take
> any followups to gcc-help. Thanks.
Sorry for the inconvenience.
> Unfortunately, it's basically impossible for us to say anything useful
> without some sort of test case.
I attached some of my vector class code. For example, operator+ fails to be
inlined when compiled with -DUSE_SSE. Unfortunately, this does not happen in
small examples.
Since I don't know what exactly causes the problem, I cannot isolate it.
My code makes heavy use of the inline keyword itself.
> I assume you are using the SSE intrinsics from mmintrin.h and
> friends. Those intrinsics should be reliably inlined. Why is it
> necessary for you to inline them further?
Yes, I am using mmintrin.h and friends.
The functions I want to inline are called very often in an inner loop.
The reason why I want to inline them is simple:
In simple test cases (where inlining actually works), my hand-optimized
version is around 20% faster than the unoptimized one.
However, when running my real application, I get a performance drop
of 20% with my optimized version. Upon investigation I found that
gcc inlined the unoptimized function calls but not my optimized ones, for
whatever reason.
That is why I think it is necessary to inline further.
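To make the inner-loop situation concrete, here is a minimal sketch of a forced-inline SSE2 helper called from a loop. The names add_pairs and sum_pairs are hypothetical and not part of the attached code. The helper is declared with both the inline keyword and the attribute, since the GCC manual describes always_inline in terms of functions declared inline; whether that matters for the failure reported here is not established.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_add_pd, _mm_loadu_pd, ...

// Hypothetical helper (not from the attached code): adds two pairs of doubles.
// Declared inline as well as always_inline.
inline __attribute__(( always_inline ))
__m128d add_pairs( __m128d a, __m128d b )
{
    return _mm_add_pd( a, b );
}

// Hypothetical inner loop: sums 2 * n_pairs doubles, calling the helper
// once per iteration. Unaligned loads are used so any array works.
inline double sum_pairs( const double* data, int n_pairs )
{
    __m128d acc = _mm_setzero_pd();
    for( int i = 0; i < n_pairs; ++i )
        acc = add_pairs( acc, _mm_loadu_pd( data + 2 * i ) );

    double out[2];
    _mm_storeu_pd( out, acc );   // spill both lanes and add them
    return out[0] + out[1];
}
```

Compiling such a file with -Winline should make gcc report calls to inline functions that it declined to inline, which may help narrow down which call sites are affected.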
> For whatever it's worth, the development version of gcc gives better
> messages about why a function can not be inlined.
Thanks, I will try that.
Thomas
#if USE_SSE
#include <emmintrin.h>   // __m128d and the SSE2 intrinsics
#endif

template< typename T >
class Vector3
{
public:
    Vector3( T x, T y, T z )
    {
        v_[0] = x;
        v_[1] = y;
        v_[2] = z;
    }
#if USE_SSE
    // Builds the vector from two packed registers: (x, y) and (z, padding).
    Vector3( __m128d xy, __m128d zw )
    {
        xy_() = xy;
        zw_() = zw;
    }
#endif
    T& operator[] ( int i )
    {
        return v_[i];
    }
    const T& operator[] ( int i ) const
    {
        return v_[i];
    }
private:
#ifndef USE_SSE
    T v_[3];
#else
    // Four elements, 16-byte aligned, so each half can be viewed as a __m128d.
    T v_[4] __attribute__((aligned (16)));
    inline __m128d& xy_() { return *reinterpret_cast<__m128d *>( &v_[0] ); }
    inline const __m128d& xy_() const { return *reinterpret_cast<__m128d const *>( &v_[0] ); }
    inline __m128d& zw_() { return *reinterpret_cast<__m128d *>( &v_[2] ); }
    inline const __m128d& zw_() const { return *reinterpret_cast<__m128d const *>( &v_[2] ); }
#endif
    // Uses U because reusing T would shadow the class template parameter.
    template< typename U >
    friend const Vector3<U> operator+( const Vector3<U>&, const Vector3<U>& );
};
// this works
template< typename T >
__attribute__(( always_inline ))
const Vector3<T> operator+( const Vector3<T>& v1, const Vector3<T>& v2 )
{
    return Vector3<T>( v1[0] + v2[0], v1[1] + v2[1], v1[2] + v2[2] );
}
// functions like these don't work
#if USE_SSE
template<>
__attribute__(( always_inline ))
const Vector3<double> operator+( const Vector3<double>& v1, const Vector3<double>& v2 )
{
    return Vector3<double>( _mm_add_pd( v1.xy_(), v2.xy_() ),
                            _mm_add_sd( v1.zw_(), v2.zw_() ) );
}
#endif
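For comparison, the following is a reduced, self-contained variant of the same pattern, written as a sketch under several assumptions. Vec3 is a hypothetical stand-in for the attached Vector3 with public storage. The specialization is declared with the inline keyword in addition to the attribute, because an explicit specialization of a function template is not implicitly inline. The reinterpret_cast accessors are swapped for _mm_load_pd/_mm_store_pd. None of this is confirmed to be the cause of, or the fix for, the error reported above.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical, simplified stand-in for Vector3: four elements, 16-byte aligned.
template< typename T >
struct Vec3
{
    alignas(16) T v[4];
    Vec3( T x, T y, T z ) { v[0] = x; v[1] = y; v[2] = z; v[3] = T(); }
};

// Generic version: plain element-wise addition.
template< typename T >
inline __attribute__(( always_inline ))
const Vec3<T> operator+( const Vec3<T>& a, const Vec3<T>& b )
{
    return Vec3<T>( a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2] );
}

// SSE2 version for double. The explicit specialization is declared inline
// itself; it does not inherit inline from the primary template.
template<>
inline __attribute__(( always_inline ))
const Vec3<double> operator+( const Vec3<double>& a, const Vec3<double>& b )
{
    Vec3<double> r( 0.0, 0.0, 0.0 );
    // x and y: packed add of both lanes.
    _mm_store_pd( &r.v[0], _mm_add_pd( _mm_load_pd( &a.v[0] ), _mm_load_pd( &b.v[0] ) ) );
    // z: scalar add of the low lane; the high lane is copied from a (padding).
    _mm_store_pd( &r.v[2], _mm_add_sd( _mm_load_pd( &a.v[2] ), _mm_load_pd( &b.v[2] ) ) );
    return r;
}
```

With this layout, Vec3<double> takes the SSE2 path while any other element type, such as float, falls back to the generic operator.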