This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


Re: mmintrin slower than inline asm or even plain C

From: "Richard Guenther" <richard dot guenther at gmail dot com>
To: "Jack Andrews" <effbiae at gmail dot com>
Cc: hjl at lucon dot org, mark at codesourcery dot com, rth at redhat dot com, gcc-help at gcc dot gnu dot org
Date: Sun, 22 Apr 2007 12:06:18 +0200
Subject: Re: mmintrin slower than inline asm or even plain C
References: <2cf50a010704212049j35998195w86b9238d013962bc@mail.gmail.com>

On 4/22/07, Jack Andrews <effbiae@gmail.com> wrote:

Hi guys,

I'm writing to you directly because I can't find the relevant mailing list
for help with the mmintrin functions.  There's a thread at:

http://gcc.gnu.org/ml/gcc-help/2007-04/msg00201.html

that details my problems.

I want to sum an array of longs using MMX.  I use the functions
    _mm_set_pi32 and _m_paddd
but the resulting binary contains significantly less efficient code
than inline asm or even plain C ( for(i=0;i<n;i++)total+=a[i]; ).
Here's the relevant function:

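/* I (an integer typedef) and W (the per-iteration stride, which must be 2
   for this loop to visit every element) are defined in the thread above. */
simd_mmintrin(n, is)
I *is;
{   __m64 q,r;
   I i;
   _m_empty();                          /* emms: clear the MMX state */
   q=_m_from_int(0);                    /* zero both 32-bit lanes */
   for (i=0; i < n; i+=W) {
       r=_mm_set_pi32(is[i],is[i+1]);   /* pack two elements into one __m64 */
       q=_m_paddd(q,r);                 /* lane-wise 32-bit add */
   }
   union {long a[2];__m64 m;}u;         /* overlay assumes a 32-bit long */
   u.m=q;
   return u.a[0]+u.a[1];                /* add the two lanes together */
}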

The rest of the code and a shell script to run it are in the thread above.


You should file a bug report.  I suspect that we cannot combine
_mm_set_pi32(is[i],is[i+1]) into a single movq, as your asm does, and
that our register allocation is non-optimal.

Richard.
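
If the missed combine is indeed the problem, one way to sidestep it is to
load each pair of elements as a single 8-byte value instead of building it
with _mm_set_pi32.  The following is only a minimal sketch under stated
assumptions: the function name sum_pairs, the use of int32_t elements, and
the memcpy-based load are illustrative choices, not code from the thread.
Whether the compiler actually turns the 8-byte copy into one movq depends on
the GCC version and flags, so the generated assembly should be checked.
When MMX is not available, the scalar loop at the end does all the work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper (not from the thread): sums n 32-bit integers,
   wrapping modulo 2^32 like the packed add does. */
int32_t sum_pairs(const int32_t *is, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;
#if defined(__MMX__)
    __m64 q = _mm_setzero_si64();          /* both 32-bit lanes start at 0 */
    for (; i + 1 < n; i += 2) {
        __m64 r;
        memcpy(&r, &is[i], sizeof r);      /* one 8-byte load of is[i], is[i+1] */
        q = _mm_add_pi32(q, r);            /* lane-wise 32-bit add (paddd) */
    }
    uint32_t lanes[2];
    memcpy(lanes, &q, sizeof lanes);       /* copy the two lanes out */
    _mm_empty();                           /* emms once the MMX work is done */
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; i++)                     /* odd tail, or everything without MMX */
        total += (uint32_t)is[i];
    return (int32_t)total;
}
```

Lane order does not matter here because the two lanes are added together at
the end, which is why a plain 8-byte load can stand in for _mm_set_pi32.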

References:
- mmintrin slower than inline asm or even plain C
  - From: Jack Andrews
