This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


Re: enabling SSE for 3-vector inner product

From: Brian Budge <brian dot budge at gmail dot com>
To: gcc-help at gcc dot gnu dot org
Date: Thu, 29 Apr 2010 12:38:41 -0700
Subject: Re: enabling SSE for 3-vector inner product
References: <4BD9B192.7030505@gmail.com> <20100429171703.GK16241@axel>

Although I haven't tried this kind of thing with the new SSE4+
instructions, with older instruction sets, using packed-single (ps)
SSE instructions in cases like this will generally reduce performance.
Even if you had a float4 type instead of a float3, it's unlikely that
you'd get a speed improvement using structs like this.

SSE, and most other SIMD methodologies, work best with a
struct-of-arrays data layout.  For a case like the one presented, the
overhead of using SSE will simply be too high to be worth the benefit.
You might have to think at a higher algorithmic level to make good use
of SSE.
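
To make the struct-of-arrays idea concrete, here is a minimal sketch in C99. The names float3_soa and vec_dot_soa are hypothetical and not part of the original code; the layout simply stores every x, every y, and every z in its own contiguous array, so one loop computes many dot products over unit-stride data. Whether a given compiler actually vectorizes the loop depends on its version and flags, so treat this as an illustration rather than a guaranteed speedup.

```c
#include <stddef.h>

/* Hypothetical struct-of-arrays layout: one contiguous array per
   component, instead of one {x, y, z} struct per vector. */
typedef struct {
    const float *x, *y, *z;
} float3_soa;

/* Computes out[i] = dot(a[i], b[i]) for i in [0, n).
   Each array is read with unit stride, which is the access pattern
   a loop vectorizer handles best. `restrict` (C99) promises that
   `out` does not overlap any of the input arrays. */
void vec_dot_soa(const float3_soa *a, const float3_soa *b,
                 float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a->x[i] * b->x[i]
               + a->y[i] * b->y[i]
               + a->z[i] * b->z[i];
}
```

The design choice here is that the loop runs over many vectors rather than over the three components of a single vector, so the trip count is large enough for SIMD to pay off.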

  Brian


On Thu, Apr 29, 2010 at 10:17 AM, Axel Freyn <axel-freyn@gmx.de> wrote:
> Hi Qianqian,
>>
> First: I don't know anything about the vectorizer, so be very careful
> with my answer;-)
>> My code looks like this:
>>
>> typedef struct CPU_float3{
>>     float x,y,z;
>> } float3;
>> float vec_dot(float3 *a,float3 *b){
>>         return a->x*b->x+a->y*b->y+a->z*b->z;
>> }
>> float pinner(float3 *Pd,float3 *Pm,float3 *Ad,float3 *Am){
>>         return vec_dot(Pd,Am)+vec_dot(Pm,Ad);
>> }
>> ...
>>
>> and then I call pinner() a lot in my main function.
>>
>> Here are my questions:
>>
>> 1. when I compile the above code with gcc -O3 option, will the
>> above vec_dot function be translated to SSE automatically?
> I think: in general, no. The vectorizer only vectorizes loops.
> In addition, you will have to pass "-ffast-math" to the compiler to
> authorize vectorization (I think?). When you compile your code with the
> option "-ftree-vectorizer-verbose=2":
>
> gcc-4.5 -O3 -ffast-math -ftree-vectorizer-verbose=2 -c sse.c
>
> it tells you about what the vectorizer is doing: nothing... (I simply
> compiled the two functions vec_dot and pinner from you)
>
> However, if you wrote vec_dot as
> float vec_dot(float3 *a,float3 *b){
>   float dot=0;
>   int i;
>   for(i = 0; i < 3; ++i)
>     dot += a->x[i]*b->x[i];
>   return dot;
> }
> gcc would be able to vectorize the loop, but it does not do so with
> only 3 iterations:
> sse.c:7: note: not vectorized: iteration count too small.
> sse.c:4: note: vectorized 0 loops in function.
>
>
> However, if you call vec_dot and pinner repeatedly on adjacent
> elements, the vectorizer might be used for that... Just try to
> compile your code with "-ftree-vectorizer-verbose=2" (and maybe
> "-ffast-math", if you can accept the loss of precision / weakening of
> the standard (see man-page))
>>
>> 2. if not, can anyone suggest an SSE instruction
>> to accelerate the above computation?
>>
>> 3. is "inline" a valid option for GCC when compiling C code?
> Yes, it is. However, as long as the function is defined in the same
> compilation unit where it is used, gcc with -O3 will automatically
> inline everything (at least: when gcc believes it to be useful :-))
>
> Axel
>

References:
- enabling SSE for 3-vector inner product
  - From: Qianqian Fang
- Re: enabling SSE for 3-vector inner product
  - From: Axel Freyn
