enabling SSE for 3-vector inner product
Qianqian Fang
fangqq@gmail.com
Fri Apr 30 10:31:00 GMT 2010
Hi list,
I am working on a numerical code and found that
a simple inner product of float triplets takes
30% of my run time when compiled with gcc -O3.
I would like to explore options to speed up
this code, and I have a few questions
about using SSE with GCC.
My code looks like this:
typedef struct CPU_float3{
    float x,y,z;
} float3;
...
float vec_dot(float3 *a,float3 *b){
    return a->x*b->x+a->y*b->y+a->z*b->z;
}
float pinner(float3 *Pd,float3 *Pm,float3 *Ad,float3 *Am){
    return vec_dot(Pd,Am)+vec_dot(Pm,Ad);
}
...
I then call pinner() many times from my main function.
Here are my questions:
1. When I compile the above code with gcc -O3, will the
vec_dot function be translated to SSE instructions automatically?
2. If not, can anyone suggest SSE instructions
to accelerate the above computation?
3. Is "inline" a valid option for GCC when compiling C code?
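To make questions 2 and 3 concrete, here is a rough sketch of the kind of
variant I have in mind. It is only an illustration and has not been
benchmarked. The names vec_dot_sse and pinner_sse are made up for this
sketch, and it assumes an x86 target with SSE enabled (it falls back to the
scalar expression otherwise). It builds the 4-wide vectors with
_mm_set_ps rather than loading 16 bytes directly, because float3 is only
12 bytes long. It also uses "static inline", which as far as I understand
is accepted by GCC in both C99 and GNU89 modes.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE1 intrinsics */
#endif

typedef struct CPU_float3 {
    float x, y, z;
} float3;

/* Hypothetical SSE variant of vec_dot (name invented for this sketch). */
static inline float vec_dot_sse(const float3 *a, const float3 *b)
{
#if defined(__SSE__)
    /* Pad the unused fourth lane with zero; a direct 16-byte load
       would read past the end of the 12-byte struct. */
    __m128 va = _mm_set_ps(0.0f, a->z, a->y, a->x);
    __m128 vb = _mm_set_ps(0.0f, b->z, b->y, b->x);
    __m128 p  = _mm_mul_ps(va, vb);      /* (ax*bx, ay*by, az*bz, 0) */

    /* Horizontal sum using SSE1-only operations. */
    __m128 hi = _mm_movehl_ps(p, p);     /* (p2, p3, p2, p3) */
    __m128 s  = _mm_add_ps(p, hi);       /* lane0 = p0+p2, lane1 = p1+p3 */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    s = _mm_add_ss(s, s1);               /* lane0 = p0+p1+p2+p3 */
    return _mm_cvtss_f32(s);
#else
    /* Scalar fallback when SSE is not available. */
    return a->x * b->x + a->y * b->y + a->z * b->z;
#endif
}

/* Same structure as pinner(), built on the sketch above. */
static inline float pinner_sse(const float3 *Pd, const float3 *Pm,
                               const float3 *Ad, const float3 *Am)
{
    return vec_dot_sse(Pd, Am) + vec_dot_sse(Pm, Ad);
}
```

I do not know whether this would actually beat the scalar code, since
only three of the four lanes do useful work and the horizontal sum adds
extra shuffles.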
Any suggestions for improving the efficiency are
highly appreciated.
Thanks,
Qianqian
More information about the Gcc-help mailing list