This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



_mm_malloc()


Hello, and thank you for your time. My name is Kevin.

I am a student running micro-benchmarks that compare gcc and icc on autovectorization.
My compiler, as reported by gcc --version, is:
gcc (GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)


The architecture I am testing on is:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     E8300  @ 2.83GHz
stepping        : 6
cpu MHz         : 2833.148
cache size      : 6144 KB
...

Specifically, I am using example 1 from the following page:
gcc.gnu.org/projects/tree-ssa/vectorization.html

I have contacted the authors of the website and they recommended I forward my question to you.

My question concerns the vectorization of code from that page and the alignment of
pointers returned by _mm_malloc.


I will use an example from the above site, modified slightly:

example1:
    // _mm_malloc the arrays, each aligned to 16 bytes
    float *A, *B, *C;
    int M = 4*1024*1024;

    if ((A = (float*) _mm_malloc(M*sizeof(float), 16)) == NULL) {
        fprintf(stderr, "ERROR ALLOCATING mybuffer1\n");
        exit(1);
    }

    if ((B = (float*) _mm_malloc(M*sizeof(float), 16)) == NULL) {
        fprintf(stderr, "ERROR ALLOCATING mybuffer2\n");
        exit(1);
    }

    if ((C = (float*) _mm_malloc(M*sizeof(float), 16)) == NULL) {
        fprintf(stderr, "ERROR ALLOCATING mybuffer3\n");
        exit(1);
    }

    int i;
    for (i=0; i<M; i++)  A[i] = B[i] + C[i];






I will also attach the entire compilable file to this email, so you can compile it if you wish.


My question is this. When I compile with
gcc -O3 -msse4 -ftree-vectorizer-verbose=6 example1.c


I get (among a lot of other stuff):


example1.c:75: note: Alignment of access forced using peeling.
example1.c:75: note: Vectorizing an unaligned access.
example1.c:75: note: Vectorizing an unaligned access.



It is my impression that I align all arrays using _mm_malloc.
For clarification: line 75 is the for loop above, and arrays A, B, and C are allocated with _mm_malloc().


I thought I was guaranteed to have _mm_malloc return array addresses that are aligned to 16 bytes
and that the compiler would recognize this.


Is that right? If so, then why do I get 'vectorizing an unaligned access'?
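
A minimal sketch of how the 16-byte assumption can be checked at run time and then stated to the compiler. It assumes a compiler that provides __builtin_assume_aligned (GCC 4.7 and later, so not the 4.4.3 used here). The helper names is_aligned and add_arrays are hypothetical and not part of example1.c, and whether the hint changes the vectorizer's report for this particular loop is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `alignment` bytes (must be a power of two). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: the loop from example1 with explicit alignment hints.
 * The asserts guard the promise made to the compiler, because passing a
 * misaligned pointer to __builtin_assume_aligned is undefined behavior. */
static void add_arrays(float *a, const float *b, const float *c, int n)
{
    int i;
    float *pa;
    const float *pb;
    const float *pc;

    assert(is_aligned(a, 16) && is_aligned(b, 16) && is_aligned(c, 16));

    pa = __builtin_assume_aligned(a, 16);
    pb = __builtin_assume_aligned(b, 16);
    pc = __builtin_assume_aligned(c, 16);

    for (i = 0; i < n; i++)
        pa[i] = pb[i] + pc[i];
}
```

With the arrays from example1 the call would be add_arrays(A, B, C, M), and is_aligned(A, 16) alone is enough to confirm what _mm_malloc actually returned.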

Thanks for your time :)
Kevin

P.S.
Do you have a good source on how to interpret the output of -ftree-vectorizer-verbose? Here is the full output I get:
lincoln> gcc -O3 -msse4 -ftree-vectorizer-verbose=6 example1.c
example1.c:75: note: Alignment of access forced using peeling.
example1.c:75: note: Vectorizing an unaligned access.
example1.c:75: note: Vectorizing an unaligned access.
example1.c:75: note: vect_model_load_cost: unaligned supported by hardware.
example1.c:75: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
example1.c:75: note: vect_model_load_cost: unaligned supported by hardware.
example1.c:75: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
example1.c:75: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
example1.c:75: note: vect_model_store_cost: inside_cost = 1, outside_cost = 0 .
example1.c:75: note: cost model: prologue peel iters set to vf/2.
example1.c:75: note: cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown .
example1.c:75: note: Cost model analysis:
Vector inside of loop cost: 6
Vector outside of loop cost: 24
Scalar iteration cost: 4
Scalar outside cost: 0
prologue iterations: 2
epilogue iterations: 2
Calculated minimum iters for profitability: 8
example1.c:75: note: Profitability threshold = 7
example1.c:75: note: Vectorization may not be profitable.
example1.c:75: note: LOOP VECTORIZED.
example1.c:67: note: not vectorized: unhandled data-ref
example1.c:17: note: vectorized 1 loops in function.



There are a lot of terms here that seem to convey a lot of information, but I am not sure how to use them:
inside cost?
outside cost?
prologue peel iters?
epilogue peel iters?
Vector inside of loop cost?
Calculated minimum iters for profitability: 8?


The last item seems to indicate that if my loop runs for at least 8 iterations, I should see a performance increase.
My loop runs for 4 million iterations, yet the compiler responds with:
"example1.c:75: note: Vectorization may not be profitable."
Just to be precise, line 75 is: for (i=0; i<M; i++) A[i] = B[i] + C[i];
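
The figure of 8 can be reproduced from the other numbers in the dump with a simple break-even calculation. The sketch below is an assumption about how such an estimate might be formed, not a statement of what GCC 4.4.3 actually executes. The function name min_profitable_iters is hypothetical, its parameters mirror the labels in the dump, and the vectorization factor vf = 4 is inferred from "prologue peel iters set to vf/2" together with "prologue iterations: 2".

```c
/* Break-even estimate: the smallest iteration count at which the vector loop
 * (including its one-time outside cost and peeled iterations) is cheaper
 * than the scalar loop. Returns -1 if the vector loop never wins. */
static int min_profitable_iters(int vec_inside_cost, int vec_outside_cost,
                                int scalar_iter_cost, int scalar_outside_cost,
                                int prologue_iters, int epilogue_iters, int vf)
{
    /* Cost saved each time vf scalar iterations become one vector iteration. */
    int gain_per_vector_iter = scalar_iter_cost * vf - vec_inside_cost;
    /* One-time overhead, scaled by vf to stay in integer arithmetic. */
    int overhead = (vec_outside_cost - scalar_outside_cost) * vf;
    int iters;

    if (gain_per_vector_iter <= 0)
        return -1;

    iters = (overhead - vec_inside_cost * (prologue_iters + epilogue_iters))
            / gain_per_vector_iter;

    /* Integer division rounds down, so step up by one if the scalar loop
     * is still no more expensive at this count. */
    if (scalar_iter_cost * vf * iters <= vec_inside_cost * iters + overhead)
        iters++;

    return iters;
}
```

Plugging in the values from the dump, min_profitable_iters(6, 24, 4, 0, 2, 2, 4) gives (96 - 24) / 10 = 7, which is then stepped up to 8 because 112 <= 138. A trip count of 4 million is far above that threshold.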



Thanks again if you can spend a few minutes commenting on my query.

Attachment: example1.c
Description: Text document

