GCC vectorization problem on X86

Robert Bernecky bernecky@snakeisland.com
Sun Jun 6 18:15:00 GMT 2010

Hi, Ira.

I wrote this code to exercise vectorization, or lack thereof;
it operates precisely as you described:

#include <stdlib.h>
#include <stdio.h>

int main( int argc, char *argv[])
{
    int N;
    int *vec;
    int i;

#ifdef VECTORIZE
    N = 103;
#else // VECTORIZE
    sscanf("103", "%d", &N);
#endif // VECTORIZE
    printf( "N is: %d\n", N);
    vec = (int*) malloc( sizeof(int) * N);

    for( i=0; i<N; i++) {
      vec[i] = i;
    }
    for( i=0; i<10; i++) {
      printf( "%d,", vec[i]);
    }
    free( vec);
    return 0;
}


If I compile with "gcc -O3 vectorize.c -DVECTORIZE",
it vectorizes nicely, as long as N>7 (on my Opteron/Ubuntu system).
If I compile with "gcc -O3 vectorize.c",
no vectorization takes place, as you noted.

I see there is a "#pragma novector"; my naive wish here is for a
"#pragma vector", which would strongly encourage vectorization,
even in the absence of known iteration count. That would require,
presumably, a loop-peeling loop, followed by
a possibly-zero-iteration vector loop.
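
As a rough sketch of that loop shape, written out by hand in C: a scalar
peeling loop handles the leftover iterations, and a block loop then covers
the rest in groups of a fixed size, running zero times when N is small.
The names VF and fill_peeled are hypothetical, and the vector factor of 4
(four 32-bit ints per SSE register) is an assumption, not something gcc
exposes.

```c
#define VF 4  /* assumed vector factor: four 32-bit ints per SSE register */

/* Hypothetical helper: stores i into vec[i] for 0 <= i < n. */
static void fill_peeled(int *vec, int n)
{
    int i, j;
    int peel = n % VF;  /* leftover iterations, done in scalar code */

    /* Peeling loop: runs between 0 and VF-1 times. */
    for (i = 0; i < peel; i++) {
        vec[i] = i;
    }

    /* Block loop: the remaining count is a multiple of VF, so each
       pass covers exactly VF elements. Runs zero times if n < VF. */
    for (; i < n; i += VF) {
        for (j = 0; j < VF; j++) {
            vec[i + j] = i + j;
        }
    }
}
```

Calling fill_peeled(vec, N) with the N from the test program above would
replace the first for-loop. This only illustrates the control structure;
whether a given gcc release turns the block loop into vector instructions
still has to be checked in the generated assembly.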

In my case, the generated C code has already had small arrays
(when we know array sizes statically) unrolled or eliminated in
other ways, so most of the remaining FOR-loops would benefit from
vectorization, even in the absence of iteration count.
I could manually strip-mine these loops to get a fixed iteration
count to enable vectorization,
but gcc should be doing that job for me, IMO.
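
For reference, manual strip-mining of the test loop might look roughly
like the sketch below. STRIP and fill_strip_mined are hypothetical names,
and the strip length of 4 is an assumption chosen to match four ints per
SSE register.

```c
#define STRIP 4  /* assumed strip length, known at compile time */

/* Hypothetical helper: stores i into vec[i] for 0 <= i < n. */
static void fill_strip_mined(int *vec, int n)
{
    int i, j;
    int full = n - (n % STRIP);  /* largest multiple of STRIP that is <= n, for n >= 0 */

    /* Outer loop walks whole strips; the inner loop has a
       compile-time-constant trip count of STRIP. */
    for (i = 0; i < full; i += STRIP) {
        for (j = 0; j < STRIP; j++) {
            vec[i + j] = i + j;
        }
    }

    /* Cleanup loop: at most STRIP-1 leftover scalar iterations. */
    for (i = full; i < n; i++) {
        vec[i] = i;
    }
}
```

The inner loop now has a fixed iteration count even though n is only
known at run time. That is the property the manual transformation is
after; it does not by itself guarantee that the compiler vectorizes it.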

If someone could suggest a nice way to get such a pragma, or to
otherwise encourage the compiler to lean more in the vectorization
direction, I'm all ears. I'd even undertake to write the
pragma, if it's not a huge effort. (I know close to zilch about
gcc internals...)

Rationale: There are many problems where it is simply not
possible to know array sizes statically. One example is the analysis
of database queries ("What was the mean number of shares of IBM
traded, per share on the NYSE today?"). Data mining problems
also fall into this category.


Ira Rosen wrote:
> gcc-help-owner@gcc.gnu.org wrote on 03/06/2010 09:37:01 PM:
>> Hi. I'm having a problem with GCC vectorization on an Opteron 165.
>> I have two codes, which are, unfortunately, machine-generated
>> and large, which differ, as far as I can tell, only in the source
>> of the loop size, N, for a loop roughly of this form:
>>   for( i=0; i<N; i++) {
>>     vec[i] = i;
>>    }
>> In both cases, N comes from another function and is theoretically
>> not inlined. In the first case, N is generated by an identity
>> function that hides its value; this case vectorizes nicely,
>> if the presence of punpckldq instructions is suitable evidence.
>> (papiex confirms vectorization with high PAPI_VEC_INS counts.)
>> In the other case, N comes from a sscanf, and is very well hidden,
>> since it comes from the command line, ultimately. This case
>> does not vectorize, at present. It did vectorize some months ago...
>> This is on:  gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)
>> Neither the compiler nor the OS have changed in that time; the
>> code going into gcc has, of course, changed as the sac2c compiler
>> has evolved.
>> So, are there some subtle (or not subtle...) criteria that gcc has
>> for deciding when to emit vector ops, based on array size, perhaps?
>> Alternately, if someone can point me at the relevant gcc source code,
>> maybe I can get an idea as to what's going on. Or, if there is
>> a bugzilla site for it, I'll take a look there.
> Auto-vectorization can fail if the number of iterations can't be computed.
> The vectorizer calls number_of_exit_cond_executions() in
> tree-scalar-evolution.c to determine the loop bound.
> HTH,
> Ira
>> Thanks,
>> Robert

