GCC vectorization problem on X86

Robert Bernecky bernecky@snakeisland.com
Sun Jun 6 16:00:00 GMT 2010

Thanks, Ira.

That gives me a hint as to what's going on.
(I could swear that the failing example vectorized
in an earlier life, even though it was reading
the iteration count from a file. Unfortunately, I have
no proof of this!)

I'll look into the code you cited and see what
guidance I can find there.

There is, of course, a substantial difference in performance
between vectorized and non-vectorized codes on large
arrays, so I am keen to see what we can do here, in
the cases where we do not know iteration counts statically.

I am guessing (I just received your message, so I have not yet
read the code you recommend...) that the requirement for
static knowledge of the iteration count stems from a need to
peel the first/last iterations from the loop, so that
the remaining iterations fill the vector registers.
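For what it's worth, peeling does not by itself seem to require a compile-time count: the split can be made with counts computed at run time. Below is a hand-written sketch (not GCC's actual output) of that transformation for the vec[i] = i loop, assuming vec holds 32-bit ints and using the GCC/Clang vector_size extension. The name iota_strip_mined is made up for illustration. An unaligned memcpy store sidesteps the alignment-peeling prologue a real vectorizer might add.

```c
#include <string.h>  /* memcpy */

/* GCC/Clang vector extension: four 32-bit ints in one 128-bit value. */
typedef int v4si __attribute__((vector_size(16)));

/* Hand-written analogue of
 *     for (i = 0; i < n; i++) vec[i] = i;
 * where n is only known at run time. The vector trip count (n / 4) and
 * the leftover count (n % 4) are both determined at run time.
 * Assumes vec holds ints (hypothetical element type). */
void iota_strip_mined(int *vec, int n)
{
    v4si idx = {0, 1, 2, 3};          /* lanes hold i, i+1, i+2, i+3 */
    const v4si step = {4, 4, 4, 4};
    int i = 0;

    /* Main loop: four elements per iteration, runs n / 4 times. */
    for (; n - i >= 4; i += 4) {
        memcpy(vec + i, &idx, sizeof idx);   /* store with no alignment requirement */
        idx += step;
    }
    /* Scalar epilogue: the remaining 0-3 iterations. */
    for (; i < n; i++)
        vec[i] = i;
}
```

The only thing the transformation needs is that n is fixed before the loop starts, which fits the "can't be computed" wording rather than "isn't known at compile time".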

The problem I face in code generation is very similar to yours:
I have array expressions described as loops, yet some of those
expressions are over array shape vectors, so they may have only
a few (say, 1-4) elements, in which case vectorization is a net loss.
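One way to keep short shape-vector loops cheap without giving up vectorization on large arrays is to version the loop on its run-time trip count. The sketch below assumes a hypothetical break-even constant MIN_VECTOR_TRIP (the real value would have to be measured on the target machine) and hypothetical helper names; it is an illustration of the idea, not something GCC 4.3.2 is known to emit.

```c
#include <string.h>  /* memcpy */

/* GCC/Clang vector extension: four 32-bit ints in one 128-bit value. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical break-even point below which the scalar loop is assumed
   to win; machine-dependent and would need measuring. */
#define MIN_VECTOR_TRIP 16

/* Scalar loop over [start, n): used for short loops and for the epilogue. */
static void iota_scalar(int *vec, int start, int n)
{
    int i;
    for (i = start; i < n; i++)
        vec[i] = i;
}

/* Chooses scalar or vector code based on the run-time trip count n. */
void iota_versioned(int *vec, int n)
{
    v4si idx = {0, 1, 2, 3};
    const v4si step = {4, 4, 4, 4};
    int i = 0;

    if (n < MIN_VECTOR_TRIP) {       /* e.g. a rank 1-4 shape vector */
        iota_scalar(vec, 0, n);
        return;
    }
    for (; n - i >= 4; i += 4) {     /* vector body */
        memcpy(vec + i, &idx, sizeof idx);
        idx += step;
    }
    iota_scalar(vec, i, n);          /* last 0-3 elements */
}
```

The guard costs one compare per loop entry, which is small next to the vector setup cost it avoids on 1-4 element loops.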

Perhaps #pragma directives to the compiler could help here,
at least in some cases.
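On the pragma idea: as far as I know, GCC 4.4 (so not the 4.3.2 compiler used here) added a #pragma GCC optimize directive and an equivalent per-function optimize attribute, which can turn the vectorizer off for selected functions. A minimal sketch, assuming shape-vector and data loops can be emitted into separate functions; the function names are hypothetical. GCC's manual describes the optimize attribute as intended mainly for debugging, and compilers other than GCC may ignore it with a warning.

```c
/* Shape-vector loop: typically 1-4 elements, so ask GCC not to
   vectorize it. "no-tree-vectorize" is read as -fno-tree-vectorize. */
__attribute__((optimize("no-tree-vectorize")))
void shape_add(int *dst, const int *a, const int *b, int rank)
{
    int i;
    for (i = 0; i < rank; i++)
        dst[i] = a[i] + b[i];
}

/* Large data loop: left to the vectorizer (-O3 or -ftree-vectorize). */
void data_add(int *dst, const int *a, const int *b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```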

At any rate, you have pointed me in the direction where I should
be able to find an answer, or at least ask a more precise
question or two.

Thanks again,

Ira Rosen wrote:
> gcc-help-owner@gcc.gnu.org wrote on 03/06/2010 09:37:01 PM:
>> Hi. I'm having a problem with GCC vectorization on an Opteron 165.
>> I have two codes, which are, unfortunately, machine-generated
>> and large, which differ, as far as I can tell, only in the source
>> of the loop size, N, for a loop roughly of this form:
>>   for (i = 0; i < N; i++) {
>>     vec[i] = i;
>>   }
>> In both cases, N comes from another function and is theoretically
>> not inlined. In the first case, N is generated by an identity
>> function that hides its value; this case vectorizes nicely,
>> if the presence of punpckldq instructions is suitable evidence.
>> (papiex confirms vectorization with high PAPI_VEC_INS counts.)
>> In the other case, N comes from a sscanf, and is very well hidden,
>> since it comes from the command line, ultimately. This case
>> does not vectorize, at present. It did vectorize some months ago...
>> This is on:  gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)
>> Neither the compiler nor the OS has changed in that time; the
>> code going into gcc has, of course, changed as the sac2c compiler
>> has evolved.
>> So, are there some subtle (or not subtle...) criteria that gcc has
>> for deciding when to emit vector ops, based on array size, perhaps?
>> Alternately, if someone can point me at the relevant gcc source code,
>> maybe I can get an idea as to what's going on. Or, if there is
>> a bugzilla site for it, I'll take a look there.
> Auto-vectorization can fail if the number of iterations can't be computed.
> The vectorizer calls number_of_exit_cond_executions() in
> tree-scalar-evolution.c to determine the loop bound.
> HTH,
> Ira
>> Thanks,
>> Robert
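One possible explanation for the sscanf case, which nobody in this thread has confirmed, is aliasing: if N lives in memory whose address has escaped (for example, a global that sscanf writes through a pointer), the compiler generally cannot prove that the stores to vec[i] leave N unchanged. The bound then may have to be reloaded every iteration, and the iteration count cannot be computed up front. The sketch below uses hypothetical names to contrast that pattern with copying the bound into a local first.

```c
#include <stdio.h>  /* sscanf */

/* Hypothetical reconstruction: the bound is a global filled by sscanf. */
int n_from_cmdline;

/* A store through vec might, as far as the compiler can prove, modify
   n_from_cmdline, so the bound may be re-read on every iteration. */
void fill_global_bound(int *vec)
{
    int i;
    for (i = 0; i < n_from_cmdline; i++)
        vec[i] = i;
}

/* Copying the bound into a local first: no store through vec can reach
   a local whose address is never taken, so the count is loop-invariant. */
void fill_local_bound(int *vec)
{
    int n = n_from_cmdline;
    int i;
    for (i = 0; i < n; i++)
        vec[i] = i;
}

/* Parses the bound with sscanf; falls back to 0 on bad input. */
int parse_bound(const char *arg)
{
    if (sscanf(arg, "%d", &n_from_cmdline) != 1)
        n_from_cmdline = 0;
    return n_from_cmdline;
}
```

Both functions compute the same result; the difference, if this is the cause, would only show up in whether the vectorizer accepts the loop.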
