This is the mail archive of the mailing list for the GCC project.


Re: How to force gcc to vectorize the loop with particular vectorization width

On October 21, 2017 9:50:13 PM GMT+02:00, Denis Bakhvalov <> wrote:
>Hello Richard,
>Thank you. I achieved vectorization with vf = 16, using
>#pragma GCC optimize ("no-unroll-loops")
>__attribute__ ((__target__ ("sse4.2")))
>and the options -march=core-avx2 -mprefer-avx128
>But now I have a question: Is it possible in gcc to have vectorization
>with vf < 16?
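Denis's working setup can be sketched as one C translation unit. This is a minimal sketch under assumptions: the byte-add loop and the name `add_bytes` are hypothetical because his real loop is not shown in the thread, the `push_options`/`pop_options` pair is an addition that keeps `no-unroll-loops` from leaking into the rest of the file, and the file is assumed to be built with `gcc -O3 -march=core-avx2 -mprefer-avx128`. The preprocessor guards only keep the sketch compiling on non-GCC or non-x86 toolchains.

```c
#include <stddef.h>

/* Assumed build: gcc -O3 -march=core-avx2 -mprefer-avx128 */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options                /* added here: keep the setting local */
#pragma GCC optimize ("no-unroll-loops")
#endif

/* Hypothetical stand-in for the hot loop: element-wise byte add.
   The target attribute is the one Denis reports using. */
#if defined(__x86_64__) || defined(__i386__)
__attribute__ ((__target__ ("sse4.2")))
#endif
void
add_bytes (unsigned char *restrict dst, const unsigned char *restrict a,
           const unsigned char *restrict b, size_t n)
{
  /* unsigned char results wrap modulo 256 after the cast */
  for (size_t i = 0; i < n; i++)
    dst[i] = (unsigned char) (a[i] + b[i]);
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Whether GCC actually picks vf = 16 for this loop should be checked in the vectorizer dump (`-fopt-info-vec`) or the assembly.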

No, not at the moment. 
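Since the vectorizer will not choose a factor below 16 for a byte loop here, one manual alternative (not something suggested in this thread) is GCC's generic vector extension: an 8-byte vector type forces 8 chars per step, roughly what Denis asks for with "half of an xmm register". A sketch assuming a byte-add loop; `v8u8` and `add_bytes_v8` are hypothetical names. How the compiler lowers an 8-byte vector (MMX, the low half of an xmm register, or scalar code) depends on the GCC version and target, so inspect the generated assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8-lane byte vector using GCC's vector_size extension. */
typedef unsigned char v8u8 __attribute__ ((vector_size (8)));

/* Hypothetical helper: adds 8 bytes per step, then a scalar tail. */
void
add_bytes_v8 (unsigned char *dst, const unsigned char *a,
              const unsigned char *b, size_t n)
{
  size_t i = 0;
  for (; i + 8 <= n; i += 8)
    {
      v8u8 va, vb, vd;
      memcpy (&va, a + i, sizeof va);   /* memcpy avoids alignment issues */
      memcpy (&vb, b + i, sizeof vb);
      vd = va + vb;                     /* lane-wise add, wraps modulo 256 */
      memcpy (dst + i, &vd, sizeof vd);
    }
  for (; i < n; i++)                    /* remaining n % 8 elements */
    dst[i] = (unsigned char) (a[i] + b[i]);
}
```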


>On 20/10/2017, Richard Biener <> wrote:
>> On Fri, Oct 20, 2017 at 12:13 PM, Denis Bakhvalov
>> wrote:
>>> Thank you for the reply!
>>> Regarding the last part of your message, this is also what clang does
>>> when you pass a vf of 4 (with the pragma from my first message)
>>> for a loop operating on chars and using SSE2. It does
>>> useful work for only 4 chars per iteration (a[0], zero, zero, zero, a[1],
>>> zero, zero, zero, etc.).
>>> Please see the example here:
>>> Let's say that I know all possible trip counts for my inner loop, and
>>> none of them exceeds 32. In the example above the vf for this loop is 32.
>>> There is a runtime check such that if the trip count is lower than 32,
>>> it falls back to the scalar version.
>>> As long as the trip count is always lower than 32, it always chooses
>>> the scalar version at runtime.
>>> But theoretically, using SSE2, for a trip count of 8 it could use half
>>> of an xmm register (8 chars) to do meaningful work.
>>> Is the gcc vectorizer capable of doing this?
>>> If yes, can I somehow achieve this in gcc by tweaking the code or
>>> adding some pragma?
>> The closest is to use -mprefer-avx128 so you get SSE rather than AVX
>> vector sizes.  Possibly this option is also among the valid target
>> options for #pragma GCC target.
>>> On 19/10/2017, Jakub Jelinek <> wrote:
>>>> On Thu, Oct 19, 2017 at 10:38:28AM +0200, Richard Biener wrote:
>>>>> On Thu, Oct 19, 2017 at 9:22 AM, Denis Bakhvalov
>>>>> wrote:
>>>>> > Hello!
>>>>> >
>>>>> > I have a hot inner loop which was vectorized by gcc, but I also
>>>>> > want the compiler to unroll this loop by some factor.
>>>>> > It can be controlled in clang with this pragma:
>>>>> > #pragma clang loop vectorize(enable) vectorize_width(8)
>>>>> > Please see example here:
>>>>> >
>>>>> >
>>>>> > So I want to tell gcc something like this:
>>>>> > "I want you to vectorize the loop. After that I want you to
>>>>> > unroll this vectorized loop by some defined factor."
>>>>> >
>>>>> > I was playing with #pragma omp simd with the safelen clause, and
>>>>> > #pragma GCC optimize("unroll-loops"), with no success. The compiler
>>>>> > option --param max-unroll-times is not suitable for me, because it
>>>>> > affects other parts of the code as well.
>>>>> >
>>>>> > Is it possible to achieve this somehow?
>>>>> No.
>>>> #pragma omp simd has a simdlen clause, which is a hint on the
>>>> vectorization factor, but the vectorizer doesn't use it so far;
>>>> it probably wouldn't be that hard to at least use it as the
>>>> factor if the target supports multiple factors and it is one of those.
>>>> The vectorizer has some support for using wider vectorization
>>>> if there are mixed width types within the same loop, so perhaps
>>>> supporting 2x/4x/8x etc. sizes of the normally chosen width might
>>>> not be that hard.
>>>> What we don't have right now is support for using smaller
>>>> vectorization factors, which might sometimes be beneficial for -O2
>>>> vectorization of mixed width type loops.  We always use the vf
>>>> from the smallest width type; say, when using SSE2 and there is a
>>>> char type, we try to use a vf of 16, and if there is also an int
>>>> type, we do operations on those in 4x as many instructions, while
>>>> there is also the option to use a vf of 4 and, for operations on
>>>> char, just do something meaningful in only 1/4 of the vector
>>>> elements.  The various x86 vector ISAs have instructions to widen
>>>> or narrow for conversions.
>>>> In any case, no is the right answer right now, we don't have that
>>>> implemented.
>>>>       Jakub
>>> --
>>> Best regards,
>>> Denis.
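The runtime check Denis describes in his second message can be modeled in plain C. This is a hypothetical model of the control flow only (`add_bytes_model`, `VF` and `lane_split` are illustrative names, and the inner `j` loop stands in for a single 32-lane vector operation), not GCC's actual output: a main loop that advances by VF plus a scalar loop for whatever is left. With VF = 32, any trip count below 32 never enters the main loop, which is why his short trip counts always run the scalar version.

```c
#include <stddef.h>

#define VF 32   /* hypothetical vectorization factor from the example */

/* Hypothetical counters: how many elements took each path. */
typedef struct { size_t vector_elems, scalar_elems; } lane_split;

lane_split
add_bytes_model (unsigned char *dst, const unsigned char *a,
                 const unsigned char *b, size_t n)
{
  lane_split s = { 0, 0 };
  size_t i = 0;
  /* "i + VF <= n" plays the role of the runtime trip-count check:
     for n < VF the body never executes. */
  for (; i + VF <= n; i += VF)
    {
      for (size_t j = 0; j < VF; j++)   /* models one vector operation */
        dst[i + j] = (unsigned char) (a[i + j] + b[i + j]);
      s.vector_elems += VF;
    }
  for (; i < n; i++, s.scalar_elems++)  /* scalar fallback / epilogue */
    dst[i] = (unsigned char) (a[i] + b[i]);
  return s;
}
```

For n = 8 every element goes through the scalar loop; for n = 40 the first 32 go through the modeled vector step and the last 8 are scalar.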
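Denis's original question (quoted at the bottom) refers to Clang's per-loop width hint. A sketch of where that pragma goes; `scale_bytes` is a hypothetical example loop, and the `#if` guard reflects the thread's answer that GCC has no equivalent hint, so the pragma is only emitted when compiling with Clang.

```c
#include <stddef.h>

/* Hypothetical example loop: multiply each byte by k (wraps mod 256). */
void
scale_bytes (unsigned char *dst, const unsigned char *src,
             unsigned char k, size_t n)
{
#if defined(__clang__)
  /* Clang: vectorize this loop with a width of 8 elements. */
#pragma clang loop vectorize(enable) vectorize_width(8)
#endif
  /* GCC, per this thread, offers no per-loop width hint. */
  for (size_t i = 0; i < n; i++)
    dst[i] = (unsigned char) (src[i] * k);
}
```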
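The OpenMP clauses mentioned by Denis and Jakub attach to a loop as below. This is a sketch, not a way to change GCC's choice: per Jakub, GCC's vectorizer did not use `simdlen` at the time, and `safelen(8)` only asserts that executing 8 consecutive iterations together is safe. The function `propagate` is hypothetical; its dependence distance is exactly 8, the situation `safelen` exists for. GCC honors `#pragma omp simd` only with `-fopenmp` or `-fopenmp-simd`; without them the pragma is ignored and the loop keeps its ordinary sequential meaning.

```c
#include <stddef.h>

/* Hypothetical loop with a loop-carried dependence of distance 8:
   a[i + 8] is computed from a[i]. The caller must provide at least
   n + 8 elements. */
void
propagate (int *a, size_t n)
{
  /* safelen(8): up to 8 consecutive iterations may run together.
     simdlen(8): preferred width, a hint GCC did not use per the thread. */
#pragma omp simd safelen(8) simdlen(8)
  for (size_t i = 0; i < n; i++)
    a[i + 8] = a[i] + 1;
}
```

Starting from all zeros, the result is the same with or without vectorization: `a[k] == k / 8`.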
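Jakub's point about mixed-width loops comes down to simple arithmetic. The sketch below pairs a hypothetical char-to-int loop (`widen_mul`) with a hypothetical helper (`vectors_per_iter`) that counts how many 16-byte (SSE2) vectors one vectorized iteration needs for a given element size and vf. With vf = 16, taken from the char type, the int part needs 4 vectors; the vf = 4 alternative he describes needs one vector per type but fills only 4 of the 16 char lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mixed-width loop: char loads, int multiply and stores. */
void
widen_mul (int *restrict out, const signed char *restrict a, int k, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] * k;
}

/* Hypothetical helper: vectors of vec_bytes needed to cover vf elements
   of elem_bytes each. A partially filled vector still costs one op. */
size_t
vectors_per_iter (size_t vec_bytes, size_t elem_bytes, size_t vf)
{
  assert (elem_bytes > 0 && vec_bytes % elem_bytes == 0);
  size_t lanes = vec_bytes / elem_bytes;
  return (vf + lanes - 1) / lanes;
}
```

For SSE2 (16 bytes): `vectors_per_iter(16, 1, 16)` is 1 and `vectors_per_iter(16, 4, 16)` is 4, while with vf = 4 both the char and the int part take a single vector each.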
