This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: auto-vectorization analysis/__builtin_assume_aligned on gcc-4.7-20120114


The generated non-vectorized assembly is simply the unrolled loop with >8 iterations, so loop structure is pretty much intact (except for unrolling).

Does the vectorizer fail on unrolled loops?

I can put together some assembly dumps showing both the vectorized and the unvectorized loop, if that would help.

Alex

On 01/19/2012 11:29 AM, Richard Guenther wrote:
On Wed, Jan 18, 2012 at 6:37 PM, Alexander Herz<alexander.herz@mytum.de> wrote:
Given this piece of code (gcc-4.7-20120114):

    static void Test(Batch* block,Batch* new_block,const uint32 offs)
    {
        T* __restrict old_values=(T*)__builtin_assume_aligned(block->items,16);
        T* __restrict new_values=(T*)__builtin_assume_aligned(new_block->items,16);

        //assert(((uint64)(&block->items)%16)==0); //OK!!
        //assert(((uint64)(&new_block->items)%16)==0);

        for(uint32 c=0;c<(BS<<1);c++) //hopefully compiler applies SIMD here
        {
            new_values[c]=old_values[c]*old_values[c];
        }
    }

I would expect this loop to always be vectorized: the pointers are tagged
as restrict and aligned, and the loop runs over a fixed iteration space
that is even a power of 2, so most likely divisible by 4. It is quite
similar to vectorization example22
(http://gcc.gnu.org/projects/tree-ssa/vectorization.html#vectorizab).
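
For readers who want to try the loop in isolation, here is a minimal self-contained sketch of the snippet above. The original post does not show `T`, `BS`, or `Batch`, so the definitions below (`float`, 64, and a struct with a 16-byte-aligned `items` array) are assumptions made only so the example compiles; the unused `offs` parameter is dropped. The commented-out alignment checks from the post are kept as a live `assert` through a hypothetical helper, `IsAligned16`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins: the original post does not show these definitions.
typedef float T;
static const std::uint32_t BS = 64;

struct Batch {
    // GCC attribute syntax, so that the 16-byte alignment promise made
    // below via __builtin_assume_aligned actually holds.
    T items[BS << 1] __attribute__((aligned(16)));
};

// Hypothetical helper mirroring the commented-out asserts in the post.
static bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

static void Test(Batch* block, Batch* new_block)
{
    assert(IsAligned16(block->items) && IsAligned16(new_block->items));

    // Tell the compiler both arrays are 16-byte aligned and do not alias.
    T* __restrict old_values =
        static_cast<T*>(__builtin_assume_aligned(block->items, 16));
    T* __restrict new_values =
        static_cast<T*>(__builtin_assume_aligned(new_block->items, 16));

    // Fixed trip count of 2*BS: element-wise square.
    for (std::uint32_t c = 0; c < (BS << 1); c++) {
        new_values[c] = old_values[c] * old_values[c];
    }
}
```

Whether a given compiler version actually vectorizes this still has to be checked in the vectorizer report or the generated assembly; the sketch only makes the loop's preconditions explicit.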

I ran the previously mentioned g++ version with this command line:
-std=c++0x -g -O3 -msse -msse2 -msse3 -msse4.1 -Wall -Wstrict-aliasing=2
-ftree-vectorizer-verbose=2

Looking at the vectorizer output (and at the generated assembly) it looks as
if the loop given above
is indeed vectorized if Test() is called from main() (vectorized 1 loop).

When Test() is called from deep inside some complex code, however, it
looks as if the vectorization analysis gives up because the surrounding
code is too complex to analyze. It never considers the loop inside Test()
in this context, even though the hints inside Test() should make it
easily vectorizable in any context.

Is there anything I can do, so that Test() is analyzed in all contexts? I
guess all methods that contain the
__builtin_assume_aligned hint should be considered for vectorization,
independent of their context.

Without a concrete example it is impossible to say.  I suppose earlier
optimizations destroy loop structure too much?
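
One experiment that might help test this hypothesis, offered here as an assumption rather than something proposed in the thread: mark the function with GCC's `noinline` attribute so the loop stays in its own function body instead of being merged into the complex caller. The sketch below reuses assumed definitions for `T`, `BS`, and `Batch` (none are shown in the original post), and `TestNoInline` and `SumOfSquares` are hypothetical names standing in for the real code.

```cpp
#include <cstdint>

// Assumed stand-ins: the original post does not show these definitions.
typedef float T;
static const std::uint32_t BS = 64;

struct Batch {
    T items[BS << 1] __attribute__((aligned(16)));
};

// noinline asks GCC not to inline this function into its callers, so the
// loop is compiled as part of this small function body.
__attribute__((noinline))
void TestNoInline(Batch* block, Batch* new_block)
{
    T* __restrict old_values =
        static_cast<T*>(__builtin_assume_aligned(block->items, 16));
    T* __restrict new_values =
        static_cast<T*>(__builtin_assume_aligned(new_block->items, 16));

    for (std::uint32_t c = 0; c < (BS << 1); c++) {
        new_values[c] = old_values[c] * old_values[c];
    }
}

// Hypothetical caller standing in for the "complex code" in the post.
T SumOfSquares(Batch* block, Batch* scratch)
{
    TestNoInline(block, scratch);
    T sum = 0;
    for (std::uint32_t c = 0; c < (BS << 1); c++) {
        sum += scratch->items[c];
    }
    return sum;
}
```

If the vectorizer report shows the loop vectorized with `noinline` but not without it, that would point at inlining and earlier passes reshaping the loop. It costs a real function call, so it is a diagnostic step more than a fix.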

Thx for your help,
Alex



