This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug c++/35117] Vectorization on power PC



------- Comment #24 from victork at gcc dot gnu dot org  2008-02-11 12:23 -------
Hi,

Here are some more of my observations.
1. For some unclear reason there is indeed not much difference between the
vectorized and non-vectorized versions for long runs like "time ./TestNoVec
92200 8 89720 1000", but the difference is much more apparent for shorter
runs:

victork@white:~> time ./mnovec 30000 8 29720 1000
real    0m1.738s
user    0m1.723s
sys     0m0.004s

victork@white:~> time ./mvec 30000 8 29720 1000
real    0m0.781s
user    0m0.778s
sys     0m0.003s

2. If you replace new() with malloc(), it helps the static dependence analysis
to prove independence between pSum, pSum1 and pVec1 at compile time, so
run-time versioning is not required.

3. If we leave the buffers allocated by new(), the compiler uses "versioning
for alias", and this forces the use of versioning for alignment to prove
the correct alignment of the store to pVec1. This is less optimal than loop
peeling, since the vectorized version of the loop is executed only for values
of itBegin that are a multiple of 4.
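
To make the difference between versioning and peeling in point 3 concrete,
here is a rough sketch in plain C++. It is not what GCC actually emits: the
helper names (is_aligned16, overlaps, add_versioned, peel_count) are
hypothetical, the "vector" loop is only a scalar stand-in that processes 4
floats per step, and a 16-byte vector of 4 floats is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary
// (assumed vector size: 4 floats).
static bool is_aligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: true if [a, a+n) and [b, b+n) share any byte.
static bool overlaps(const float *a, const float *b, std::size_t n) {
    std::uintptr_t ua = reinterpret_cast<std::uintptr_t>(a);
    std::uintptr_t ub = reinterpret_cast<std::uintptr_t>(b);
    std::uintptr_t bytes = n * sizeof(float);
    return ua < ub + bytes && ub < ua + bytes;
}

// Versioning: a single run-time test picks one loop version for the
// whole range. Returns true if the "vector" version was taken.
static bool add_versioned(float *dst, const float *a, const float *b,
                          std::size_t n) {
    bool fast = is_aligned16(dst) && !overlaps(dst, a, n)
                                  && !overlaps(dst, b, n);
    if (fast) {
        // Stand-in for the vector loop: 4 floats per step.
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)
            for (std::size_t k = 0; k < 4; ++k)
                dst[i + k] += a[i + k] + b[i + k];
        // Scalar epilogue for the leftover elements.
        for (; i < n; ++i)
            dst[i] += a[i] + b[i];
    } else {
        // Scalar fallback: taken whenever the store is misaligned,
        // e.g. when the start index is not a multiple of 4.
        for (std::size_t i = 0; i < n; ++i)
            dst[i] += a[i] + b[i];
    }
    return fast;
}

// Peeling: number of scalar iterations to run first so that the store
// reaches a 16-byte boundary; the rest of the loop can then be vector.
static std::size_t peel_count(const float *dst) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(dst);
    assert(addr % sizeof(float) == 0);  // assumes float-aligned input
    std::uintptr_t mis = addr % 16;
    return mis == 0 ? 0 : (16 - mis) / sizeof(float);
}
```

With a 16-byte-aligned buffer, add_versioned(buf + 8, ...) would take the
fast path, while add_versioned(buf + 1, ...) falls back to the scalar loop
for the entire range. With peeling, only peel_count(buf + 1) == 3 scalar
iterations would be needed before the aligned part begins.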

Here is the version of your program I used to get the above results:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;
int main ( int argc, char *argv[] )
{
        int m_nSamples = atoi( argv[1] );
        int itBegin = atoi( argv[2] );
        int itEnd = atoi( argv[3] );
        int iSizeMain = atoi( argv[ 4 ] );
        ARRTYPE *pSum1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
        ARRTYPE *pSum = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
        for ( int it = 0; it < m_nSamples; it++ )
        {
                pSum[ it ] = it / itBegin;
                pSum1[ it ] = itBegin / ( it + 1 );
        }
        ARRTYPE *pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);

        for ( int i = 0, j = 0; i < m_nSamples - 5; i++ )
        {
            for( int it = itBegin; it < itEnd; it++ )
                pVec1[ it ] += pSum[ it ] + pSum1[ it ];
        }
        free( pVec1 );
}

victork@white:~> $g -O3 -fno-tree-vectorize -m64 -o mnovec m.c
victork@white:~> $g -O3 -fdump-tree-vect-details -ftree-vectorize -maltivec -m64 -o mvec m.c
victork@white:~> time ./mnovec 30000 1 29720 1000

real    0m1.754s
user    0m1.750s
sys     0m0.003s
victork@white:~> time ./mvec 30000 1 29720 1000

real    0m0.781s
user    0m0.778s
sys     0m0.003s


-- Victor


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117

