This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



vectorizer question



The attached testcase yields (on a Core 2 Duo, gcc trunk):


gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
time ./a.out
real 0m3.414s

ifort -xT -O3  test.f90
time ./a.out
real 0m1.556s

The generated assembly contains the following numbers of packed (mulpd) and scalar (mulsd) double-precision multiplies:

        ifort   gfortran
mulpd     140          0
mulsd       0        280

So the reason seems to be that ifort vectorizes the following code (full testcase attached):

SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)

 IMPLICIT NONE
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 integer, PARAMETER :: lp=6
    real(wp), INTENT(OUT)    :: res
    integer, INTENT(IN)     :: cmax,kg,jg
    real(wp), INTENT(IN)    :: pol_x(0:lp,-cmax:cmax)
    real(wp), INTENT(IN)    :: pol_y(1:2,0:lp,-cmax:0)
    real(wp), INTENT(IN)    :: pol_z(1:2,0:lp,-cmax:0)
    real(wp), INTENT(IN)    :: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
    real(wp) ::  coef_xy(2,(lp+1)*(lp+2)/2)
    real(wp) ::  coef_x(4,0:lp)

[...]
    coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,12)*pol_y(1,1,jg)
    coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,12)*pol_y(2,1,jg)
    coef_x(1:2,5)=coef_x(1:2,5)+coef_xy(1:2,13)*pol_y(1,1,jg)
    coef_x(3:4,5)=coef_x(3:4,5)+coef_xy(1:2,13)*pol_y(2,1,jg)
    coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,14)*pol_y(1,2,jg)
    coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,14)*pol_y(2,2,jg)
    coef_x(1:2,1)=coef_x(1:2,1)+coef_xy(1:2,15)*pol_y(1,2,jg)
    coef_x(3:4,1)=coef_x(3:4,1)+coef_xy(1:2,15)*pol_y(2,2,jg)
    coef_x(1:2,2)=coef_x(1:2,2)+coef_xy(1:2,16)*pol_y(1,2,jg)
    coef_x(3:4,2)=coef_x(3:4,2)+coef_xy(1:2,16)*pol_y(2,2,jg)
    coef_x(1:2,3)=coef_x(1:2,3)+coef_xy(1:2,17)*pol_y(1,2,jg)
    coef_x(3:4,3)=coef_x(3:4,3)+coef_xy(1:2,17)*pol_y(2,2,jg)
    coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,18)*pol_y(1,2,jg)
    coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,18)*pol_y(2,2,jg)
    coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,19)*pol_y(1,3,jg)
    coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,19)*pol_y(2,3,jg)
[...]

Either ifort is able to interpret the short vectors as such, or it realizes that these very short implicit loops are nevertheless favourable for vectorization.
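
To make "very short implicit loops" concrete, here is a minimal sketch (not taken from test.f90). It assumes a single column of coef_x and made-up, exactly representable values; the names short_vector_demo and coef_x_loop are hypothetical. Each pair of array-section statements above is equivalent to an explicit loop with a trip count of 2, which is exactly the width of one SSE2 double-precision vector:

```fortran
PROGRAM short_vector_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(14, 200)
  ! One column of coef_x only; the real testcase uses coef_x(4,0:lp).
  REAL(wp) :: coef_x(4), coef_x_loop(4), coef_xy(2), pol_y(2)
  INTEGER :: i

  ! Illustrative values only (assumption, not from the testcase).
  coef_xy     = (/ 1.5_wp, -2.0_wp /)
  pol_y       = (/ 0.5_wp,  4.0_wp /)
  coef_x      = (/ 1.0_wp,  2.0_wp, 3.0_wp, 4.0_wp /)
  coef_x_loop = coef_x

  ! Array-section form, as in the excerpt: two 2-element updates,
  ! each scaling coef_xy(1:2) by one scalar from pol_y.
  coef_x(1:2) = coef_x(1:2) + coef_xy(1:2)*pol_y(1)
  coef_x(3:4) = coef_x(3:4) + coef_xy(1:2)*pol_y(2)

  ! The same updates written as an explicit loop with trip count 2.
  DO i = 1, 2
     coef_x_loop(i)   = coef_x_loop(i)   + coef_xy(i)*pol_y(1)
     coef_x_loop(i+2) = coef_x_loop(i+2) + coef_xy(i)*pol_y(2)
  END DO

  ! Both forms should agree element by element.
  IF (ALL(coef_x == coef_x_loop)) THEN
     PRINT *, 'match'
  ELSE
     PRINT *, 'mismatch'
  END IF
END PROGRAM short_vector_demo
```

Whether a given compiler turns either form into mulpd rather than two mulsd instructions can be checked by compiling with -S and inspecting the assembly; this sketch only shows the equivalence of the two forms, not what any compiler does with them.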

Is there a trick to get gcc to vectorize these loops, or is there some technology missing for this?

Should I file a PR for this (this is somewhat similar to PR31079 and PR31021)?

Thanks in advance,

Joost

Attachment: test.f90
Description: Text document

