This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: vectorizer question
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "VandeVondele Joost" <vondele at pci dot uzh dot ch>
- Cc: gcc at gcc dot gnu dot org
- Date: Mon, 18 Aug 2008 17:14:33 +0200
- Subject: Re: vectorizer question
- References: <Pine.A41.4.63.0808181633390.753886@idaix01.unizh.ch>
2008/8/18 VandeVondele Joost <vondele@pci.uzh.ch>:
>
> The attached testcase yields (on a Core 2 Duo, gcc trunk):
>
>> gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
>> time ./a.out
>
> real 0m3.414s
>
>> ifort -xT -O3 test.f90
>> time ./a.out
>
> real 0m1.556s
>
> The assembly contains:
>
>            ifort   gfortran
>   mulpd      140          0
>   mulsd        0        280
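The counts are consistent with both compilers performing the same number of double-precision multiplications, assuming each SSE2 `mulpd` multiplies two packed doubles and each `mulsd` multiplies one. These are static counts from the assembly listing, not executed-instruction counts. The timings above give the overall ratio:

```latex
\begin{align*}
\underbrace{140}_{\texttt{mulpd}} \times 2 &= \underbrace{280}_{\texttt{mulsd}} \times 1 = 280 \text{ multiplies} \\
\frac{3.414\ \text{s (gfortran)}}{1.556\ \text{s (ifort)}} &\approx 2.19
\end{align*}
```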
>
> so the reason seems to be that ifort vectorizes the following code (full
> testcase attached):
>
> SUBROUTINE collocate_core_6(res,coef_xyz,pol_x,pol_y,pol_z,cmax,kg,jg)
>
> IMPLICIT NONE
> INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
> integer, PARAMETER :: lp=6
> real(wp), INTENT(OUT) :: res
> integer, INTENT(IN) :: cmax,kg,jg
> real(wp), INTENT(IN) :: pol_x(0:lp,-cmax:cmax)
> real(wp), INTENT(IN) :: pol_y(1:2,0:lp,-cmax:0)
> real(wp), INTENT(IN) :: pol_z(1:2,0:lp,-cmax:0)
> real(wp), INTENT(IN) :: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
> real(wp) :: coef_xy(2,(lp+1)*(lp+2)/2)
> real(wp) :: coef_x(4,0:lp)
>
> [...]
> coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,12)*pol_y(1,1,jg)
> coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,12)*pol_y(2,1,jg)
> coef_x(1:2,5)=coef_x(1:2,5)+coef_xy(1:2,13)*pol_y(1,1,jg)
> coef_x(3:4,5)=coef_x(3:4,5)+coef_xy(1:2,13)*pol_y(2,1,jg)
> coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,14)*pol_y(1,2,jg)
> coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,14)*pol_y(2,2,jg)
> coef_x(1:2,1)=coef_x(1:2,1)+coef_xy(1:2,15)*pol_y(1,2,jg)
> coef_x(3:4,1)=coef_x(3:4,1)+coef_xy(1:2,15)*pol_y(2,2,jg)
> coef_x(1:2,2)=coef_x(1:2,2)+coef_xy(1:2,16)*pol_y(1,2,jg)
> coef_x(3:4,2)=coef_x(3:4,2)+coef_xy(1:2,16)*pol_y(2,2,jg)
> coef_x(1:2,3)=coef_x(1:2,3)+coef_xy(1:2,17)*pol_y(1,2,jg)
> coef_x(3:4,3)=coef_x(3:4,3)+coef_xy(1:2,17)*pol_y(2,2,jg)
> coef_x(1:2,4)=coef_x(1:2,4)+coef_xy(1:2,18)*pol_y(1,2,jg)
> coef_x(3:4,4)=coef_x(3:4,4)+coef_xy(1:2,18)*pol_y(2,2,jg)
> coef_x(1:2,0)=coef_x(1:2,0)+coef_xy(1:2,19)*pol_y(1,3,jg)
> coef_x(3:4,0)=coef_x(3:4,0)+coef_xy(1:2,19)*pol_y(2,3,jg)
> [...]
>
> Either ifort is able to interpret the short vectors as such, or it realizes
> that these very short implicit loops are nevertheless favourable for
> vectorization.
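Each `(1:2)` section update in the quoted code is equivalent to two scalar updates on adjacent doubles with identical operations, which is the isomorphic pattern a straight-line (SLP, superword-level parallelism) vectorizer looks for. The sketch below is illustrative only: the module and subroutine names are hypothetical, not part of the testcase, and it demonstrates only that the two spellings compute the same result, not what code either compiler emits for them.

```fortran
! Hypothetical module: illustrates the statement shape in the quoted
! collocate_core_6 code; not taken from the attached testcase.
module pair_update_demo
  implicit none
  integer, parameter :: wp = selected_real_kind(14, 200)
contains
  ! Array-section form, as written in the testcase, e.g.
  ! coef_x(1:2,4) = coef_x(1:2,4) + coef_xy(1:2,12)*pol_y(1,1,jg)
  subroutine update_section(cx, cxy, p)
    real(wp), intent(inout) :: cx(2)
    real(wp), intent(in)    :: cxy(2), p
    cx(1:2) = cx(1:2) + cxy(1:2)*p
  end subroutine update_section

  ! The same update spelled out as two scalar statements on adjacent
  ! elements. Done one element at a time, each statement is a scalar
  ! multiply and add (mulsd/addsd on x86-64); an SLP vectorizer can
  ! instead pack the pair into one mulpd and one addpd.
  subroutine update_scalar(cx, cxy, p)
    real(wp), intent(inout) :: cx(2)
    real(wp), intent(in)    :: cxy(2), p
    cx(1) = cx(1) + cxy(1)*p
    cx(2) = cx(2) + cxy(2)*p
  end subroutine update_scalar
end module pair_update_demo
```

For instance, `call update_section(coef_x(1:2,4), coef_xy(1:2,12), pol_y(1,1,jg))` performs the same update as the first quoted statement.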
>
> Is there a trick to get gcc to vectorize these loops, or is there some
> technology missing for this?
>
> Should I file a PR for this (this is somewhat similar to PR31079 and
> PR31021)?
It would be nice to have a stand-alone testcase for this, so please
file a bug report.
Thanks,
Richard.