The following loop, which shows up among the top time users in capacita.f90, is
not vectorized because the loop latch block is non-empty:

./capacita.f90:51: note: ===== analyze_loop_nest =====
./capacita.f90:51: note: === vect_analyze_loop_form ===
./capacita.f90:51: note: not vectorized: unexpected loop form.
./capacita.f90:51: note: bad loop form.
./capacita.f90:9: note: vectorized 0 loops in function.

This block contains the following code, which comes from the
partial redundancy elimination pass:

bb_14 (preds = {bb_13 }, succs = {bb_13 })
{
  <bb 14>:
  # VUSE <SFT.109_593> { SFT.109 }
  pretmp.166_821 = g.dim[1].stride;
  goto <bb 13>;

}

Now, if I disable PRE with -fno-tree-pre, I get another problem in
the data dependence analysis:

        base_address: &d1
        offset from base address: 0
        constant offset from base address: 0
        step: 0
        aligned to: 128
        base_object: d1
        symbol tag: d1
        FAILED as dr address is invariant

/home/seb/ex/capacita.f90:46: note: not vectorized: unhandled data-ref
/home/seb/ex/capacita.f90:46: note: bad data references.
/home/seb/ex/capacita.f90:4: note: vectorized 0 loops in function.

This failure corresponds to the following code in tree-data-ref.c:

      /* FIXME -- data dependence analysis does not work correctly for objects with
         invariant addresses.  Let us fail here until the problem is fixed.  */
      if (dr_address_invariant_p (dr))
        {
          free_data_ref (dr);
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "\tFAILED as dr address is invariant\n");
          ret = false;
          break;
        }

It is triggered by the following statement:

  # VUSE <d1_143> { d1 }
  d1.33_86 = d1;

So here the data reference is for d1, which is a read with the following tree:

    arg 1 <var_decl 0xb7be01cc d1 type <real_type 0xb7b4eaf8 real4>
        addressable used public static SF file /home/seb/ex/capacita.f90 line 11
        size <integer_cst 0xb7b4163c 32> unit size <integer_cst 0xb7b41428 4>
        align 32
        chain <var_decl 0xb7be0170 d2 type <real_type 0xb7b4eaf8 real4>
            addressable used public static SF file /home/seb/ex/capacita.f90 line 11
            size <integer_cst 0xb7b4163c 32> unit size <integer_cst 0xb7b41428 4>
            align 32 chain <var_decl 0xb7be0114 eps0>>>

I don't really know how this could be handled as a data reference,
because that statement has a VUSE but the type of d1 is scalar.

A reduced testcase looks like this:

module solv_cap

  implicit none

  public :: init_solve

  integer, parameter, public :: dp = selected_real_kind(5)

  real(kind=dp), private :: Pi, Mu0, c0, eps0
  logical,       private :: UseFFT, UsePreco
  real(kind=dp), private :: D1, D2
  integer,       private, save :: Ng1=0, Ng2=0
  integer,       private, pointer,     dimension(:,:) :: Grid
  real(kind=dp), private, allocatable, dimension(:,:) :: G

contains

  subroutine init_solve(Grid_in, GrSize1, GrSize2, UseFFT_in, UsePreco_in)
    integer, intent(in), target, dimension(:,:) :: Grid_in
    real(kind=dp), intent(in) :: GrSize1, GrSize2
    logical,       intent(in) :: UseFFT_in, UsePreco_in
    integer                   :: i, j

    Pi = acos(-1.0_dp)
    Mu0 = 4e-7_dp * Pi
    c0 = 299792458
    eps0 = 1 / (Mu0 * c0**2)

    UseFFT = UseFFT_in
    UsePreco = UsePreco_in

    if(Ng1 /= 0 .and. allocated(G) ) then
      deallocate( G )
    end if

    Grid => Grid_in
    Ng1 = size(Grid, 1)
    Ng2 = size(Grid, 2)
    D1 = GrSize1/Ng1
    D2 = GrSize2/Ng2

    allocate( G(0:Ng1,0:Ng2) )

    write(unit=*, fmt=*) "Calculating G"
    do i=0,Ng1
      do j=0,Ng2
        G(i,j) = Ginteg( -D1/2,-D2/2, D1/2,D2/2, i*D1,j*D2 )
      end do
    end do

    if(UseFFT) then
      write(unit=*, fmt=*) "Transforming G"
      call FourirG(G,1)
    end if

    return
  end subroutine init_solve

  function Ginteg(xq1,yq1, xq2,yq2, xp,yp) result(G)
    real(kind=dp), intent(in) :: xq1,yq1, xq2,yq2, xp,yp
    real(kind=dp)             :: G
    real(kind=dp)             :: x1,x2,y1,y2,t

    x1 = xq1-xp
    x2 = xq2-xp
    y1 = yq1-yp
    y2 = yq2-yp

    if (x1+x2 < 0) then
      t = -x1
      x1 = -x2
      x2 = t
    end if
    if (y1+y2 < 0) then
      t = -y1
      y1 = -y2
      y2 = t
    end if

    G = Vprim(x2,y2)-Vprim(x1,y2)-Vprim(x2,y1)+Vprim(x1,y1)

    return
  end function Ginteg

  function Vprim(x,y) result(VP)
    real(kind=dp), intent(in) :: x,y
    real(kind=dp)             :: VP
    real(kind=dp)             :: r

    r = sqrt(x**2+y**2)
    VP = y*log(x+r) + x*log(y+r)

    return
  end function Vprim

end module solv_cap
Subject: Re: New: Missed opportunities for vectorization due to PRE

On 30 Aug 2007 02:55:17 -0000, spop at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> This block contains the following code that comes from the
> partial redundancy elimination pass:
>
> bb_14 (preds = {bb_13 }, succs = {bb_13 })
> {
>   <bb 14>:
>   # VUSE <SFT.109_593> { SFT.109 }
>   pretmp.166_821 = g.dim[1].stride;
>   goto <bb 13>;
>
> }

PRE is just invariant hoisting. If we didn't, something else would (LIM).

> Due to the following statement:
>
>   # VUSE <d1_143> { d1 }
>   d1.33_86 = d1;
> [...]
> I don't really know how this could be handled as a data reference,
> because that statement has a VUSE but the type of d1 is scalar.

Yes, but it is a global, and should be looked at as any other load is. :)
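The point that a load of a global scalar is "just a load", and that PRE/LIM merely hoist such invariant loads, can be illustrated with a small C sketch. This is a hypothetical illustration, not code from GCC or gfortran: `desc2d`, `fill_naive` and `fill_hoisted` are made-up names, and `desc2d` is a simplified stand-in for an array descriptor (base pointer plus per-dimension strides), not the real gfortran descriptor layout. The first function reloads `d1`, `d2` and the descriptor fields inside the nest, as the GIMPLE above does; the second performs each invariant load once, before the nest.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an array descriptor:
   a base pointer plus one stride (in elements) per dimension.
   This is NOT the actual gfortran descriptor layout. */
typedef struct {
    float *data;
    ptrdiff_t stride[2];
} desc2d;

/* Globals standing in for the module variables D1 and D2. */
float d1 = 0.5f, d2 = 0.25f;

/* As written: every inner iteration reloads d1, d2 and the descriptor
   fields, although their addresses never change inside the nest. */
void fill_naive(desc2d *g, int n1, int n2)
{
    for (int i = 0; i <= n1; i++)
        for (int j = 0; j <= n2; j++)
            g->data[i * g->stride[0] + j * g->stride[1]] = i * d1 + j * d2;
}

/* After invariant hoisting (the effect PRE or LIM aim for): each
   invariant load happens once, before the nest. This is only
   equivalent if the stores through g->data cannot modify d1, d2 or
   the descriptor, i.e. exactly the dependence question that the
   data-reference analysis has to answer for these loads. */
void fill_hoisted(desc2d *g, int n1, int n2)
{
    float *base = g->data;
    ptrdiff_t s0 = g->stride[0], s1 = g->stride[1];
    float dd1 = d1, dd2 = d2;
    for (int i = 0; i <= n1; i++)
        for (int j = 0; j <= n2; j++)
            base[i * s0 + j * s1] = i * dd1 + j * dd2;
}
```

In C, a store through a `float *` may alias a global `float`, so a compiler has to prove non-overlap before hoisting `d1`; in the Fortran testcase, the allocatable `G` cannot alias the module variable `D1`, which is why treating `d1` as an ordinary (invariant) load in the dependence analysis is sufficient.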
Since the fix for PR44710 we can if-convert the conditions in the inner loop.
With http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00542.html we also make sure
that the latch block isn't filled, which in turn triggers the if-conversion.

This then reveals the remaining problems:

* inlining needs to happen (our default parameters don't inline ginteg).
  The patch above ensures this by making the functions internal.

* a library with a vectorized logf needs to be available (libacml_mv, for
  instance).
  The patch above works around this by getting rid of the calls to log/sqrt.

* loop interchange needs to happen, because in the original testcase we have:

    do i=0,Ng1
      do j=0,Ng2
        G(i,j) = ...

  which is exactly the wrong way around for Fortran's column-major layout.
  Our loop-interchange code is only capable of interchanging perfect nests,
  and there is none here, because LIM and PRE move some loop-invariant
  expressions out of the inner loop into the outer loop. If they didn't, the
  invariant expressions left in the inner loop would by themselves prevent
  vectorization.
  The patch above works around this by doing the interchange by hand.
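The two source-level transformations discussed above can be sketched in C. This is a hypothetical illustration under stated assumptions, not the contents of the patch: `order_branchy` and `order_select` mirror the first swap in Ginteg before and after if-conversion, while `fill_ij` and `fill_ji` store into a column-major array (as Fortran lays out `G(0:Ng1,0:Ng2)`) in the original and the interchanged loop order. The expression `i * d1 - j * d2` is only a placeholder for the call to Ginteg.

```c
#include <stddef.h>

/* Branchy form, mirroring the first "if" in Ginteg: when x1+x2 < 0,
   the pair (x1, x2) becomes (-x2, -x1). */
void order_branchy(float *x1, float *x2)
{
    if (*x1 + *x2 < 0) {
        float t = -*x1;
        *x1 = -*x2;
        *x2 = t;
    }
}

/* If-converted form: both candidate values are computed and the
   condition only selects between them, so no branch remains in the
   body (a vectorizer can turn such selects into masked operations). */
void order_select(float *x1, float *x2)
{
    int swap = (*x1 + *x2 < 0);
    float n1 = swap ? -*x2 : *x1;
    float n2 = swap ? -*x1 : *x2;
    *x1 = n1;
    *x2 = n2;
}

/* Column-major storage as in Fortran: G(i,j) lives at i + j*(n1+1). */
static inline size_t idx(int i, int j, int n1)
{
    return (size_t)i + (size_t)j * (size_t)(n1 + 1);
}

/* Original order: the inner loop runs over j, so consecutive
   iterations touch elements n1+1 apart in memory. */
void fill_ij(float *g, int n1, int n2, float d1, float d2)
{
    for (int i = 0; i <= n1; i++)
        for (int j = 0; j <= n2; j++)
            g[idx(i, j, n1)] = i * d1 - j * d2;
}

/* Interchanged order: the inner loop runs over i, giving unit-stride
   accesses. Legal here because every element is written exactly once
   and no iteration reads what another one writes. */
void fill_ji(float *g, int n1, int n2, float d1, float d2)
{
    for (int j = 0; j <= n2; j++)
        for (int i = 0; i <= n1; i++)
            g[idx(i, j, n1)] = i * d1 - j * d2;
}
```

Both pairs compute identical results; the difference is only in control flow (branch versus select) and in memory access order (stride n1+1 versus unit stride), which is what the vectorizer cares about.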
Subject: Bug 33244

Author: matz
Date: Wed Sep 8 12:34:52 2010
New Revision: 163998

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163998
Log:
    PR tree-optimization/33244
    * tree-ssa-sink.c (statement_sink_location): Don't sink into
    empty loop latches.

testsuite/
    PR tree-optimization/33244
    * gfortran.dg/vect/fast-math-vect-8.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-8.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-sink.c
With 4.9 and trunk, no loop in the testcase is vectorized, but non-empty latches are no longer a problem. Confirmed.
The only loop I see in this file now is one that contains calls to sqrt and logf.