Bug 33244 - Missed opportunities for vectorization
Summary: Missed opportunities for vectorization
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2007-08-30 02:55 UTC by Sebastian Pop
Modified: 2021-08-11 04:13 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.9.2
Last reconfirmed: 2014-10-16 00:00:00


Description Sebastian Pop 2007-08-30 02:55:16 UTC
The following loop, which shows up among the top time consumers in
capacita.f90, is not vectorized because the loop latch block is non-empty:

./capacita.f90:51: note: ===== analyze_loop_nest =====
./capacita.f90:51: note: === vect_analyze_loop_form ===
./capacita.f90:51: note: not vectorized: unexpected loop form.
./capacita.f90:51: note: bad loop form.
./capacita.f90:9: note: vectorized 0 loops in function.

This block contains the following code that comes from the
partial redundancy elimination pass:

      bb_14 (preds = {bb_13 }, succs = {bb_13 })
      {
      <bb 14>:
        # VUSE <SFT.109_593> { SFT.109 }
        pretmp.166_821 = g.dim[1].stride;
        goto <bb 13>;

      }

Now, if I disable PRE with -fno-tree-pre, I hit another problem in
the data dependence analysis:

	base_address: &d1
	offset from base address: 0
	constant offset from base address: 0
	step: 0
	aligned to: 128
	base_object: d1
	symbol tag: d1
	FAILED as dr address is invariant

/home/seb/ex/capacita.f90:46: note: not vectorized: unhandled data-ref 
/home/seb/ex/capacita.f90:46: note: bad data references.
/home/seb/ex/capacita.f90:4: note: vectorized 0 loops in function.

This failure corresponds to the following code in tree-data-ref.c:

      /* FIXME -- data dependence analysis does not work correctly for objects with
	 invariant addresses.  Let us fail here until the problem is fixed.  */
      if (dr_address_invariant_p (dr))
	{
	  free_data_ref (dr);
	  if (dump_file && (dump_flags & TDF_DETAILS))
	    fprintf (dump_file, "\tFAILED as dr address is invariant\n");
	  ret = false;
	  break;
	}

It is triggered by the following statement:

# VUSE <d1_143> { d1 }
d1.33_86 = d1;

So here the data reference is a read of d1, which has the following tree:

    arg 1 <var_decl 0xb7be01cc d1 type <real_type 0xb7b4eaf8 real4>
        addressable used public static SF file /home/seb/ex/capacita.f90 line 11 size <integer_cst 0xb7b4163c 32> unit size <integer_cst 0xb7b41428 4>
        align 32
        chain <var_decl 0xb7be0170 d2 type <real_type 0xb7b4eaf8 real4>
            addressable used public static SF file /home/seb/ex/capacita.f90 line 11 size <integer_cst 0xb7b4163c 32> unit size <integer_cst 0xb7b41428 4>
            align 32 chain <var_decl 0xb7be0114 eps0>>>

I don't really know how this could be handled as a data reference,
because that statement has a VUSE but the type of d1 is scalar.
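
For illustration only, here is a minimal C analogue of the situation (not a
reproducer for the Fortran testcase; the names `d1` and `fill_scaled` are
made up for this sketch). The store to `out[i]` moves by one element per
iteration, while the load of the global `d1` hits the same address every
time, which is what the dump above reports as "step: 0":

```c
/* Hypothetical stand-in for the module variable D1 in the testcase. */
static float d1 = 0.25f;

/* Fill out[0..n-1] with i * d1. */
void fill_scaled(float *out, int n)
{
    for (int i = 0; i < n; i++) {
        /* out[i]: the address advances by sizeof(float) each iteration.
           d1:     the address is the same in every iteration (step 0),
                   i.e. the data reference is loop invariant.  */
        out[i] = (float)i * d1;
    }
}
```

Whether a given compiler keeps the load of `d1` inside the loop depends on
its alias information and pass ordering, so this only shows the shape of the
access, not what any particular GCC version does with it.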

A reduced testcase is like this:



module solv_cap

  implicit none

  public  :: init_solve

  integer, parameter, public :: dp = selected_real_kind(5)

  real(kind=dp), private :: Pi, Mu0, c0, eps0
  logical,       private :: UseFFT, UsePreco
  real(kind=dp), private :: D1, D2
  integer,       private, save :: Ng1=0, Ng2=0
  integer,       private, pointer,     dimension(:,:)  :: Grid
  real(kind=dp), private, allocatable, dimension(:,:)  :: G

contains

  subroutine init_solve(Grid_in, GrSize1, GrSize2, UseFFT_in, UsePreco_in)
    integer, intent(in), target, dimension(:,:) :: Grid_in
    real(kind=dp), intent(in)  :: GrSize1, GrSize2
    logical,       intent(in)  :: UseFFT_in, UsePreco_in
    integer                    :: i, j

    Pi = acos(-1.0_dp)
    Mu0 = 4e-7_dp * Pi
    c0 = 299792458
    eps0 = 1 / (Mu0 * c0**2)
    
    UseFFT = UseFFT_in
    UsePreco = UsePreco_in

    if(Ng1 /= 0 .and. allocated(G) ) then
      deallocate( G )
    end if

    Grid => Grid_in
    Ng1 = size(Grid, 1)
    Ng2 = size(Grid, 2)
    D1 = GrSize1/Ng1
    D2 = GrSize2/Ng2

    allocate( G(0:Ng1,0:Ng2) )

    write(unit=*, fmt=*) "Calculating G"
    do i=0,Ng1
      do j=0,Ng2
        G(i,j) = Ginteg( -D1/2,-D2/2, D1/2,D2/2, i*D1,j*D2 )
      end do
    end do

    if(UseFFT) then
      write(unit=*, fmt=*) "Transforming G"
      call FourirG(G,1)
    end if

    return
  end subroutine init_solve


  function Ginteg(xq1,yq1, xq2,yq2, xp,yp)  result(G)
    real(kind=dp), intent(in) :: xq1,yq1, xq2,yq2, xp,yp
    real(kind=dp)             :: G
    real(kind=dp)             :: x1,x2,y1,y2,t
    x1 = xq1-xp
    x2 = xq2-xp
    y1 = yq1-yp
    y2 = yq2-yp
 
    if (x1+x2 < 0) then
      t = -x1
      x1 = -x2
      x2 = t
    end if
    if (y1+y2 < 0) then
      t = -y1
      y1 = -y2
      y2 = t
    end if

    G = Vprim(x2,y2)-Vprim(x1,y2)-Vprim(x2,y1)+Vprim(x1,y1)

    return
  end function Ginteg


  function Vprim(x,y)  result(VP)
    real(kind=dp), intent(in) :: x,y
    real(kind=dp)             :: VP
    real(kind=dp)             :: r

    r = sqrt(x**2+y**2)
    VP = y*log(x+r) + x*log(y+r)

    return
  end function Vprim


end module solv_cap
Comment 1 Daniel Berlin 2007-08-30 15:24:39 UTC
Subject: Re:  New: Missed opportunities for vectorization due to PRE

On 30 Aug 2007 02:55:17 -0000, spop at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> The following loop showing up in the top time users in capacita.f90 is
> not vectorized because the loop latch block is non empty:
>
> ./capacita.f90:51: note: ===== analyze_loop_nest =====
> ./capacita.f90:51: note: === vect_analyze_loop_form ===
> ./capacita.f90:51: note: not vectorized: unexpected loop form.
> ./capacita.f90:51: note: bad loop form.
> ./capacita.f90:9: note: vectorized 0 loops in function.
>
> This block contains the following code that comes from the
> partial redundancy elimination pass:
>
>       bb_14 (preds = {bb_13 }, succs = {bb_13 })
>       {
>       <bb 14>:
>         # VUSE <SFT.109_593> { SFT.109 }
>         pretmp.166_821 = g.dim[1].stride;
>         goto <bb 13>;
>
>       }
>

PRE is just doing invariant hoisting here.  If PRE didn't do it, something
else (LIM) would.
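
A rough sketch in C of what hoisting an invariant load such as
g.dim[1].stride means, written by hand.  The `struct desc` layout and the
function names are assumptions made up for this example; the real gfortran
array descriptor is different.

```c
#include <stddef.h>

/* Hypothetical, simplified array descriptor (not the real gfortran one). */
struct dim  { ptrdiff_t stride, lbound, ubound; };
struct desc { float *data; struct dim dim[2]; };

/* The stride is reloaded from the descriptor in every iteration. */
float sum_strided_inloop(const struct desc *g, ptrdiff_t n)
{
    float s = 0.0f;
    for (ptrdiff_t j = 0; j < n; j++)
        s += g->data[j * g->dim[1].stride];
    return s;
}

/* Same loop with the invariant load hoisted by hand: the stride is read
   once into a temporary before the loop, much like the pretmp above.  */
float sum_strided_hoisted(const struct desc *g, ptrdiff_t n)
{
    const ptrdiff_t stride = g->dim[1].stride;
    float s = 0.0f;
    for (ptrdiff_t j = 0; j < n; j++)
        s += g->data[j * stride];
    return s;
}
```

Both functions compute the same sum; only the position of the load differs.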


> Now, if I disable the PRE with -fno-tree-pre, I get another problem on
> the data dependence analysis:
>
>         base_address: &d1
>         offset from base address: 0
>         constant offset from base address: 0
>         step: 0
>         aligned to: 128
>         base_object: d1
>         symbol tag: d1
>         FAILED as dr address is invariant
>
> /home/seb/ex/capacita.f90:46: note: not vectorized: unhandled data-ref
> /home/seb/ex/capacita.f90:46: note: bad data references.
> /home/seb/ex/capacita.f90:4: note: vectorized 0 loops in function.
>
> This fail corresponds to the following code in tree-data-ref.c
>
>       /* FIXME -- data dependence analysis does not work correctly for objects
> with
>          invariant addresses.  Let us fail here until the problem is fixed.  */
>       if (dr_address_invariant_p (dr))
>         {
>           free_data_ref (dr);
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file, "\tFAILED as dr address is invariant\n");
>           ret = false;
>           break;
>         }
>
> Due to the following statement:
>
> # VUSE <d1_143> { d1 }
> d1.33_86 = d1;
>
> So here the data reference is for d1 that is a read with the following tree:
>
>     arg 1 <var_decl 0xb7be01cc d1 type <real_type 0xb7b4eaf8 real4>
>         addressable used public static SF file /home/seb/ex/capacita.f90 line
> 11 size <integer_cst 0xb7b4163c 32> unit size <integer_cst 0xb7b41428 4>
>         align 32
>         chain <var_decl 0xb7be0170 d2 type <real_type 0xb7b4eaf8 real4>
>             addressable used public static SF file /home/seb/ex/capacita.f90
> line 11 size <integer_cst 0xb7b4163c 32> unit size <integer_cst 0xb7b41428 4>
>             align 32 chain <var_decl 0xb7be0114 eps0>>>
>
> I don't really know how this could be handled as a data reference,
> because that statement has a VUSE but the type of d1 is scalar.

Yes, but it is a global, and should be looked at as any other load is.
:)
Comment 2 Michael Matz 2010-09-07 14:41:38 UTC
Since the fix for PR44710 we can if-convert the conditions in the inner loop.
With http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00542.html we also
make sure that the latch block isn't filled, which in turn then triggers
the if-conversion.  This then reveals the rest of the problems, which are:

  * inlining needs to happen (our default parameters don't inline ginteg)
    The patch above ensures this by making the functions internal
  * a library with vectorized logf needs to be available (libacml_mv for
    instance)
    The patch above works around this by getting rid of calls to log/sqrt
  * loop interchange needs to happen, because in the original testcase
    we have:
      do i=0,Ng1
        do j=0,Ng2
          G(i,j) = ...
    exactly the wrong way around.  Our loop-interchange code is only
    capable of interchanging perfect nests, and no perfect nest exists
    here because LIM and PRE move some loop-invariant expressions out of
    the inner loop into the outer loop.  If we weren't hoisting them, that
    in itself would already prevent vectorization.
    The patch above works around this by doing the interchange by hand.
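
To make the interchange point concrete, here is a small C sketch (not the
code from the patch; `idx`, `fill_ij` and `fill_ji` are hypothetical names,
and the right-hand side is a trivial stand-in for Ginteg).  The array is
indexed column-major by hand to mimic how Fortran lays out G(i,j):

```c
#include <stddef.h>

/* Column-major index as for Fortran G(i,j): i is the fast dimension and
   ld is the extent of the first dimension.  */
static size_t idx(size_t i, size_t j, size_t ld) { return i + j * ld; }

/* Original order: i outer, j inner.  The inner loop strides through g in
   steps of ld elements.  The statement between the two loop headers is a
   hoisted invariant, so the nest is not perfect.  */
void fill_ij(float *g, size_t n1, size_t n2, float d1, float d2)
{
    for (size_t i = 0; i < n1; i++) {
        const float xi = (float)i * d1;
        for (size_t j = 0; j < n2; j++)
            g[idx(i, j, n1)] = xi + (float)j * d2;
    }
}

/* Interchanged by hand: j outer, i inner.  The inner loop now touches
   consecutive elements of g (unit stride).  */
void fill_ji(float *g, size_t n1, size_t n2, float d1, float d2)
{
    for (size_t j = 0; j < n2; j++) {
        const float yj = (float)j * d2;
        for (size_t i = 0; i < n1; i++)
            g[idx(i, j, n1)] = (float)i * d1 + yj;
    }
}
```

Both versions write the same values to the same elements; only the
traversal order, and therefore the inner-loop stride, changes.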
Comment 3 Michael Matz 2010-09-08 12:35:13 UTC
Subject: Bug 33244

Author: matz
Date: Wed Sep  8 12:34:52 2010
New Revision: 163998

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163998
Log:
	PR tree-optimization/33244
	* tree-ssa-sink.c (statement_sink_location): Don't sink into
	empty loop latches.

testsuite/
	PR tree-optimization/33244
	* gfortran.dg/vect/fast-math-vect-8.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-8.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-sink.c

Comment 4 Richard Biener 2014-10-16 08:38:47 UTC
No loop in the testcase is vectorized with 4.9 or trunk, but
non-empty latches are no longer a problem.

Confirmed.
Comment 5 Andrew Pinski 2016-08-24 07:04:03 UTC
The only loop I see in this file now is one which contains calls to sqrt and logf.