Bug 31738

Summary: Fortran dot product vectorization is restricted
Product: gcc
Reporter: Janne Blomqvist <jb>
Component: middle-end
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---
Severity: normal
CC: aldot, gcc-bugs, rguenth, tkoenig, vincenzo.innocente
Priority: P3
Keywords: missed-optimization
Version: 4.3.0
Target Milestone: ---
Host:
Target: i686-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-02-16 22:40:16
Bug Depends on: 36009
Bug Blocks: 34265

Description Janne Blomqvist 2007-04-28 20:29:39 UTC
It seems that dot products are vectorized only in very restricted cases. Consider the following example:

subroutine testvectdp (a, b, c, n)
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n)
  real, intent(out) :: c
  c = dot_product (a, b)
end subroutine testvectdp

subroutine testvectdp2 (a, b, c, n)
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n)
  real, intent(out) :: c
  integer :: i
  c = 0.0
  do i = 1, n
     c = c + a(i) * b(i)
  end do
end subroutine testvectdp2

module testvec
contains
  subroutine testvecm (a, b, c)
    real, intent(in) :: a(:), b(:)
    real, intent(out) :: c
    c = dot_product (a, b)
  end subroutine testvecm
  
  subroutine testvecm2 (a, b, c)
    real, intent(in) :: a(:), b(:)
    real, intent(out) :: c
    integer :: i
    c = 0.0
    do i = 1, size (a)
       c = c + a(i) * b(i)
    end do
  end subroutine testvecm2
end module testvec

program testvec_p
  use testvec
  implicit none
  real :: a(9), b(9), c
  external testvectdp, testvectdp2

  call random_number(a)
  call random_number(b)

  call testvectdp(a,b,c,9)
  print *, c
  call testvectdp2(a,b,c,9)
  print *, c
  call testvecm(a,b,c)
  print *, c
  call testvecm2(a,b,c)
  print *, c
end program testvec_p
  
Only the first one, testvectdp, is vectorized.
Comment 1 Richard Biener 2007-04-28 22:04:58 UTC
None is vectorized for me.  I guess this is why gas_dyn regressed.
Comment 2 Dorit Naishlos 2007-05-08 21:00:14 UTC
Here is what happens in the three loops that don't get vectorized:

(1) the loop in testvectdp2: 
This is the loop we analyze:

  # prephitmp.192_37 = PHI <storetmp.191_30(3), D.1443_42(5)>
  # i_1 = PHI <1(3), i_44(5)>
<L15>:;
  D.1437_32 = prephitmp.192_37;
  D.1438_33 = (int8) i_1;
  D.1439_34 = D.1438_33 + -1;
  D.1440_36 = (*a_35(D))[D.1439_34];
  D.1441_40 = (*b_39(D))[D.1439_34];
  D.1442_41 = D.1441_40 * D.1440_36;
  D.1443_42 = prephitmp.192_37 + D.1442_41;
  storetmp.191_38 = D.1443_42;
  c__lsm.199_17 = D.1443_42;
  i_44 = i_1 + 1;
  if (i_1 == D.1429_5)
    goto <bb 6> (<L21>);
  else
    goto <bb 5> (<L20>);

We recognize the reduction, but we think that it is used in the loop:
  pr31738.f90:14: note: reduction used in loop.

and indeed, prephitmp.192_37 is used in:
  D.1443_42 = prephitmp.192_37 + D.1442_41;
which is ok, because this is the reduction stmt,
but also used here:
  D.1437_32 = prephitmp.192_37;
which is indeed something that we normally don't allow.
So the vectorizer is ok, except that in this case D.1437_32 doesn't seem to be used anywhere in the function. This stmt therefore looks dead to me, but for some reason it is not cleaned away before the vectorizer runs. Still need to investigate why.


(2) the loop in testvecm:
This looks like the problem reported in PR31756:

failed to compute offset or step for (*a.0_11)[D.1559_52]
create_data_ref: failed to create a dr for (*a.0_11)[D.1559_52]
pr31738.f90:24: note: not vectorized: unhandled data-ref

(3) the loop in testvecm2
Same story (the PR31756 problem):

failed to compute offset or step for (*a.0_10)[D.1509_52]
create_data_ref: failed to create a dr for (*a.0_10)[D.1509_52]
pr31738.f90:32: note: not vectorized: unhandled data-ref
Comment 3 Dorit Naishlos 2007-05-16 20:45:32 UTC
(In reply to comment #2)
> Here is what happens in the three loops that don't get vectorized:
> (1) the loop in testvectdp2: 
...
> so the vectorizer is ok, except that in this case D.1437_32 doesn't seem to
> be used anywhere in the function, so this stmt looks dead to me, but for
> some reason it is not cleaned away before the vectorizer...  Still need to
> investigate why.

So it looks like the stmt
   D.1437_32 = prephitmp.192_37
was made dead by the copyprop3 pass (dump pr31738a.f90.089t.copyprop3).

So the question is what's the most appropriate fix:
(1) fix copyprop3 to also clean away any dead code it creates?
(2) add a dce pass after copyprop3?
(3) work around it in the vectorizer? I think it should be easy: just defer the check of the uses of the reduction in the loop until after the vectorizer analysis pass that marks relevant stmts.

If (3) sounds like the way to go, I can prepare a patch for that.
Comment 4 Andrew Pinski 2007-05-29 04:43:54 UTC
(In reply to comment #3)
> (2) add a dce pass after copyprop3?

This is already done now, after:
2007-05-24  Zdenek Dvorak  <dvorakz@suse.cz>

        * passes.c (init_optimization_passes):  Add dceloop after
        copy propagation in loop optimizer.  Add predictive commoning
        to loop optimizer passes.
But still none of the functions are vectorized (at least for me).
Comment 5 Janne Blomqvist 2008-02-16 22:40:16 UTC
Still occurs with 

 4.3.0 20080216 (experimental) [trunk revision 132367]

i.e. only the first procedure (testvectdp) vectorizes, and that only with -ffast-math. 
Comment 6 Richard Biener 2008-04-28 09:39:16 UTC
For testvectdp2 we now fail to apply store motion, so the reduction is no longer
recognized.  This is the bad interaction between PRE and lim for which we have
PR36009.  If you add -fno-tree-pre, vectorization fails because of

t.f90:7: note: reduction: not commutative/associative: D.1011_34

even with -ffast-math.