|Summary:||Fortran dot product vectorization is restricted|
|Product:||gcc||Reporter:||Janne Blomqvist <jb>|
|Component:||middle-end||Assignee:||Not yet assigned to anyone <unassigned>|
|Severity:||normal||CC:||aldot, gcc-bugs, rguenth, tkoenig, vincenzo.innocente|
|Build:||Known to work:|
|Known to fail:||Last reconfirmed:||2008-02-16 22:40:16|
|Bug Depends on:||36009|
Description Janne Blomqvist 2007-04-28 20:29:39 UTC
It seems that dot products are vectorized only for very restricted cases. For the following example

subroutine testvectdp (a, b, c, n)
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n)
  real, intent(out) :: c
  c = dot_product (a, b)
end subroutine testvectdp

subroutine testvectdp2 (a, b, c, n)
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n)
  real, intent(out) :: c
  integer :: i
  c = 0.0
  do i = 1, n
     c = c + a(i) * b(i)
  end do
end subroutine testvectdp2

module testvec
contains
  subroutine testvecm (a, b, c)
    real, intent(in) :: a(:), b(:)
    real, intent(out) :: c
    c = dot_product (a, b)
  end subroutine testvecm

  subroutine testvecm2 (a, b, c)
    real, intent(in) :: a(:), b(:)
    real, intent(out) :: c
    integer :: i
    c = 0.0
    do i = 1, size (a)
       c = c + a(i) * b(i)
    end do
  end subroutine testvecm2
end module testvec

program testvec_p
  use testvec
  implicit none
  real :: a(9), b(9), c
  external testvectdp, testvectdp2
  call random_number(a)
  call random_number(b)
  call testvectdp(a,b,c,9)
  print *, c
  call testvectdp2(a,b,c,9)
  print *, c
  call testvecm(a,b,c)
  print *, c
  call testvecm2(a,b,c)
  print *, c
end program testvec_p

Only the first one, testvectdp, vectorizes.
Comment 1 Richard Biener 2007-04-28 22:04:58 UTC
None is vectorized for me. I guess this is why gas_dyn regressed.
Comment 2 Dorit Naishlos 2007-05-08 21:00:14 UTC
Here is what happens in the three loops that don't get vectorized:

(1) the loop in testvectdp2:

This is the loop we analyze:

  # prephitmp.192_37 = PHI <storetmp.191_30(3), D.1443_42(5)>
  # i_1 = PHI <1(3), i_44(5)>
<L15>:;
  D.1437_32 = prephitmp.192_37;
  D.1438_33 = (int8) i_1;
  D.1439_34 = D.1438_33 + -1;
  D.1440_36 = (*a_35(D))[D.1439_34];
  D.1441_40 = (*b_39(D))[D.1439_34];
  D.1442_41 = D.1441_40 * D.1440_36;
  D.1443_42 = prephitmp.192_37 + D.1442_41;
  storetmp.191_38 = D.1443_42;
  c__lsm.199_17 = D.1443_42;
  i_44 = i_1 + 1;
  if (i_1 == D.1429_5) goto <bb 6> (<L21>); else goto <bb 5> (<L20>);

We recognize the reduction, but we think that it is used in the loop:

pr31738.f90:14: note: reduction used in loop.

and indeed, prephitmp.192_37 is used in:

  D.1443_42 = prephitmp.192_37 + D.1442_41;

which is ok, because this is the reduction stmt, but also used here:

  D.1437_32 = prephitmp.192_37;

which is indeed something that we normally don't allow. So the vectorizer is ok, except that in this case D.1437_32 doesn't seem to be used anywhere in the function, so this stmt looks dead to me, but for some reason it is not cleaned away before the vectorizer... Still need to investigate why.

(2) the loop in testvecm:

This looks like the problem reported in PR31756:

failed to compute offset or step for (*a.0_11)[D.1559_52]
create_data_ref: failed to create a dr for (*a.0_11)[D.1559_52]
pr31738.f90:24: note: not vectorized: unhandled data-ref

(3) the loop in testvecm2:

Same story (the PR31756 problem):

failed to compute offset or step for (*a.0_10)[D.1509_52]
create_data_ref: failed to create a dr for (*a.0_10)[D.1509_52]
pr31738.f90:32: note: not vectorized: unhandled data-ref
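For contrast with the dead copy in case (1), here is a small sketch of what a genuine "reduction used in loop" looks like at the source level. The module and procedure names (reduction_use_demo, dot_clean, dot_with_prefix) are made up for illustration and are not part of the original test case. In dot_clean the running sum is read only by its own update; in dot_with_prefix it is also stored on every iteration, which is the kind of extra use the check is meant to reject. Whether a given GCC version vectorizes either loop is not claimed here.

```fortran
module reduction_use_demo
  implicit none
contains

  ! Plain reduction: s is read only by the statement that updates it.
  function dot_clean(a, b, n) result(s)
    integer, intent(in) :: n
    real, intent(in) :: a(n), b(n)
    real :: s
    integer :: i
    s = 0.0
    do i = 1, n
      s = s + a(i) * b(i)
    end do
  end function dot_clean

  ! Same reduction, but the partial sum is also consumed inside the loop.
  ! This is a real (not dead) second use of the reduction variable.
  subroutine dot_with_prefix(a, b, n, s, prefix)
    integer, intent(in) :: n
    real, intent(in) :: a(n), b(n)
    real, intent(out) :: s, prefix(n)
    integer :: i
    s = 0.0
    do i = 1, n
      s = s + a(i) * b(i)
      prefix(i) = s   ! extra use of the running sum
    end do
  end subroutine dot_with_prefix

end module reduction_use_demo
```

In the dump above the second use is only a copy whose result is never read, so it resembles dot_clean in substance even though the analysis sees it as something like dot_with_prefix.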
Comment 3 Dorit Naishlos 2007-05-16 20:45:32 UTC
(In reply to comment #2)
> Here is what happens in the three loops that don't get vectorized:
> (1) the loop in testvectdp2:
...
> so the vectorizer is ok, except that in this case D.1437_32 doesn't seem to
> be used anywhere in the function, so this stmt looks dead to me, but for
> some reason it is not cleaned away before the vectorizer... Still need to
> investigate why.

So looks like the stmt D.1437_32 = prephitmp.192_37 became dead by pass pr31738a.f90.089t.copyprop3. So the question is what's the most appropriate fix:
(1) fix copyprop3 to also clean away any dead code it creates?
(2) add a dce pass after copyprop3?
(3) work around it in the vectorizer. I think it should be easy - just move the check of the uses of the reduction in the loop until after the vectorizer analysis pass that marks relevant stmts.

If (3) sounds like the way to go - I can prepare a patch for that.
Comment 4 Andrew Pinski 2007-05-29 04:43:54 UTC
(In reply to comment #3)
> (2) add a dce pass after copyprop3?

This is already done now after:

2007-05-24  Zdenek Dvorak  <email@example.com>

        * passes.c (init_optimization_passes): Add dceloop after
        copy propagation in loop optimizer.  Add predictive commoning to loop
        optimizer passes.

But still none of the functions are vectorized (at least for me).
Comment 5 Janne Blomqvist 2008-02-16 22:40:16 UTC
Still occurs with 4.3.0 20080216 (experimental) [trunk revision 132367], i.e. only the first procedure (testvectdp) vectorizes, and that only with -ffast-math.
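A likely reason -ffast-math matters here: vectorizing a floating-point sum means accumulating several partial sums and combining them at the end, which changes the order of the additions. The sketch below does that reordering by hand with four partial sums. The names (reassoc_demo, dot_seq, dot_lanes4) and the lane count of four are assumptions for illustration, not what GCC necessarily generates.

```fortran
module reassoc_demo
  implicit none
contains

  ! Strictly sequential order, as written in testvectdp2.
  function dot_seq(a, b, n) result(s)
    integer, intent(in) :: n
    real, intent(in) :: a(n), b(n)
    real :: s
    integer :: i
    s = 0.0
    do i = 1, n
      s = s + a(i) * b(i)
    end do
  end function dot_seq

  ! Hand-written imitation of a 4-wide vectorized reduction:
  ! four independent partial sums, combined after the main loop.
  function dot_lanes4(a, b, n) result(s)
    integer, intent(in) :: n
    real, intent(in) :: a(n), b(n)
    real :: s, lane(4)
    integer :: i, nvec
    lane = 0.0
    nvec = n - mod(n, 4)            ! largest multiple of 4 not above n
    do i = 1, nvec, 4
      lane = lane + a(i:i+3) * b(i:i+3)
    end do
    s = (lane(1) + lane(2)) + (lane(3) + lane(4))
    do i = nvec + 1, n              ! scalar tail for leftover elements
      s = s + a(i) * b(i)
    end do
  end function dot_lanes4

end module reassoc_demo
```

For inputs whose products and sums are exactly representable the two functions agree. For inputs that mix very large and very small magnitudes they may round differently, which is why the compiler needs permission to reassociate before it can use the second form.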
Comment 6 Richard Biener 2008-04-28 09:39:16 UTC
For testvectdp2 we now fail to apply store motion, so the reduction is no longer recognized. This is the bad interaction between PRE and lim for which we have PR36009. If you add -fno-tree-pre, vectorization fails, even with -ffast-math, because of

t.f90:7: note: reduction: not commutative/associative: D.1011_34
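Store motion here means keeping the running sum in a temporary during the loop and writing it to c once afterwards, instead of storing to c on every iteration. The sketch below does that by hand. The names (store_motion_demo, dot_through_dummy, dot_local_acc) are made up for illustration, and it is an untested assumption that the hand-transformed form is easier for any particular GCC version to recognize as a reduction.

```fortran
module store_motion_demo
  implicit none
contains

  ! As in testvectdp2: the accumulator is the dummy argument c,
  ! so each iteration is a load and store of c unless the compiler
  ! moves the store out of the loop.
  subroutine dot_through_dummy(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n), b(n)
    real, intent(out) :: c
    integer :: i
    c = 0.0
    do i = 1, n
      c = c + a(i) * b(i)
    end do
  end subroutine dot_through_dummy

  ! Store motion applied by hand: accumulate in a local scalar
  ! and store to c exactly once, after the loop.
  subroutine dot_local_acc(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n), b(n)
    real, intent(out) :: c
    real :: s
    integer :: i
    s = 0.0
    do i = 1, n
      s = s + a(i) * b(i)
    end do
    c = s
  end subroutine dot_local_acc

end module store_motion_demo
```

Both routines perform the additions in the same order, so they return the same value; only the placement of the store differs.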