Created attachment 27087: Fortran test

This looks strange, but the compiler behaves differently at -O3 for the attached test cases: the Fortran version fails to vectorize a loop that looks quite simple. Is this expected behavior?
Created attachment 27088: C test
DOUBLE PRECISION Dx(*), Dy(*)
and
double X[1000], Y[1000]
are not at all the same.
(In reply to comment #2)
> DOUBLE PRECISION Dx(*), Dy(*)
> and
> double X[1000], Y[1000]
> are not at all the same.

But one still gets the same result if one uses:

  void daxpy(int m, int n, double X[], double Y[], double z)

which should be close to what one gets with Fortran.

* * *

For the Fortran loop, -ftree-vectorizer-verbose=3 shows:

  14: ===== analyze_loop_nest =====
  14: === vect_analyze_loop_form ===
  14: not vectorized: unexpected loop form.
  14: bad loop form.

For the C loop:

  6: Profitability threshold is 2 loop iterations.
  6: created 1 versioning for alias checks.
  6: vectorizing stmts using SLP.
  6: LOOP VECTORIZED.

For the Fortran loop, using ifort 12.1:

  (15): (col. 19) remark: BLOCK WAS VECTORIZED.
  (14): (col. 16) remark: loop was not vectorized: not inner loop.

Original dump for the Fortran loop (-fdump-tree-original):

  D.1862 = mp1;
  D.1863 = *n;
  i = D.1862;
  if (D.1863 < D.1862) goto L.2;
  countm1.0 = (unsigned int) (NON_LVALUE_EXPR <D.1863> - NON_LVALUE_EXPR <D.1862>) / 4;
  while (1)
    {
      (*dy)[(integer(kind=8)) i + -1] = (*dy)[(integer(kind=8)) i + -1] + *da * (*dx)[(integer(kind=8)) i + -1];
      (*dy)[(integer(kind=8)) (i + 1) + -1] = (*dy)[(integer(kind=8)) (i + 1) + -1] + *da * (*dx)[(integer(kind=8)) (i + 1) + -1];
      (*dy)[(integer(kind=8)) (i + 2) + -1] = (*dy)[(integer(kind=8)) (i + 2) + -1] + *da * (*dx)[(integer(kind=8)) (i + 2) + -1];
      (*dy)[(integer(kind=8)) (i + 3) + -1] = (*dy)[(integer(kind=8)) (i + 3) + -1] + *da * (*dx)[(integer(kind=8)) (i + 3) + -1];
      L.1:;
      i = i + 4;
      if (countm1.0 == 0) goto L.2;
      countm1.0 = countm1.0 + 4294967295;
    }
  L.2:;
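For comparison, a C function with the daxpy signature quoted above might look roughly like the following. The attached C test is not shown in this thread, so the body is an assumption: it mirrors the BLAS-style structure visible in the Fortran dump (a cleanup loop over the first m elements, then a loop unrolled by 4), uses 0-based indexing, and expects the caller to pass m = n % 4.

```c
#include <assert.h>

/* Sketch of a C daxpy with the quoted signature: Y[i] += z * X[i].
   Assumption (not taken from the attachment): the caller passes
   m == n % 4, mirroring the cleanup-then-unroll structure in the dump. */
void daxpy(int m, int n, double X[], double Y[], double z)
{
    assert(m >= 0 && m <= n && (n - m) % 4 == 0);

    /* Cleanup loop for the first m elements. */
    for (int i = 0; i < m; i++)
        Y[i] = Y[i] + z * X[i];

    /* Main loop, manually unrolled by 4 like the Fortran source. */
    for (int i = m; i < n; i += 4) {
        Y[i]     = Y[i]     + z * X[i];
        Y[i + 1] = Y[i + 1] + z * X[i + 1];
        Y[i + 2] = Y[i + 2] + z * X[i + 2];
        Y[i + 3] = Y[i + 3] + z * X[i + 3];
    }
}
```

With X and Y as plain pointer parameters the compiler cannot rule out overlap between them, which is consistent with the "created 1 versioning for alias checks" line in the C vectorizer output.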
Reopening for reconsideration by GCC's vectorization experts.
It seems the vectorizer doesn't like the non-empty latch block in the Fortran case.
Any ideas what exactly prevents vectorization in the Fortran case?
Why is the loop transformed into this form in the Fortran case? It doesn't happen for C, so it is probably a Fortran issue.
In another bug I stated that

  while (1)
    {
      ...
      if (countm1.0 == 0) goto L.2;
      countm1.0 = countm1.0 + 4294967295;
    }
  L.2:;

is bad for the vectorizer (the non-empty latch block). You instead want GFortran to emit

  while (1)
    {
      ...
      tem = countm1.0;
      countm1.0 = countm1.0 + 4294967295;
      if (tem == 0) goto L.2;
    }
  L.2:;

where hopefully the addition does not overflow ...

That said, somewhat relaxing the restriction on empty latch blocks is certainly possible (IV increments should be fine), but it might not be as trivial as it looks.
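The two loop shapes can be compared with a small C sketch. The function names are hypothetical and the body only counts iterations; the counter is a 32-bit unsigned value as in the dump. Both functions run the body countm1 + 1 times; only the position of the decrement relative to the exit test changes.

```c
#include <assert.h>
#include <stdint.h>

/* Original shape: test first, then decrement, so the decrement
   follows the exit test (it ends up in the latch block). */
static unsigned long long run_test_then_decrement(uint32_t countm1)
{
    unsigned long long iterations = 0;
    while (1) {
        iterations++;                       /* stands in for the loop body */
        if (countm1 == 0)
            goto done;
        countm1 = countm1 + UINT32_C(4294967295);   /* i.e. countm1 - 1 */
    }
done:
    return iterations;
}

/* Proposed shape: save the counter, decrement, then test the saved
   value, so the conditional exit is the last statement of the body. */
static unsigned long long run_decrement_then_test(uint32_t countm1)
{
    unsigned long long iterations = 0;
    while (1) {
        iterations++;                       /* stands in for the loop body */
        uint32_t tem = countm1;
        countm1 = countm1 + UINT32_C(4294967295);   /* wraps on the last pass */
        if (tem == 0)
            goto done;
    }
done:
    return iterations;
}

/* Both shapes must execute the body the same number of times. */
static void check_same_trip_count(uint32_t countm1)
{
    assert(run_test_then_decrement(countm1) == run_decrement_then_test(countm1));
}
```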
countm1.0 has an unsigned type, so adding 4294967295 (0xffffffff) is effectively subtracting 1.
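A quick check of that arithmetic, assuming a 32-bit unsigned counter as in the dump (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t arithmetic is modulo 2^32, so adding 0xffffffff
   (2^32 - 1) gives the same result as subtracting 1. */
static uint32_t add_all_ones(uint32_t x)
{
    return x + UINT32_C(0xffffffff);
}

static void check_equivalent_to_minus_one(uint32_t x)
{
    assert(add_all_ones(x) == (uint32_t)(x - 1u));
}
```

For x == 0 the result wraps to UINT32_MAX, exactly as x - 1 does for an unsigned value.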
BTW, does Fortran have a well-defined number of iterations if, say, a DO loop runs over bounds unknown to the compiler, as in

  integer :: i, m, n
  m = huge(0) - 7
  n = huge(0) - 2
  do i = m, n, 4
    ...
  end do

? If it must iterate exactly twice (for i = huge(0) - 7 and i = huge(0) - 3), then it can't be expressed as a corresponding C loop (which would end up with undefined behavior). But saving the counter in a temporary, incrementing it, and then testing the temporary should be doable in the FE; the question is whether that cures this.
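A sketch of how the iteration count can be computed up front, in the spirit of the countm1 lowering, without ever overflowing the loop variable. It assumes the default Fortran integer maps to a 32-bit C int (so huge(0) corresponds to INT_MAX), handles only positive steps, and uses the Fortran iteration count max((n - m + step) / step, 0); the function names are hypothetical. By contrast, a naive for (int i = m; i <= n; i += 4) in C would compute INT_MAX + 1 after the second iteration, which is undefined behavior.

```c
#include <assert.h>
#include <limits.h>

/* Fortran iteration count for DO i = m, n, step (step > 0):
   max((n - m + step) / step, 0), computed in long long so that
   n - m + step cannot overflow for int bounds. */
static long long do_trip_count(int m, int n, int step)
{
    assert(step > 0);   /* this sketch only handles positive steps */
    long long t = ((long long)n - m + step) / step;
    return t > 0 ? t : 0;
}

/* Count-controlled loop: runs the body exactly trip-count times and
   advances i only when another iteration follows, so i never goes
   past the last value and no signed overflow happens. Stores up to
   max_out visited values of i in out[]; returns the iteration count. */
static long long run_do_loop(int m, int n, int step, int out[], int max_out)
{
    long long trip = do_trip_count(m, n, step);
    int i = m;
    for (long long k = 0; k < trip; k++) {
        if (k < max_out)
            out[k] = i;         /* stands in for the loop body */
        if (k + 1 < trip)
            i += step;
    }
    return trip;
}
```

With m = INT_MAX - 7, n = INT_MAX - 2 and a step of 4 this gives exactly two iterations, for INT_MAX - 7 and INT_MAX - 3.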
(In reply to comment #8)
> In another bug I stated that

See PR 53957.
Created attachment 29178: gcc48-pr52865.patch

This untested patch makes the loop vectorizable. I'm not sure whether it is better this way, or by assigning the result of the condition to a bool and using it later (as done in the patch for the other PR).
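The bool-based alternative could look roughly like this in C. It is a hypothetical illustration of the idea, not the contents of either patch; the body again only counts iterations, and the counter is assumed to be 32-bit unsigned as in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Evaluate the exit condition into a bool before the decrement and
   branch on it afterwards, so the conditional exit stays the last
   statement of the loop body. Returns how many times the body ran. */
static unsigned long long run_with_bool_exit(uint32_t countm1)
{
    unsigned long long iterations = 0;
    for (;;) {
        iterations++;                               /* stands in for the loop body */
        bool last = (countm1 == 0);                 /* condition computed first */
        countm1 = countm1 + UINT32_C(0xffffffff);   /* countm1 - 1, wraps on the last pass */
        if (last)
            break;
    }
    return iterations;
}

/* The body must run countm1 + 1 times. */
static void check_trip(uint32_t countm1)
{
    assert(run_with_bool_exit(countm1) == (unsigned long long)countm1 + 1);
}
```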
Author: jakub
Date: Wed Jan 16 16:05:27 2013
New Revision: 195241

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195241
Log:
	PR fortran/52865
	* trans-stmt.c (gfc_trans_do): Put countm1-- before conditional
	and use value of countm1 before the decrement in the condition.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-stmt.c
(In reply to comment #13)
> PR fortran/52865
> * trans-stmt.c (gfc_trans_do): Put countm1-- before conditional
> and use value of countm1 before the decrement in the condition.

Cool, this should help a few Polyhedron benchmarks! :-)
Vectorized now.