Bug 52865 - GCC can't vectorize fortran loop but able to vectorize similar c-loop
Summary: GCC can't vectorize fortran loop but able to vectorize similar c-loop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: 4.8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2012-04-04 13:26 UTC by Igor Zamyatin
Modified: 2013-03-28 13:24 UTC
CC List: 4 users

See Also:
Host:
Target: x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-01-16 00:00:00


Attachments
Fortran test (344 bytes, text/plain)
2012-04-04 13:26 UTC, Igor Zamyatin
C test (142 bytes, text/plain)
2012-04-04 13:27 UTC, Igor Zamyatin
gcc48-pr52865.patch (646 bytes, patch)
2013-01-16 11:59 UTC, Jakub Jelinek

Description Igor Zamyatin 2012-04-04 13:26:24 UTC
Created attachment 27087
Fortran test

Strangely, at -O3 the compiler behaves differently on the two attached test cases: it cannot vectorize the Fortran loop, even though the loop looks quite simple, while it does vectorize the similar C loop.

Is this expected behavior?
Comment 1 Igor Zamyatin 2012-04-04 13:27:11 UTC
Created attachment 27088
C test
Comment 2 Richard Biener 2012-04-04 13:49:38 UTC
DOUBLE PRECISION Dx(*) , Dy(*)

and

double X[1000], Y[1000]

are not at all the same.
Comment 3 Tobias Burnus 2012-04-04 14:00:30 UTC
(In reply to comment #2)
>   DOUBLE PRECISION Dx(*) , Dy(*)
> and
>   double X[1000], Y[1000]
> are not at all the same.

But one still gets the same result if one uses:

  void daxpy(int m, int n, double X[], double Y[], double z)

which should be close to what one gets with Fortran.
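A minimal C sketch of the kind of function described here. Only the signature comes from the comment above; the body, and the assumption that m and n delimit a 0-based, half-open index range, are hypothetical, since the attached C test is not reproduced here. As in the Fortran version, the array sizes are unknown and X and Y may overlap, which is consistent with the "created 1 versioning for alias checks" line in the C vectorizer output.

```c
/* Hypothetical body for the signature quoted above. The range semantics
   (m inclusive, n exclusive, 0-based) are an assumption for illustration. */
void daxpy(int m, int n, double X[], double Y[], double z)
{
    /* X[] and Y[] decay to pointers: as with DX(*) and DY(*) in Fortran,
       the compiler knows neither the sizes nor whether the arrays overlap,
       so a vectorized version needs a runtime alias check. */
    for (int i = m; i < n; i++)
        Y[i] = Y[i] + z * X[i];
}
```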


 * * *

For the Fortran loop, -ftree-vectorizer-verbose=3 shows:

14: ===== analyze_loop_nest =====
14: === vect_analyze_loop_form ===
14: not vectorized: unexpected loop form.
14: bad loop form.


For the C loop:
6: Profitability threshold is 2 loop iterations.
6: created 1 versioning for alias checks.
6: vectorizing stmts using SLP.
6: LOOP VECTORIZED.


For the Fortran loop, using ifort 12.1:
(15): (col. 19) remark: BLOCK WAS VECTORIZED.
(14): (col. 16) remark: loop was not vectorized: not inner loop.



Original dump for the Fortran loop (-fdump-tree-original):

    D.1862 = mp1;
    D.1863 = *n;
    i = D.1862;
    if (D.1863 < D.1862) goto L.2;
    countm1.0 = (unsigned int) (NON_LVALUE_EXPR <D.1863>
                                - NON_LVALUE_EXPR <D.1862>) / 4;
    while (1)
      {
        (*dy)[(integer(kind=8)) i + -1] = (*dy)[(integer(kind=8)) i + -1]
                                  + *da * (*dx)[(integer(kind=8)) i + -1];
        (*dy)[(integer(kind=8)) (i + 1) + -1]
                   = (*dy)[(integer(kind=8)) (i + 1) + -1]
             + *da * (*dx)[(integer(kind=8)) (i + 1) + -1];
        (*dy)[(integer(kind=8)) (i + 2) + -1]
                   = (*dy)[(integer(kind=8)) (i + 2) + -1]
             + *da * (*dx)[(integer(kind=8)) (i + 2) + -1];
        (*dy)[(integer(kind=8)) (i + 3) + -1]
                   = (*dy)[(integer(kind=8)) (i + 3) + -1]
             + *da * (*dx)[(integer(kind=8)) (i + 3) + -1];
        L.1:;
        i = i + 4;
        if (countm1.0 == 0) goto L.2;
        countm1.0 = countm1.0 + 4294967295;
      }
    L.2:;
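To make the loop shape in the dump above easier to read, here is a hand transcription into C. It is a sketch, not gfortran output: the function name and parameter list are hypothetical, indices stay 1-based as in Fortran, and computing countm1 with an unsigned subtraction is an assumption made to avoid signed overflow in C. The point to notice is that the decrement of countm1 sits after the exit test, i.e. in the loop latch.

```c
/* Hand-written C rendering of the -fdump-tree-original output above.
   Fortran indices are 1-based, hence the "- 1" offsets. The function
   name and parameters are illustrative only. */
void daxpy_fortran_shape(int mp1, int n, const double *dx, double *dy,
                         double da)
{
    int i = mp1;
    if (n < mp1)
        return;                               /* goto L.2 */
    /* Trip count minus one (4 elements per iteration). Subtracting as
       unsigned avoids the signed overflow n - mp1 could hit in C. */
    unsigned int countm1 = ((unsigned int) n - (unsigned int) mp1) / 4u;
    while (1) {
        dy[i - 1] += da * dx[i - 1];
        dy[i]     += da * dx[i];
        dy[i + 1] += da * dx[i + 1];
        dy[i + 2] += da * dx[i + 2];
        i = i + 4;
        if (countm1 == 0)
            break;                            /* goto L.2 */
        /* Runs after the exit test, in the latch: this is the non-empty
           latch block the vectorizer rejects. Same as countm1 - 1. */
        countm1 = countm1 + 4294967295u;
    }
}
```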
Comment 4 Tobias Burnus 2012-04-04 14:26:18 UTC
Reopening for reconsideration by GCC's vectorization experts.
Comment 5 Igor Zamyatin 2012-04-04 15:20:41 UTC
It seems the vectorizer doesn't like the non-empty latch block in the Fortran case.
Comment 6 Igor Zamyatin 2012-04-16 07:16:56 UTC
Any ideas what exactly prevents vectorization in the Fortran case?
Comment 7 Igor Zamyatin 2013-01-16 07:02:03 UTC
Why is the loop transformed into this form in the Fortran case? It doesn't happen for C, so it's probably a Fortran front-end issue.
Comment 8 Richard Biener 2013-01-16 10:04:27 UTC
In another bug I stated that

    while (1)
      {
...
        if (countm1.0 == 0) goto L.2;
        countm1.0 = countm1.0 + 4294967295;
      }
    L.2:;

is bad for the vectorizer (the non-empty latch block).  You instead
want GFortran to emit

    while (1)
      {
...
        tem = countm1.0;
        countm1.0 = countm1.0 + 4294967295;
        if (tem == 0) goto L.2;
      }
    L.2:;

where hopefully the addition does not overflow ...

That said, somewhat relaxing the restriction on empty latch blocks is
certainly possible (IV increments should be fine), but it might not be
as trivial as it looks.
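For comparison, a hypothetical C rendering of the rotated form suggested above (names are illustrative, not taken from gfortran). The counter is copied into tem, decremented, and the copy is tested, so at the source level nothing executes between the exit test and the jump back to the loop header. It should compute exactly what the original lowering computes.

```c
/* Rotated loop: copy the counter, decrement, then test the copy, so the
   exit test is the last statement of the body and the latch is empty.
   Indices are 1-based as in the Fortran source. */
void daxpy_rotated(int mp1, int n, const double *dx, double *dy, double da)
{
    int i = mp1;
    if (n < mp1)
        return;
    unsigned int countm1 = ((unsigned int) n - (unsigned int) mp1) / 4u;
    while (1) {
        dy[i - 1] += da * dx[i - 1];
        dy[i]     += da * dx[i];
        dy[i + 1] += da * dx[i + 1];
        dy[i + 2] += da * dx[i + 2];
        i = i + 4;
        unsigned int tem = countm1;
        /* On the last pass this wraps 0 to UINT_MAX; unsigned wraparound
           is well defined in C and the value is never read again. */
        countm1 = countm1 - 1u;
        if (tem == 0)
            break;
    }
}
```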
Comment 9 Jakub Jelinek 2013-01-16 10:25:48 UTC
countm1.0 type is unsigned, thus + 0xffffffff is effectively - 1.
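A quick check of this arithmetic as a small C sketch (the function name is hypothetical). It uses UINT_MAX rather than the literal 4294967295 from the dump; the two are equal when unsigned int is 32 bits wide, as on x86-64. Because C reduces unsigned arithmetic modulo UINT_MAX + 1, adding UINT_MAX is the same as subtracting 1, including the wrap from 0.

```c
#include <limits.h>

/* x + UINT_MAX == x + (2^N - 1) == x - 1 (mod 2^N), where N is the
   width of unsigned int. */
unsigned int add_all_ones(unsigned int x)
{
    return x + UINT_MAX;
}
```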
Comment 10 Jakub Jelinek 2013-01-16 10:47:18 UTC
BTW, does Fortran have a well-defined number of iterations if, say, a DO loop has bounds unknown to the compiler, like:
  integer :: i, m, n
  m = huge(0) - 7
  n = huge(0) - 2
  do i = m, n, 4
   ...
  end do
? If it must iterate exactly twice (for i = huge(0) - 7 and i = huge(0) - 3), then it can't be expressed as a corresponding C loop (which would end up with undefined behavior). But saving the counter in a temporary, updating the counter, and then testing the temporary should be doable in the FE; the question is whether that cures this.
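The iteration count above follows from the Fortran rule for DO loops, max(INT((n - m + step) / step), 0). The sketch below evaluates it in 64-bit arithmetic (an implementation choice, with a hypothetical function name, assuming step > 0) so it cannot overflow for default-kind integer bounds. For m = huge(0) - 7, n = huge(0) - 2 and step 4 it gives 2, whereas a literal C loop for (int i = m; i <= n; i += 4) would compute huge(0) + 1 after the second iteration, which is signed overflow and therefore undefined behavior.

```c
#include <limits.h>

/* Fortran DO iteration count: (n - m + step) / step, or 0 if negative.
   C integer division truncates toward zero, like Fortran's INT.
   Assumes step > 0. */
long long do_trip_count(int m, int n, int step)
{
    long long t = ((long long) n - m + step) / step;
    return t > 0 ? t : 0;
}
```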
Comment 11 Tobias Burnus 2013-01-16 11:32:15 UTC
(In reply to comment #8)
> In another bug I stated that

See PR 53957
Comment 12 Jakub Jelinek 2013-01-16 11:59:29 UTC
Created attachment 29178
gcc48-pr52865.patch

This untested patch makes the loop vectorizable.
Not sure whether it is better this way, or by assigning the condition result to a bool and using it later (as done in the patch for the other PR).
Comment 13 Jakub Jelinek 2013-01-16 16:05:42 UTC
Author: jakub
Date: Wed Jan 16 16:05:27 2013
New Revision: 195241

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195241
Log:
	PR fortran/52865
	* trans-stmt.c (gfc_trans_do): Put countm1-- before conditional
	and use value of countm1 before the decrement in the condition.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-stmt.c
Comment 14 Steven Bosscher 2013-01-16 22:27:31 UTC
(In reply to comment #13)
>     PR fortran/52865
>     * trans-stmt.c (gfc_trans_do): Put countm1-- before conditional
>     and use value of countm1 before the decrement in the condition.

Cool, this should help a few Polyhedron benchmarks! :-)
Comment 15 Richard Biener 2013-03-28 13:24:16 UTC
Vectorized now.