52865 – GCC can't vectorize fortran loop but able to vectorize similar c-loop

Summary: GCC can't vectorize fortran loop but able to vectorize similar c-loop

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	fortran
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer

Reported:	2012-04-04 13:26 UTC by Igor Zamyatin
Modified:	2013-03-28 13:24 UTC
CC List:	4 users

See Also:
Host:
Target:	x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed:	2013-01-16 00:00:00

Attachments
Fortran test (344 bytes, text/plain) 2012-04-04 13:26 UTC, Igor Zamyatin
C test (142 bytes, text/plain) 2012-04-04 13:27 UTC, Igor Zamyatin
gcc48-pr52865.patch (646 bytes, patch) 2013-01-16 11:59 UTC, Jakub Jelinek

Description Igor Zamyatin 2012-04-04 13:26:24 UTC

Created attachment 27087
Fortran test

This looks strange, but the compiler behaves differently at -O3 for the attached test cases.
GCC cannot vectorize the Fortran loop, which looks quite simple, yet it vectorizes the similar C loop.

Is this expected behavior?

Comment 1 Igor Zamyatin 2012-04-04 13:27:11 UTC

Created attachment 27088
C test

Comment 2 Richard Biener 2012-04-04 13:49:38 UTC

DOUBLE PRECISION Dx(*) , Dy(*)

and

double X[1000], Y[1000]

are not at all the same.

Comment 3 Tobias Burnus 2012-04-04 14:00:30 UTC

(In reply to comment #2)
>   DOUBLE PRECISION Dx(*) , Dy(*)
> and
>   double X[1000], Y[1000]
> are not at all the same.

But one still gets the same result if one uses:

  void daxpy(int m, int n, double X[], double Y[], double z)

which should be close to what one gets with Fortran.
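
As a rough illustration of that pointer-parameter variant, the sketch below pairs the prototype with a loop body unrolled by four, mirroring the shape of the Fortran loop in the dump further down. The attached C test is not reproduced in this report, so the loop body, the 0-based half-open range [m, n), and the requirement that n - m be a multiple of 4 are all assumptions, not the reporter's code.

```c
#include <assert.h>

/* Hypothetical reconstruction; the real attachment is not shown in this
   report.  Computes Y[i] += z * X[i] for i in [m, n), four elements per
   iteration.  Assumes 0 <= m <= n and (n - m) % 4 == 0. */
void daxpy(int m, int n, double X[], double Y[], double z)
{
    assert(m >= 0 && m <= n && (n - m) % 4 == 0);
    for (int i = m; i < n; i += 4) {
        Y[i]     += z * X[i];
        Y[i + 1] += z * X[i + 1];
        Y[i + 2] += z * X[i + 2];
        Y[i + 3] += z * X[i + 3];
    }
}
```

Since X and Y are plain pointers in this form, the compiler cannot assume they do not overlap, which would be consistent with the "versioning for alias checks" line in the C vectorizer output below.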


 * * *

For the Fortran loop, -ftree-vectorizer-verbose=3 shows:

14: ===== analyze_loop_nest =====
14: === vect_analyze_loop_form ===
14: not vectorized: unexpected loop form.
14: bad loop form.


For the C loop:
6: Profitability threshold is 2 loop iterations.
6: created 1 versioning for alias checks.
6: vectorizing stmts using SLP.
6: LOOP VECTORIZED.


For the Fortran loop, using ifort 12.1:
(15): (col. 19) remark: BLOCK WAS VECTORIZED.
(14): (col. 16) remark: loop was not vectorized: not inner loop.



Original dump for the Fortran loop (-fdump-tree-original):

    D.1862 = mp1;
    D.1863 = *n;
    i = D.1862;
    if (D.1863 < D.1862) goto L.2;
    countm1.0 = (unsigned int) (NON_LVALUE_EXPR <D.1863>
                                - NON_LVALUE_EXPR <D.1862>) / 4;
    while (1)
      {
        (*dy)[(integer(kind=8)) i + -1] = (*dy)[(integer(kind=8)) i + -1]
                                  + *da * (*dx)[(integer(kind=8)) i + -1];
        (*dy)[(integer(kind=8)) (i + 1) + -1]
                   = (*dy)[(integer(kind=8)) (i + 1) + -1]
             + *da * (*dx)[(integer(kind=8)) (i + 1) + -1];
        (*dy)[(integer(kind=8)) (i + 2) + -1]
                   = (*dy)[(integer(kind=8)) (i + 2) + -1]
             + *da * (*dx)[(integer(kind=8)) (i + 2) + -1];
        (*dy)[(integer(kind=8)) (i + 3) + -1]
                   = (*dy)[(integer(kind=8)) (i + 3) + -1]
             + *da * (*dx)[(integer(kind=8)) (i + 3) + -1];
        L.1:;
        i = i + 4;
        if (countm1.0 == 0) goto L.2;
        countm1.0 = countm1.0 + 4294967295;
      }
    L.2:;

Comment 4 Tobias Burnus 2012-04-04 14:26:18 UTC

Reopening for reconsideration by GCC's vectorization experts.

Comment 5 Igor Zamyatin 2012-04-04 15:20:41 UTC

It seems the vectorizer doesn't like the non-empty latch block in the Fortran case.

Comment 6 Igor Zamyatin 2012-04-16 07:16:56 UTC

Any ideas what exactly prevents vectorization in the Fortran case?

Comment 7 Igor Zamyatin 2013-01-16 07:02:03 UTC

Why is the loop transformed into this form in the Fortran case? It doesn't happen for C, so it is probably a Fortran issue.

Comment 8 Richard Biener 2013-01-16 10:04:27 UTC

In another bug I stated that

    while (1)
      {
...
        if (countm1.0 == 0) goto L.2;
        countm1.0 = countm1.0 + 4294967295;
      }
    L.2:;

is bad for the vectorizer (the non-empty latch block).  You instead
want GFortran to emit

    while (1)
      {
...
        tem = countm1.0;
        countm1.0 = countm1.0 + 4294967295;
        if (tem == 0) goto L.2;
      }
    L.2:;

where hopefully the addition does not overflow ...

That said, somewhat relaxing the restriction on empty latch blocks is
certainly possible (IV increments should be fine), but it might not be
as trivial as it looks.
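
To make the difference concrete, here is a rough C rendering of the two control-flow shapes. It assumes a 32-bit unsigned counter and uses a trivial body that only counts iterations. The function names are invented for this sketch, and C source like this is not what GFortran emits; it only mirrors the order of the exit test and the decrement.

```c
#include <assert.h>
#include <stdint.h>

/* Current shape: exit test first, decrement afterwards.  The decrement
   follows the exit test, which is what leaves the latch non-empty. */
uint32_t iterations_test_then_decrement(uint32_t countm1)
{
    uint32_t iterations = 0;
    for (;;) {
        iterations++;                              /* stands in for the body */
        if (countm1 == 0)
            break;
        countm1 = countm1 + UINT32_C(4294967295);  /* same as countm1 - 1 */
    }
    return iterations;
}

/* Suggested shape: save the old value, decrement, then test the saved
   value, so nothing follows the exit test. */
uint32_t iterations_decrement_then_test(uint32_t countm1)
{
    uint32_t iterations = 0;
    for (;;) {
        iterations++;                              /* stands in for the body */
        uint32_t tem = countm1;
        countm1 = countm1 + UINT32_C(4294967295);
        if (tem == 0)
            break;
    }
    return iterations;
}

/* Both shapes run countm1 + 1 iterations. */
void check_same_trip_count(void)
{
    for (uint32_t c = 0; c < 10; c++) {
        assert(iterations_test_then_decrement(c) == c + 1);
        assert(iterations_decrement_then_test(c) == c + 1);
    }
}
```

In the second shape the final decrement wraps from 0 to 0xffffffff. For an unsigned type that wraparound is well defined, so the trip count is unchanged.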

Comment 9 Jakub Jelinek 2013-01-16 10:25:48 UTC

countm1.0 type is unsigned, thus + 0xffffffff is effectively - 1.
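
A small sketch of that arithmetic, assuming a 32-bit unsigned counter as in the dump (uint32_t stands in for the front end's counter type, and the helper names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: performs the same update as
   "countm1.0 = countm1.0 + 4294967295" in the dump.  Unsigned
   arithmetic is modulo 2^32, so adding 0xffffffff subtracts one. */
uint32_t step_countm1(uint32_t countm1)
{
    return (uint32_t) (countm1 + UINT32_C(4294967295));
}

/* Spot checks: an ordinary decrement, and the wraparound from zero. */
void check_wraparound(void)
{
    assert(step_countm1(5) == 4);
    assert(step_countm1(0) == UINT32_MAX);
}
```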

Comment 10 Jakub Jelinek 2013-01-16 10:47:18 UTC

BTW, does Fortran have a well-defined number of iterations if, say, a do loop runs between bounds that are unknown to the compiler, such as:
  integer :: i, m, n
  m = huge(0) - 7
  n = huge(0) - 2
  do i = m, n, 4
   ...
  end do
?  If it must iterate exactly twice (for i = huge(0) - 7 and i = huge(0) - 3), then it can't be expressed as the corresponding C loop (which would have undefined behavior).  But using a temporary, doing the increment, and then testing the temporary should be doable in the FE; the question is whether that cures this.
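
One way to see why a precomputed count sidesteps the overflow is the sketch below. It assumes a 32-bit int, a positive step, and m <= n, and it computes the count in unsigned arithmetic in the same spirit as the countm1.0 computation in the dump from comment 3. The function names are hypothetical. Unlike the dump, the sketch advances i only when another iteration follows, so i itself never steps past n.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of iterations minus one for
   "do i = m, n, step", assuming step > 0 and m <= n.  The subtraction
   is done on unsigned values, so it cannot overflow. */
unsigned int trip_countm1(int m, int n, int step)
{
    assert(step > 0 && m <= n);
    return ((unsigned int) n - (unsigned int) m) / (unsigned int) step;
}

/* Runs the counted loop and returns the last value of i that the body
   would see.  The counter, not i, decides when to stop. */
int last_index(int m, int n, int step)
{
    unsigned int countm1 = trip_countm1(m, n, step);
    int i = m;
    for (;;) {
        /* the loop body would use i here */
        if (countm1 == 0)
            break;
        countm1--;
        i += step;      /* only reached when another iteration follows */
    }
    return i;
}
```

For m = INT_MAX - 7, n = INT_MAX - 2 and step 4, the counter gives two iterations and the last index is INT_MAX - 3. A plain `for (i = m; i <= n; i += 4)` would instead have to evaluate INT_MAX - 3 + 4, which overflows a signed int.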

Comment 11 Tobias Burnus 2013-01-16 11:32:15 UTC

(In reply to comment #8)
> In another bug I stated that

See PR 53957

Comment 12 Jakub Jelinek 2013-01-16 11:59:29 UTC

Created attachment 29178
gcc48-pr52865.patch

This untested patch makes the loop vectorizable.
I'm not sure whether it is better this way, or to assign the condition result to a bool and use it later (as done in the patch for the other PR).

Comment 13 Jakub Jelinek 2013-01-16 16:05:42 UTC

Author: jakub
Date: Wed Jan 16 16:05:27 2013
New Revision: 195241

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195241
Log:
	PR fortran/52865
	* trans-stmt.c (gfc_trans_do): Put countm1-- before conditional
	and use value of countm1 before the decrement in the condition.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-stmt.c

Comment 14 Steven Bosscher 2013-01-16 22:27:31 UTC

(In reply to comment #13)
>     PR fortran/52865
>     * trans-stmt.c (gfc_trans_do): Put countm1-- before conditional
>     and use value of countm1 before the decrement in the condition.

Cool, this should help a few Polyhedron benchmarks! :-)

Comment 15 Richard Biener 2013-03-28 13:24:16 UTC

Vectorized now.