Bug 42131 - Weird translation of DO loops
Summary: Weird translation of DO loops
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Thomas Koenig
URL:
Keywords: missed-optimization
Depends on:
Blocks: 42108
  Show dependency treegraph
 
Reported: 2009-11-21 13:58 UTC by Richard Biener
Modified: 2009-12-04 14:24 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
proposed patch (591 bytes, patch)
2009-11-21 23:07 UTC, Thomas Koenig
Details | Diff
another proposed patch (622 bytes, patch)
2009-11-23 21:48 UTC, Thomas Koenig
Details | Diff
patch that implements the multiplication idea (1.02 KB, patch)
2009-11-26 21:56 UTC, Thomas Koenig
Details | Diff
Patch that works for unrolling (898 bytes, patch)
2009-11-30 07:31 UTC, Thomas Koenig
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2009-11-21 13:58:03 UTC
Split out from PR42108.

The loop is not unrolled because the frontend presents us with very funny
obfuscated code:

      do  k=i,nnd,n
        temp=temp+(x(k)-x(k+jmini))**2
      end do

gets translated to

{
  character(kind=4) countm1.6;
  integer(kind=4) D.1551;
  integer(kind=4) D.1550;
  integer(kind=4) D.1549;

  D.1549 = i;
  D.1550 = *nnd;
  D.1551 = *n;
  k = D.1549;
  if (D.1551 > 0)
    {
      if (D.1550 < D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1550 -
D.1549) / (character(kind=4)) D.1551;;
    }
  else
    {
      if (D.1550 > D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1549 -
D.1550) / (character(kind=4)) -D.1551;;
    }
  while (1)
    {
        {
          real(kind=8) D.1556;
          real(kind=8) D.1555;

          D.1555 = (((*x)[(integer(kind=8)) k + -1] - (*x)[(integer(kind=8)) (k
+ jmini) + -1]));
          D.1556 = D.1555 * D.1555;
          temp = temp + D.1556;
        }
      L.5:;
      k = k + D.1551;
      if (countm1.6 == 0) goto L.6;
      countm1.6 = countm1.6 + 4294967295;
    }
  L.6:;
}


The funny conditional initialization of countm1.6 makes the analysis of
the number of iterations of this loop impossible (not to mention the
conversions to character(kind=4)).

Toon suggests:

The Standard doesn't prescribe the code the Frontend generates - however, to be
sure one follows the Standard, it's most easy to simply implement the steps
given.

To illustrate this with a simple example:

DO I = M1, M2, M3
   B(I) = A(I)
ENDDO

would be most easily, and atraightforwardly, implemented as follows:

     IF (M3 > 0 .AND. M1 < M2) GOTO 200  ! Loop executed zero times
     IF (M3 < 0 .AND. M1 > M2) GOTO 200  ! Ditto
     ITEMP = (M2 - M1 + M3) / M3         ! Temporary loop count
     I     = M1
 100 CONTINUE
     B(I)  = A(I)
     ITEMP = ITEMP - 1                   ! Adjust internal loop counter
     I     = I + M3                      ! Adjust DO loop variable
     IF (ITEMP > 0) GOTO 100
 200 CONTINUE

That there are two induction variables in this loop is inconsequential - one of
them should be eliminated by induction variable elimination (at least, that was
the case with g77 and the RTL loop optimization pass).


which I would agree with.  I btw cannot see the difference between

  if (D.1551 > 0)
    {
      if (D.1550 < D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1550 -
D.1549) / (character(kind=4)) D.1551;;
    }
  else
    {
      if (D.1550 > D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1549 -
D.1550) / (character(kind=4)) -D.1551;;
    }

and

if ((D.1551 > 0 && D.1550 < D.1549) || (D.1551 < 0 && D.1550 > D.1549))
  goto L.6;

countm1.6 = (character(kind=4)) (D.1550 - D.1549) / (character(kind=4)) D.1551;

where the unconditional initialization of countm1.6 is the important difference
(I'm sure the zero-trip-count check can be done more efficiently).
Comment 1 kargls 2009-11-21 16:26:30 UTC
(In reply to comment #0)

> To illustrate this with a simple example:
> 
> DO I = M1, M2, M3
>    B(I) = A(I)
> ENDDO
> 
> would be most easily, and atraightforwardly, implemented as follows:
> 
>      IF (M3 > 0 .AND. M1 < M2) GOTO 200  ! Loop executed zero times
>      IF (M3 < 0 .AND. M1 > M2) GOTO 200  ! Ditto

First, I just woke up 10 minutes ago and still have 1/2 of cup of coffee,
but the logic above looks wrong.

do i = 1, 2, 1
   print *, i
end do

should produce
1
2

Here, we have m1 = 1, m2 = 2, m3 = 1

>      IF (M3 > 0 .AND. M1 < M2) GOTO 200  ! Loop executed zero times

Comment 2 Toon Moene 2009-11-21 17:33:37 UTC
Sorry, Steve - my mistake.

The original message should have been:

To illustrate this with a simple example:

DO I = M1, M2, M3
   B(I) = A(I)
ENDDO

would be most easily, and straightforwardly, implemented as follows:

     IF (M3 > 0 .AND. M1 > M2) GOTO 200  ! Loop executed zero times
     IF (M3 < 0 .AND. M1 < M2) GOTO 200  ! Ditto
     ITEMP = (M2 - M1 + M3) / M3         ! Temporary loop counter
     I     = M1
 100 CONTINUE
     B(I)  = A(I)
     ITEMP = ITEMP - 1                   ! Adjust internal loop counter
     I     = I + M3                      ! Adjust DO loop variable
     IF (ITEMP >= 0) GOTO 100
 200 CONTINUE

I hope this makes clear what's weird about the way the Fortran Frontend does it now.  The example code follows the Standard as close as possible (it only doesn't check that m3 isn't zero, which isn't allowed), except that I follow Note 8.7 instead of the reasoning in the "sequence of steps".
Comment 3 Thomas Koenig 2009-11-21 18:31:24 UTC
(In reply to comment #2)
> Sorry, Steve - my mistake.
> 
> The original message should have been:
> 
> To illustrate this with a simple example:
> 
> DO I = M1, M2, M3
>    B(I) = A(I)
> ENDDO
> 
> would be most easily, and straightforwardly, implemented as follows:
> 
>      IF (M3 > 0 .AND. M1 > M2) GOTO 200  ! Loop executed zero times
>      IF (M3 < 0 .AND. M1 < M2) GOTO 200  ! Ditto
>      ITEMP = (M2 - M1 + M3) / M3         ! Temporary loop counter
>      I     = M1
>  100 CONTINUE
>      B(I)  = A(I)
>      ITEMP = ITEMP - 1                   ! Adjust internal loop counter
>      I     = I + M3                      ! Adjust DO loop variable
>      IF (ITEMP >= 0) GOTO 100
>  200 CONTINUE

As written, this executes the assignment one time too many.
The last but one line should read:

      IF (ITEMP > 0) GOTO 100

What we generate has the test at the bottom of the loop,
whereas 8.1.6.4.2 puts the test of the iteration count first.

What we could generate, IMHO, is

     read (*,*) m1, m2, m3
      ITEMP = (M2 - M1 + M3) / M3         ! Temporary loop counter
      print *,'itemp=',itemp
      I = M1
 100  CONTINUE
      IF (ITEMP <= 0) GOTO 200
      print *,i
      I  = I + M3                         ! Adjust DO loop variable
      ITEMP = ITEMP - 1                   ! Adjust internal loop counter
      GOTO 100
 200  CONTINUE
      print *,"final : i = ", i
      end

Does this have any drawbacks?  Would the middle end recognize this for
vectorization and loop unrolling?
Comment 4 Richard Biener 2009-11-21 19:24:06 UTC
The middle-end prefers do { } while () loop style so it knows the loop is
always executed.  It even tries to transform other loop forms into this by
copying the loop header.  So if the FE already can cheaply produce this
it would be prefered.
Comment 5 Toon Moene 2009-11-21 21:40:53 UTC
> The middle-end prefers do { } while () loop style so it knows the loop is
> always executed. 

And the Fortran Standard describes the loops being built (by compilers) just so:

1. First you determine what is m1, m2, m3
2. Then you initialize the do loop counter with m1
3. Then you determine the loop count (m2 - m1 + m3) / m3
4. If the loop count is zero or negative, you skip the loop
5. Otherwise, you traverse the loop body at least once.

Or, to return to our example (and fix the mistake Thomas Koenig noted):

DO I = M1, M2, M3
   B(I) = A(I)
ENDDO

turns into

      ITEMP = (M2 - M1 + M3) / M3  ! The iteration count
      IF (ITEMP <= 0) GOTO 200     ! Past this point, the loop is executed
      I = M1                       ! at least once.
  100 CONTINUE
      B(I) = A(I)
      ITEMP = ITEMP - 1
      I = I + M3
      IF (ITEMP > 0) GOTO 100
  200 CONTINUE

therewith following the (normative) text of the Standard.
Comment 6 Thomas Koenig 2009-11-21 23:07:03 UTC
Created attachment 19076 [details]
proposed patch

This patch generates

    D.1336 = m1;
    D.1337 = m2;
    D.1338 = m3;
    i = D.1336;
    if (D.1338 > 0)
      {
        if (D.1337 < D.1336) goto L.2;
      }
    else
      {
        if (D.1337 > D.1336) goto L.2;
      }
    countm1.1 = (character(kind=4)) (D.1337 - D.1336) / (character(kind=4)) D.1338;
    while (1)
      {
        {

Is this better, or did I overlook anything?
Comment 7 rguenther@suse.de 2009-11-21 23:23:55 UTC
Subject: Re:  Weird translation of DO loops

On Sat, 21 Nov 2009, tkoenig at gcc dot gnu dot org wrote:

> ------- Comment #6 from tkoenig at gcc dot gnu dot org  2009-11-21 23:07 -------
> Created an attachment (id=19076)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19076&action=view)
> proposed patch
> 
> This patch generates
> 
>     D.1336 = m1;
>     D.1337 = m2;
>     D.1338 = m3;
>     i = D.1336;
>     if (D.1338 > 0)
>       {
>         if (D.1337 < D.1336) goto L.2;
>       }
>     else
>       {
>         if (D.1337 > D.1336) goto L.2;
>       }
>     countm1.1 = (character(kind=4)) (D.1337 - D.1336) / (character(kind=4))
> D.1338;
>     while (1)
>       {
>         {
> 
> Is this better, or did I overlook anything?

That's better.

Richard.
Comment 8 Thomas Koenig 2009-11-21 23:42:14 UTC
Subject: Re:  Weird translation of DO loops

On Sat, 2009-11-21 at 23:23 +0000, rguenther at suse dot de wrote:

> That's better.

Not yet correct, though, this causes regressions for

      program main
      character*9 line
      line = ' 10  1 -3'
      read (unit=line,fmt='(3I3)') m1, m2, m3

      print *,m1,m2,m3
      do i=m1, m2, m3
         print *,i
      end do
      end program main

I'll have to look some more.

Comment 9 Toon Moene 2009-11-22 10:20:45 UTC
Richard wondered about this earlier:

>    countm1.1 = (character(kind=4)) (D.1337 - D.1336) / (character(kind=4))
D.1338;

but perhaps it's better to asked explicitly:

Where does the "(character(kind=4))" comes from in this (obvious) integer computation ?
Comment 10 Tobias Burnus 2009-11-22 19:04:02 UTC
"Do loop with HUGE stepsize":
http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/63348a0461ccf3e0
> Current status is that g77 prints 0 and gfortran prints 1.

To add: NAG f95, g95, sunf95, and ifort print "1" while open64 and pathf95 print "0".
Comment 11 Thomas Koenig 2009-11-23 21:48:49 UTC
Created attachment 19104 [details]
another proposed patch

Here's another proposed patch, but there is a problem with it.

If we calculate (m2 - m1 + m3)/m3 with signed types, then this will overflow, for example, for

do i=-huge(i),huge(i),2

The current implementation gets around that by doing the addition and division unsigned, then dividing unsigned as well.  For this, there have to be two
cases, one for m3>0 and one for m3<0, which is what we generate
at the moment.

Possible solutions:

- get rid of the loop counter altogether

- perform the intermediate calculation with increased precision, if
  available

- Multiply everything by the sign of m3 before doing the unsigned
  math (which is what we do now, except that we hide this behind an
  if)

- Trust two's complement arithmetic and hope that overflows don't trap
  (not, in general, a good idea)

Any tricks I have missed?
Comment 12 Toon Moene 2009-11-24 18:03:44 UTC
> Any tricks I have missed?

Yes - we could provide for loop versioning in the front end.

I.e., generate code like:

IF (M3 > 0) THEN
   ... compute loop count ...
   ... perform loop ...
ELSE IF (M3 < 0) THEN
   ... compute loop count ...
   ... perform loop ...
ELSE
   ABORT "M3 MUST NOT BE ZERO"
ENDIF

And then hope that Value Range Propagation and InterProcedural Analysis will throw away 2 of the 3 branches of this loop because it is able to determine the sign of M3.

<evil grin>
Comment 13 Thomas Koenig 2009-11-26 21:56:44 UTC
Created attachment 19159 [details]
patch that implements the multiplication idea

This generates

   if (D.1339 > 0)
      {
        if (D.1338 < D.1337)
          {
            goto L.2;
          }
        else
          {
            step_sign.2 = 1;
          }
      }
    else
      {
        if (D.1338 > D.1337)
          {
            goto L.2;
          }
        else
          {
            step_sign.2 = -1;
          }
      }
    countm1.1 = (((character(kind=4)) D.1338 - (character(kind=4)) D.1337) * (character(kind=4)) step_sign.2) / (character(kind=4)) (step_sign.2 * D.1339);

implementing the multiplication idea outlined above, and passes at
least do_3.F90.

Better?
Comment 14 kargls 2009-11-26 22:07:16 UTC
(In reply to comment #13)
>       }
>     countm1.1 = (((character(kind=4)) D.1338 - (character(kind=4)) D.1337) *
> (character(kind=4)) step_sign.2) / (character(kind=4)) (step_sign.2 * D.1339);
> 
> implementing the multiplication idea outlined above, and passes at
> least do_3.F90.
> 
> Better?

Looks much better than the current situation.  Is there a valid
reason for the character(kind=4) casts?  I would have thought
that this should be a integer(kind=4).
Comment 15 Thomas Koenig 2009-11-26 23:43:24 UTC
(In reply to comment #14)

> Looks much better than the current situation.  Is there a valid
> reason for the character(kind=4) casts?  I would have thought
> that this should be a integer(kind=4).

The casts are generated for unsigned types, which we need because
the difference between two integers can be larger than huge(integer).
Beats me why this is character instead of unsigned.
Comment 16 Tobias Burnus 2009-11-27 08:29:18 UTC
(In reply to comment #12)
> > Any tricks I have missed?
> Yes - we could provide for loop versioning in the front end.
[...]
> ELSE
>    ABORT "M3 MUST NOT BE ZERO"
> ENDIF

Just for completeness a zeroness check is done for -fcheck=do:
  Fortran runtime error: DO step value is zero
Comment 17 rguenther@suse.de 2009-11-27 09:47:12 UTC
Subject: Re:  Weird translation of DO loops

On Thu, 26 Nov 2009, tkoenig at gcc dot gnu dot org wrote:

> ------- Comment #13 from tkoenig at gcc dot gnu dot org  2009-11-26 21:56 -------
> Created an attachment (id=19159)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19159&action=view)
> patch that implements the multiplication idea
> 
> This generates
> 
>    if (D.1339 > 0)
>       {
>         if (D.1338 < D.1337)
>           {
>             goto L.2;
>           }
>         else
>           {
>             step_sign.2 = 1;
>           }
>       }
>     else
>       {
>         if (D.1338 > D.1337)
>           {
>             goto L.2;
>           }
>         else
>           {
>             step_sign.2 = -1;
>           }
>       }
>     countm1.1 = (((character(kind=4)) D.1338 - (character(kind=4)) D.1337) *
> (character(kind=4)) step_sign.2) / (character(kind=4)) (step_sign.2 * D.1339);
> 
> implementing the multiplication idea outlined above, and passes at
> least do_3.F90.
> 
> Better?

Unfortunately the conditional step_sign.2 is as bad as a conditional
countm1.1 for optimization.  The above might even be worse than
the original.

Richard.
Comment 18 rguenther@suse.de 2009-11-27 09:48:00 UTC
Subject: Re:  Weird translation of DO loops

On Thu, 26 Nov 2009, tkoenig at gcc dot gnu dot org wrote:

> ------- Comment #15 from tkoenig at gcc dot gnu dot org  2009-11-26 23:43 -------
> (In reply to comment #14)
> 
> > Looks much better than the current situation.  Is there a valid
> > reason for the character(kind=4) casts?  I would have thought
> > that this should be a integer(kind=4).
> 
> The casts are generated for unsigned types, which we need because
> the difference between two integers can be larger than huge(integer).
> Beats me why this is character instead of unsigned.

Well, in that case you can as well rely on twos-complement
arithmetic and avoid all the overflow issues?

Richard.
Comment 19 Thomas Koenig 2009-11-28 15:16:15 UTC
(In reply to comment #18)

> Well, in that case you can as well rely on twos-complement
> arithmetic and avoid all the overflow issues?

This is difficult without if statements or MAX_EXPR and MIN_EXPR,
because I need to cover both a<b and b<a.

I think I will go for calculating the difference in a larger size,
if any is available, and a special case for do loops using the largest
integer size.
Comment 20 Thomas Koenig 2009-11-30 07:31:16 UTC
Created attachment 19182 [details]
Patch that works for unrolling

It also passes do_3.F90.

I'll submit just in time for meeting the phase 4 deadline tonight (hopefully).
Comment 21 rguenther@suse.de 2009-11-30 10:10:16 UTC
Subject: Re:  Weird translation of DO loops

On Mon, 30 Nov 2009, tkoenig at gcc dot gnu dot org wrote:

> ------- Comment #20 from tkoenig at gcc dot gnu dot org  2009-11-30 07:31 -------
> Created an attachment (id=19182)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19182&action=view)
> Patch that works for unrolling
> 
> It also passes do_3.F90.
> 
> I'll submit just in time for meeting the phase 4 deadline tonight (hopefully).

If it works for unrolling it must be indeed better.  Though:

+      /* Calculate SIGN (1,step) */
+
+      tmp = fold_build2 (RSHIFT_EXPR, type, step,
+                        build_int_cst (type,
+                                       TYPE_PRECISION (type) - 1));
+
+      tmp = fold_build2 (MULT_EXPR, type, tmp,
+                        build_int_cst (type, 2));
+
+      step_sign = fold_build2 (PLUS_EXPR, type, tmp,
+                              fold_convert (type, integer_one_node));

the "sign" for unsigned steps is always 1, you don't seem to account
for unsignedness?  Note that I believe generating

  step_sign = fold_build3 (fold_build2 (LT_EXPR, boolean_type_node,
				  step, build_int_cst (TREE_TYPE (step), 
0)),
		     build_int_cst (type, -1), build_int_cst (type, 1));

is better as that allows more freedom for fold and efficient expansion
for the target.  Just double-check that it also works for the
unrolling ;)

Thanks,
Richard.
Comment 22 Thomas Koenig 2009-11-30 19:15:57 UTC
(In reply to comment #21)

> the "sign" for unsigned steps is always 1, you don't seem to account
> for unsignedness?

(Un)fortunately, there are no unsigned varaibles in Fortran.

> Note that I believe generating
> 
>   step_sign = fold_build3 (fold_build2 (LT_EXPR, boolean_type_node,
>                                   step, build_int_cst (TREE_TYPE (step), 
> 0)),
>                      build_int_cst (type, -1), build_int_cst (type, 1));
> 
> is better as that allows more freedom for fold and efficient expansion
> for the target.  Just double-check that it also works for the
> unrolling ;)

That doesn't work (something about fold_build3 needing five arguments
instead of three), so think I'll submit my patch "as is".

Thanks for your comments! 
> Thanks,
> Richard.
> 

Comment 23 Jerry DeLisle 2009-11-30 20:19:07 UTC
Thomas, Ido not have email access at the moment.

I reviewed your patch and it is approved for trunk.

Thanks for the work.
Comment 24 Thomas Koenig 2009-11-30 20:35:54 UTC
Subject: Bug 42131

Author: tkoenig
Date: Mon Nov 30 20:35:41 2009
New Revision: 154839

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154839
Log:
2009-11-30  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/42131
	* trans-stmt.c (gfc_trans_do):  Calculate loop count
	without if statements.


Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-stmt.c

Comment 25 Thomas Koenig 2009-11-30 21:01:24 UTC
Fixed on trunk. Closing.
Comment 26 rguenther@suse.de 2009-12-01 09:42:27 UTC
Subject: Re:  Weird translation of DO loops

On Mon, 30 Nov 2009, tkoenig at gcc dot gnu dot org wrote:

> 
> 
> ------- Comment #22 from tkoenig at gcc dot gnu dot org  2009-11-30 19:15 -------
> (In reply to comment #21)
> 
> > the "sign" for unsigned steps is always 1, you don't seem to account
> > for unsignedness?
> 
> (Un)fortunately, there are no unsigned varaibles in Fortran.
> 
> > Note that I believe generating
> > 
> >   step_sign = fold_build3 (fold_build2 (LT_EXPR, boolean_type_node,
> >                                   step, build_int_cst (TREE_TYPE (step), 
> > 0)),
> >                      build_int_cst (type, -1), build_int_cst (type, 1));
> > 
> > is better as that allows more freedom for fold and efficient expansion
> > for the target.  Just double-check that it also works for the
> > unrolling ;)
> 
> That doesn't work (something about fold_build3 needing five arguments
> instead of three), so think I'll submit my patch "as is".

Ah, of course ... more like

step_sign = fold_build3 (COND_EXPR, type,
                         fold_build2 (LT_EXPR, boolean_type_node,
                                   step, build_int_cst (TREE_TYPE (step), 0)),
                      build_int_cst (type, -1), build_int_cst (type, 1));

Which still would be prefered IMHO.

Richard.

> Thanks for your comments! 
> > Thanks,
> > Richard.
> > 
> 
> 
> 

Comment 27 Janne Blomqvist 2009-12-01 18:32:51 UTC
Subject: Bug 42131

Author: jb
Date: Tue Dec  1 18:32:37 2009
New Revision: 154876

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154876
Log:
PR fortran/42131 Sign test using ternary operator

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-stmt.c

Comment 28 Janne Blomqvist 2009-12-02 09:23:05 UTC
Subject: Bug 42131

Author: jb
Date: Wed Dec  2 09:22:50 2009
New Revision: 154900

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154900
Log:
Typo in ChangeLog entry for PR fortran/42131

Modified:
    trunk/gcc/fortran/ChangeLog

Comment 29 Dominique d'Humieres 2009-12-04 14:24:39 UTC
AFAICT the inner loop of PR42108 is still unrolled.