Split out from PR42108. The loop is not unrolled because the frontend presents us with very funny obfuscated code: do k=i,nnd,n temp=temp+(x(k)-x(k+jmini))**2 end do gets translated to { character(kind=4) countm1.6; integer(kind=4) D.1551; integer(kind=4) D.1550; integer(kind=4) D.1549; D.1549 = i; D.1550 = *nnd; D.1551 = *n; k = D.1549; if (D.1551 > 0) { if (D.1550 < D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1550 - D.1549) / (character(kind=4)) D.1551;; } else { if (D.1550 > D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1549 - D.1550) / (character(kind=4)) -D.1551;; } while (1) { { real(kind=8) D.1556; real(kind=8) D.1555; D.1555 = (((*x)[(integer(kind=8)) k + -1] - (*x)[(integer(kind=8)) (k + jmini) + -1])); D.1556 = D.1555 * D.1555; temp = temp + D.1556; } L.5:; k = k + D.1551; if (countm1.6 == 0) goto L.6; countm1.6 = countm1.6 + 4294967295; } L.6:; } The funny conditional initialization of countm1.6 makes the analysis of the number of iterations of this loop impossible (not to mention the conversions to character(kind=4)). Toon suggests: The Standard doesn't prescribe the code the Frontend generates - however, to be sure one follows the Standard, it's most easy to simply implement the steps given. To illustrate this with a simple example: DO I = M1, M2, M3 B(I) = A(I) ENDDO would be most easily, and atraightforwardly, implemented as follows: IF (M3 > 0 .AND. M1 < M2) GOTO 200 ! Loop executed zero times IF (M3 < 0 .AND. M1 > M2) GOTO 200 ! Ditto ITEMP = (M2 - M1 + M3) / M3 ! Temporary loop count I = M1 100 CONTINUE B(I) = A(I) ITEMP = ITEMP - 1 ! Adjust internal loop counter I = I + M3 ! Adjust DO loop variable IF (ITEMP > 0) GOTO 100 200 CONTINUE That there are two induction variables in this loop is inconsequential - one of them should be eliminated by induction variable elimination (at least, that was the case with g77 and the RTL loop optimization pass). which I would agree with. I btw cannot see the difference between if (D.1551 > 0) { if (D.1550 < D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1550 - D.1549) / (character(kind=4)) D.1551;; } else { if (D.1550 > D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1549 - D.1550) / (character(kind=4)) -D.1551;; } and if ((D.1551 > 0 && D.1550 < D.1549) || (D.1551 < 0 && D.1550 > D.1549)) goto L.6; countm1.6 = (character(kind=4)) (D.1550 - D.1549) / (character(kind=4)) D.1551; where the unconditional initialization of countm1.6 is the important difference (I'm sure the zero-trip-count check can be done more efficiently).
(In reply to comment #0) > To illustrate this with a simple example: > > DO I = M1, M2, M3 > B(I) = A(I) > ENDDO > > would be most easily, and atraightforwardly, implemented as follows: > > IF (M3 > 0 .AND. M1 < M2) GOTO 200 ! Loop executed zero times > IF (M3 < 0 .AND. M1 > M2) GOTO 200 ! Ditto First, I just woke up 10 minutes ago and still have 1/2 of cup of coffee, but the logic above looks wrong. do i = 1, 2, 1 print *, i end do should produce 1 2 Here, we have m1 = 1, m2 = 2, m3 = 1 > IF (M3 > 0 .AND. M1 < M2) GOTO 200 ! Loop executed zero times
Sorry, Steve - my mistake. The original message should have been: To illustrate this with a simple example: DO I = M1, M2, M3 B(I) = A(I) ENDDO would be most easily, and straightforwardly, implemented as follows: IF (M3 > 0 .AND. M1 > M2) GOTO 200 ! Loop executed zero times IF (M3 < 0 .AND. M1 < M2) GOTO 200 ! Ditto ITEMP = (M2 - M1 + M3) / M3 ! Temporary loop counter I = M1 100 CONTINUE B(I) = A(I) ITEMP = ITEMP - 1 ! Adjust internal loop counter I = I + M3 ! Adjust DO loop variable IF (ITEMP >= 0) GOTO 100 200 CONTINUE I hope this makes clear what's weird about the way the Fortran Frontend does it now. The example code follows the Standard as close as possible (it only doesn't check that m3 isn't zero, which isn't allowed), except that I follow Note 8.7 instead of the reasoning in the "sequence of steps".
(In reply to comment #2) > Sorry, Steve - my mistake. > > The original message should have been: > > To illustrate this with a simple example: > > DO I = M1, M2, M3 > B(I) = A(I) > ENDDO > > would be most easily, and straightforwardly, implemented as follows: > > IF (M3 > 0 .AND. M1 > M2) GOTO 200 ! Loop executed zero times > IF (M3 < 0 .AND. M1 < M2) GOTO 200 ! Ditto > ITEMP = (M2 - M1 + M3) / M3 ! Temporary loop counter > I = M1 > 100 CONTINUE > B(I) = A(I) > ITEMP = ITEMP - 1 ! Adjust internal loop counter > I = I + M3 ! Adjust DO loop variable > IF (ITEMP >= 0) GOTO 100 > 200 CONTINUE As written, this executes the assignment one time too many. The last but one line should read: IF (ITEMP > 0) GOTO 100 What we generate has the test at the bottom of the loop, whereas 8.1.6.4.2 puts the test of the iteration count first. What we could generate, IMHO, is read (*,*) m1, m2, m3 ITEMP = (M2 - M1 + M3) / M3 ! Temporary loop counter print *,'itemp=',itemp I = M1 100 CONTINUE IF (ITEMP <= 0) GOTO 200 print *,i I = I + M3 ! Adjust DO loop variable ITEMP = ITEMP - 1 ! Adjust internal loop counter GOTO 100 200 CONTINUE print *,"final : i = ", i end Does this have any drawbacks? Would the middle end recognize this for vectorization and loop unrolling?
The middle-end prefers do { } while () loop style so it knows the loop is always executed. It even tries to transform other loop forms into this by copying the loop header. So if the FE already can cheaply produce this it would be prefered.
> The middle-end prefers do { } while () loop style so it knows the loop is > always executed. And the Fortran Standard describes the loops being built (by compilers) just so: 1. First you determine what is m1, m2, m3 2. Then you initialize the do loop counter with m1 3. Then you determine the loop count (m2 - m1 + m3) / m3 4. If the loop count is zero or negative, you skip the loop 5. Otherwise, you traverse the loop body at least once. Or, to return to our example (and fix the mistake Thomas Koenig noted): DO I = M1, M2, M3 B(I) = A(I) ENDDO turns into ITEMP = (M2 - M1 + M3) / M3 ! The iteration count IF (ITEMP <= 0) GOTO 200 ! Past this point, the loop is executed I = M1 ! at least once. 100 CONTINUE B(I) = A(I) ITEMP = ITEMP - 1 I = I + M3 IF (ITEMP > 0) GOTO 100 200 CONTINUE therewith following the (normative) text of the Standard.
Created attachment 19076 [details] proposed patch This patch generates D.1336 = m1; D.1337 = m2; D.1338 = m3; i = D.1336; if (D.1338 > 0) { if (D.1337 < D.1336) goto L.2; } else { if (D.1337 > D.1336) goto L.2; } countm1.1 = (character(kind=4)) (D.1337 - D.1336) / (character(kind=4)) D.1338; while (1) { { Is this better, or did I overlook anything?
Subject: Re: Weird translation of DO loops On Sat, 21 Nov 2009, tkoenig at gcc dot gnu dot org wrote: > ------- Comment #6 from tkoenig at gcc dot gnu dot org 2009-11-21 23:07 ------- > Created an attachment (id=19076) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19076&action=view) > proposed patch > > This patch generates > > D.1336 = m1; > D.1337 = m2; > D.1338 = m3; > i = D.1336; > if (D.1338 > 0) > { > if (D.1337 < D.1336) goto L.2; > } > else > { > if (D.1337 > D.1336) goto L.2; > } > countm1.1 = (character(kind=4)) (D.1337 - D.1336) / (character(kind=4)) > D.1338; > while (1) > { > { > > Is this better, or did I overlook anything? That's better. Richard.
Subject: Re: Weird translation of DO loops On Sat, 2009-11-21 at 23:23 +0000, rguenther at suse dot de wrote: > That's better. Not yet correct, though, this causes regressions for program main character*9 line line = ' 10 1 -3' read (unit=line,fmt='(3I3)') m1, m2, m3 print *,m1,m2,m3 do i=m1, m2, m3 print *,i end do end program main I'll have to look some more.
Richard wondered about this earlier: > countm1.1 = (character(kind=4)) (D.1337 - D.1336) / (character(kind=4)) D.1338; but perhaps it's better to asked explicitly: Where does the "(character(kind=4))" comes from in this (obvious) integer computation ?
"Do loop with HUGE stepsize": http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/63348a0461ccf3e0 > Current status is that g77 prints 0 and gfortran prints 1. To add: NAG f95, g95, sunf95, and ifort print "1" while open64 and pathf95 print "0".
Created attachment 19104 [details] another proposed patch Here's another proposed patch, but there is a problem with it. If we calculate (m2 - m1 + m3)/m3 with signed types, then this will overflow, for example, for do i=-huge(i),huge(i),2 The current implementation gets around that by doing the addition and division unsigned, then dividing unsigned as well. For this, there have to be two cases, one for m3>0 and one for m3<0, which is what we generate at the moment. Possible solutions: - get rid of the loop counter altogether - perform the intermediate calculation with increased precision, if available - Multiply everything by the sign of m3 before doing the unsigned math (which is what we do now, except that we hide this behind an if) - Trust two's complement arithmetic and hope that overflows don't trap (not, in general, a good idea) Any tricks I have missed?
> Any tricks I have missed? Yes - we could provide for loop versioning in the front end. I.e., generate code like: IF (M3 > 0) THEN ... compute loop count ... ... perform loop ... ELSE IF (M3 < 0) THEN ... compute loop count ... ... perform loop ... ELSE ABORT "M3 MUST NOT BE ZERO" ENDIF And then hope that Value Range Propagation and InterProcedural Analysis will throw away 2 of the 3 branches of this loop because it is able to determine the sign of M3. <evil grin>
Created attachment 19159 [details] patch that implements the multiplication idea This generates if (D.1339 > 0) { if (D.1338 < D.1337) { goto L.2; } else { step_sign.2 = 1; } } else { if (D.1338 > D.1337) { goto L.2; } else { step_sign.2 = -1; } } countm1.1 = (((character(kind=4)) D.1338 - (character(kind=4)) D.1337) * (character(kind=4)) step_sign.2) / (character(kind=4)) (step_sign.2 * D.1339); implementing the multiplication idea outlined above, and passes at least do_3.F90. Better?
(In reply to comment #13) > } > countm1.1 = (((character(kind=4)) D.1338 - (character(kind=4)) D.1337) * > (character(kind=4)) step_sign.2) / (character(kind=4)) (step_sign.2 * D.1339); > > implementing the multiplication idea outlined above, and passes at > least do_3.F90. > > Better? Looks much better than the current situation. Is there a valid reason for the character(kind=4) casts? I would have thought that this should be a integer(kind=4).
(In reply to comment #14) > Looks much better than the current situation. Is there a valid > reason for the character(kind=4) casts? I would have thought > that this should be a integer(kind=4). The casts are generated for unsigned types, which we need because the difference between two integers can be larger than huge(integer). Beats me why this is character instead of unsigned.
(In reply to comment #12) > > Any tricks I have missed? > Yes - we could provide for loop versioning in the front end. [...] > ELSE > ABORT "M3 MUST NOT BE ZERO" > ENDIF Just for completeness a zeroness check is done for -fcheck=do: Fortran runtime error: DO step value is zero
Subject: Re: Weird translation of DO loops On Thu, 26 Nov 2009, tkoenig at gcc dot gnu dot org wrote: > ------- Comment #13 from tkoenig at gcc dot gnu dot org 2009-11-26 21:56 ------- > Created an attachment (id=19159) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19159&action=view) > patch that implements the multiplication idea > > This generates > > if (D.1339 > 0) > { > if (D.1338 < D.1337) > { > goto L.2; > } > else > { > step_sign.2 = 1; > } > } > else > { > if (D.1338 > D.1337) > { > goto L.2; > } > else > { > step_sign.2 = -1; > } > } > countm1.1 = (((character(kind=4)) D.1338 - (character(kind=4)) D.1337) * > (character(kind=4)) step_sign.2) / (character(kind=4)) (step_sign.2 * D.1339); > > implementing the multiplication idea outlined above, and passes at > least do_3.F90. > > Better? Unfortunately the conditional step_sign.2 is as bad as a conditional countm1.1 for optimization. The above might even be worse than the original. Richard.
Subject: Re: Weird translation of DO loops On Thu, 26 Nov 2009, tkoenig at gcc dot gnu dot org wrote: > ------- Comment #15 from tkoenig at gcc dot gnu dot org 2009-11-26 23:43 ------- > (In reply to comment #14) > > > Looks much better than the current situation. Is there a valid > > reason for the character(kind=4) casts? I would have thought > > that this should be a integer(kind=4). > > The casts are generated for unsigned types, which we need because > the difference between two integers can be larger than huge(integer). > Beats me why this is character instead of unsigned. Well, in that case you can as well rely on twos-complement arithmetic and avoid all the overflow issues? Richard.
(In reply to comment #18) > Well, in that case you can as well rely on twos-complement > arithmetic and avoid all the overflow issues? This is difficult without if statements or MAX_EXPR and MIN_EXPR, because I need to cover both a<b and b<a. I think I will go for calculating the difference in a larger size, if any is available, and a special case for do loops using the largest integer size.
Created attachment 19182 [details] Patch that works for unrolling It also passes do_3.F90. I'll submit just in time for meeting the phase 4 deadline tonight (hopefully).
Subject: Re: Weird translation of DO loops On Mon, 30 Nov 2009, tkoenig at gcc dot gnu dot org wrote: > ------- Comment #20 from tkoenig at gcc dot gnu dot org 2009-11-30 07:31 ------- > Created an attachment (id=19182) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19182&action=view) > Patch that works for unrolling > > It also passes do_3.F90. > > I'll submit just in time for meeting the phase 4 deadline tonight (hopefully). If it works for unrolling it must be indeed better. Though: + /* Calculate SIGN (1,step) */ + + tmp = fold_build2 (RSHIFT_EXPR, type, step, + build_int_cst (type, + TYPE_PRECISION (type) - 1)); + + tmp = fold_build2 (MULT_EXPR, type, tmp, + build_int_cst (type, 2)); + + step_sign = fold_build2 (PLUS_EXPR, type, tmp, + fold_convert (type, integer_one_node)); the "sign" for unsigned steps is always 1, you don't seem to account for unsignedness? Note that I believe generating step_sign = fold_build3 (fold_build2 (LT_EXPR, boolean_type_node, step, build_int_cst (TREE_TYPE (step), 0)), build_int_cst (type, -1), build_int_cst (type, 1)); is better as that allows more freedom for fold and efficient expansion for the target. Just double-check that it also works for the unrolling ;) Thanks, Richard.
(In reply to comment #21) > the "sign" for unsigned steps is always 1, you don't seem to account > for unsignedness? (Un)fortunately, there are no unsigned varaibles in Fortran. > Note that I believe generating > > step_sign = fold_build3 (fold_build2 (LT_EXPR, boolean_type_node, > step, build_int_cst (TREE_TYPE (step), > 0)), > build_int_cst (type, -1), build_int_cst (type, 1)); > > is better as that allows more freedom for fold and efficient expansion > for the target. Just double-check that it also works for the > unrolling ;) That doesn't work (something about fold_build3 needing five arguments instead of three), so think I'll submit my patch "as is". Thanks for your comments! > Thanks, > Richard. >
Thomas, Ido not have email access at the moment. I reviewed your patch and it is approved for trunk. Thanks for the work.
Subject: Bug 42131 Author: tkoenig Date: Mon Nov 30 20:35:41 2009 New Revision: 154839 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154839 Log: 2009-11-30 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/42131 * trans-stmt.c (gfc_trans_do): Calculate loop count without if statements. Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/trans-stmt.c
Fixed on trunk. Closing.
Subject: Re: Weird translation of DO loops On Mon, 30 Nov 2009, tkoenig at gcc dot gnu dot org wrote: > > > ------- Comment #22 from tkoenig at gcc dot gnu dot org 2009-11-30 19:15 ------- > (In reply to comment #21) > > > the "sign" for unsigned steps is always 1, you don't seem to account > > for unsignedness? > > (Un)fortunately, there are no unsigned varaibles in Fortran. > > > Note that I believe generating > > > > step_sign = fold_build3 (fold_build2 (LT_EXPR, boolean_type_node, > > step, build_int_cst (TREE_TYPE (step), > > 0)), > > build_int_cst (type, -1), build_int_cst (type, 1)); > > > > is better as that allows more freedom for fold and efficient expansion > > for the target. Just double-check that it also works for the > > unrolling ;) > > That doesn't work (something about fold_build3 needing five arguments > instead of three), so think I'll submit my patch "as is". Ah, of course ... more like step_sign = fold_build3 (COND_EXPR, type, fold_build2 (LT_EXPR, boolean_type_node, step, build_int_cst (TREE_TYPE (step), 0)), build_int_cst (type, -1), build_int_cst (type, 1)); Which still would be prefered IMHO. Richard. > Thanks for your comments! > > Thanks, > > Richard. > > > > >
Subject: Bug 42131 Author: jb Date: Tue Dec 1 18:32:37 2009 New Revision: 154876 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154876 Log: PR fortran/42131 Sign test using ternary operator Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/trans-stmt.c
Subject: Bug 42131 Author: jb Date: Wed Dec 2 09:22:50 2009 New Revision: 154900 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154900 Log: Typo in ChangeLog entry for PR fortran/42131 Modified: trunk/gcc/fortran/ChangeLog
AFAICT the inner loop of PR42108 is still unrolled.