This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
- From: Mikael Morin <mikael dot morin at sfr dot fr>
- To: fortran at gcc dot gnu dot org
- Cc: Jack Howarth <howarth at bromo dot med dot uc dot edu>, GCC patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 28 Oct 2011 18:30:35 +0200
- Subject: Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
- References: <20111027232818.18581.901@gimli.local> <20111028135636.GB32273@bromo.med.uc.edu>
On Friday 28 October 2011 15:56:36 Jack Howarth wrote:
> Mikael,
> The complete patch bootstraps current FSF gcc trunk on
> x86_64-apple-darwin11 and the resulting gfortran compiler can compile the
> Polyhedron 2005 benchmarks using...
>
> Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto
> -fwhole-program %n.f90 -o %n
>
> without runtime regressions. However I don't seem to see any particular
> performance improvements with your patches applied. In fact, a few
> benchmarks including nf and test_fpu seem to show slower runtimes
> (~8-11%). Have you done any benchmarking with and without the proposed
> patches? Jack
Not myself, but previous versions of the patch were reported to give a
noticeable improvement on "tonto" here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c26
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c35
Since those versions, the array constructor handling has been improved, and a
few mostly cosmetic changes have been applied, so I expect the posted patch to
be on par with the previous ones, possibly slightly better.
Now, regarding your regressions: a slowdown of 8-11% is quite large, and quite unexpected.
I have just looked at test_fpu.f90 and nf.f90 from a Polyhedron source I
found at http://www.polyhedron.com/web_images/documents/pb05.zip.
There is no call to product in them, and both use only single-argument sum
calls, which are not (or shouldn't be) impacted by my patch (scalar cases).
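To illustrate the distinction (a hypothetical sketch, not code from the benchmarks): a single-argument SUM is a plain scalar reduction, whereas the DIM form produces an array result, which is the case the patch scalarizes inline:

```fortran
program sum_forms
  implicit none
  real :: a(3, 4) = 1.0
  real :: s
  real :: cols(4)

  ! Single-argument form: reduces the whole array to one scalar.
  ! This is the case test_fpu.f90 and nf.f90 use, untouched by the patch.
  s = sum(a)

  ! DIM form: one partial sum per column, an array-valued reduction.
  ! This is the kind of call the patch inlines instead of calling the library.
  cols = sum(a, dim=1)

  print *, s, cols
end program sum_forms
```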
Indeed, if I compare the code produced with -fdump-tree-original, there is
no difference at all in nf.f90, and only slight variations in test_fpu.f90
which are very unlikely to cause the regression you see (see attached diff).
Could you double-check your figures, and/or confirm that the regressions are
really caused by my patch?
Mikael
--- test_fpu.f90.003t.original.master 2011-10-28 18:08:53.000000000 +0200
+++ test_fpu.f90.003t.original.patched 2011-10-28 18:22:28.000000000 +0200
@@ -1929,6 +1929,7 @@
D.2297 = offset.65 + -1;
atmp.64.dim[0].ubound = D.2297;
pos.61 = D.2297 >= 0 ? 1 : 0;
+ offset.62 = 1;
{
integer(kind=8) S.67;
@@ -1936,7 +1937,6 @@
while (1)
{
if (S.67 > D.2297) goto L.133;
- offset.62 = 1;
if (ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]> > limit.63)
{
limit.63 = ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]>;
@@ -2406,14 +2406,14 @@
integer(kind=8) D.2457;
integer(kind=8) S.104;
- D.2457 = D.2436 + D.2442;
- D.2458 = stride.45;
+ D.2457 = stride.45;
+ D.2458 = D.2436 + D.2442;
D.2459 = D.2443 * stride.45 + D.2439;
S.104 = 0;
while (1)
{
if (S.104 > D.2444) goto L.149;
- (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2458 + D.2457];
+ (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2457 + D.2458];
S.104 = S.104 + 1;
}
L.149:;
@@ -2486,13 +2486,13 @@
integer(kind=8) D.2479;
integer(kind=8) S.106;
- D.2479 = D.2473 + D.2476;
- D.2480 = stride.45;
+ D.2479 = stride.45;
+ D.2480 = D.2473 + D.2476;
S.106 = D.2471;
while (1)
{
if (S.106 > D.2472) goto L.152;
- (*b)[(S.106 + D.2477) * D.2480 + D.2479] = (*temp)[S.106 + -1];
+ (*b)[(S.106 + D.2477) * D.2479 + D.2480] = (*temp)[S.106 + -1];
S.106 = S.106 + 1;
}
L.152:;
@@ -2756,13 +2756,13 @@
integer(kind=8) D.2549;
integer(kind=8) S.112;
- D.2549 = D.2543 + D.2546;
- D.2550 = stride.45;
+ D.2549 = stride.45;
+ D.2550 = D.2543 + D.2546;
S.112 = 1;
while (1)
{
if (S.112 > D.2542) goto L.168;
- (*b)[(S.112 + D.2547) * D.2550 + D.2549] = (*temp)[S.112 + -1];
+ (*b)[(S.112 + D.2547) * D.2549 + D.2550] = (*temp)[S.112 + -1];
S.112 = S.112 + 1;
}
L.168:;
@@ -2885,13 +2885,13 @@
integer(kind=8) D.2582;
integer(kind=8) S.115;
- D.2582 = D.2575 + D.2579;
- D.2583 = stride.45;
+ D.2582 = stride.45;
+ D.2583 = D.2575 + D.2579;
S.115 = 1;
while (1)
{
if (S.115 > D.2578) goto L.176;
- (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2583 + D.2582];
+ (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2582 + D.2583];
S.115 = S.115 + 1;
}
L.176:;
@@ -3348,6 +3348,7 @@
D.2733 = (integer(kind=8)) *n;
D.2734 = (integer(kind=8)) k;
pos.146 = D.2732 <= D.2733 ? 1 : 0;
+ offset.147 = 1 - D.2732;
{
integer(kind=8) D.2736;
integer(kind=8) S.149;
@@ -3357,7 +3358,6 @@
while (1)
{
if (S.149 > D.2733) goto L.191;
- offset.147 = 1 - D.2732;
if (ABS_EXPR <(*b)[S.149 + D.2736]> > limit.148)
{
limit.148 = ABS_EXPR <(*b)[S.149 + D.2736]>;