This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
- From: Mikael Morin <mikael dot morin at sfr dot fr>
- To: fortran at gcc dot gnu dot org
- Cc: Jack Howarth <howarth at bromo dot med dot uc dot edu>, GCC patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 28 Oct 2011 18:30:35 +0200
- Subject: Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
- References: <20111027232818.18581.901@gimli.local> <20111028135636.GB32273@bromo.med.uc.edu>
On Friday 28 October 2011 15:56:36 Jack Howarth wrote:
> Mikael,
> The complete patch bootstraps current FSF gcc trunk on
> x86_64-apple-darwin11 and the resulting gfortran compiler can compile the
> Polyhedron 2005 benchmarks using...
>
> Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto
> -fwhole-program %n.f90 -o %n
>
> without runtime regressions. However I don't seem to see any particular
> performance improvements with your patches applied. In fact, a few
> benchmarks including nf and test_fpu seem to show slower runtimes
> (~8-11%). Have you done any benchmarking with and without the proposed
> patches? Jack
Not myself, but previous versions of the patch were reported to give a
noticeable improvement on "tonto" here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c26
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c35
Since those versions, the array constructor handling has been improved, and a
few mostly cosmetic changes have been applied, so I expect the posted patch to
be on par with the previous ones, possibly slightly better.
Now, regarding your regressions: a slowdown of 8-11% is quite large, and quite unexpected.
I have just looked at test_fpu.f90 and nf.f90 from a Polyhedron source I
found at http://www.polyhedron.com/web_images/documents/pb05.zip.
There is no call to product in them, and both use only single-argument sum
calls, which are not (or shouldn't be) impacted by my patch (scalar cases).
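To illustrate the distinction (a hypothetical sketch, not code from the benchmarks): a single-argument SUM is a plain scalar reduction, whereas the DIM form produces an array result, which is the case the patch scalarizes inline:

```fortran
program sum_forms
  implicit none
  real :: a(3, 4) = 1.0
  real :: s
  real :: cols(4)

  ! Single-argument form: reduces the whole array to one scalar.
  ! This is the case test_fpu.f90 and nf.f90 use, untouched by the patch.
  s = sum(a)

  ! DIM form: one partial sum per column, an array-valued reduction.
  ! This is the kind of call the patch inlines instead of calling the library.
  cols = sum(a, dim=1)

  print *, s, cols
end program sum_forms
```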
Indeed, if I compare the code produced with -fdump-tree-original, there is
no difference at all in nf.f90, and only slight variations in test_fpu.f90
which are very unlikely to cause the regression you see (see attached diff).
Could you double-check your figures, and/or confirm that the regressions are
really caused by my patch?
Mikael
--- test_fpu.f90.003t.original.master 2011-10-28 18:08:53.000000000 +0200
+++ test_fpu.f90.003t.original.patched 2011-10-28 18:22:28.000000000 +0200
@@ -1929,6 +1929,7 @@
D.2297 = offset.65 + -1;
atmp.64.dim[0].ubound = D.2297;
pos.61 = D.2297 >= 0 ? 1 : 0;
+ offset.62 = 1;
{
integer(kind=8) S.67;
@@ -1936,7 +1937,6 @@
while (1)
{
if (S.67 > D.2297) goto L.133;
- offset.62 = 1;
if (ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]> > limit.63)
{
limit.63 = ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]>;
@@ -2406,14 +2406,14 @@
integer(kind=8) D.2457;
integer(kind=8) S.104;
- D.2457 = D.2436 + D.2442;
- D.2458 = stride.45;
+ D.2457 = stride.45;
+ D.2458 = D.2436 + D.2442;
D.2459 = D.2443 * stride.45 + D.2439;
S.104 = 0;
while (1)
{
if (S.104 > D.2444) goto L.149;
- (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2458 + D.2457];
+ (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2457 + D.2458];
S.104 = S.104 + 1;
}
L.149:;
@@ -2486,13 +2486,13 @@
integer(kind=8) D.2479;
integer(kind=8) S.106;
- D.2479 = D.2473 + D.2476;
- D.2480 = stride.45;
+ D.2479 = stride.45;
+ D.2480 = D.2473 + D.2476;
S.106 = D.2471;
while (1)
{
if (S.106 > D.2472) goto L.152;
- (*b)[(S.106 + D.2477) * D.2480 + D.2479] = (*temp)[S.106 + -1];
+ (*b)[(S.106 + D.2477) * D.2479 + D.2480] = (*temp)[S.106 + -1];
S.106 = S.106 + 1;
}
L.152:;
@@ -2756,13 +2756,13 @@
integer(kind=8) D.2549;
integer(kind=8) S.112;
- D.2549 = D.2543 + D.2546;
- D.2550 = stride.45;
+ D.2549 = stride.45;
+ D.2550 = D.2543 + D.2546;
S.112 = 1;
while (1)
{
if (S.112 > D.2542) goto L.168;
- (*b)[(S.112 + D.2547) * D.2550 + D.2549] = (*temp)[S.112 + -1];
+ (*b)[(S.112 + D.2547) * D.2549 + D.2550] = (*temp)[S.112 + -1];
S.112 = S.112 + 1;
}
L.168:;
@@ -2885,13 +2885,13 @@
integer(kind=8) D.2582;
integer(kind=8) S.115;
- D.2582 = D.2575 + D.2579;
- D.2583 = stride.45;
+ D.2582 = stride.45;
+ D.2583 = D.2575 + D.2579;
S.115 = 1;
while (1)
{
if (S.115 > D.2578) goto L.176;
- (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2583 + D.2582];
+ (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2582 + D.2583];
S.115 = S.115 + 1;
}
L.176:;
@@ -3348,6 +3348,7 @@
D.2733 = (integer(kind=8)) *n;
D.2734 = (integer(kind=8)) k;
pos.146 = D.2732 <= D.2733 ? 1 : 0;
+ offset.147 = 1 - D.2732;
{
integer(kind=8) D.2736;
integer(kind=8) S.149;
@@ -3357,7 +3358,6 @@
while (1)
{
if (S.149 > D.2733) goto L.191;
- offset.147 = 1 - D.2732;
if (ABS_EXPR <(*b)[S.149 + D.2736]> > limit.148)
{
limit.148 = ABS_EXPR <(*b)[S.149 + D.2736]>;