Bug 24520 - Temporary constant array descriptors being declared at wrong binding level.
Summary: Temporary constant array descriptors being declared at wrong binding level.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: 4.1.0
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2005-10-25 13:37 UTC by Paul Thomas
Modified: 2006-08-29 16:29 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-10-25 14:39:19


Description Paul Thomas 2005-10-25 13:37:22 UTC
Posted in: http://gcc.gnu.org/ml/fortran/2005-10/msg00443.html

I have been investigating the relatively poor performance of gfortran for
some of the Polyhedron Benchmark Tests (www.polyhedron.com).

I already discussed a couple of days ago how test_fpu.f90 exposed some
weakness in the dependency analysis. I am developing a patch that will do
somewhat more than the "draft patch" discussed there.

As posted on the Wiki (http://gcc.gnu.org/wiki/GFortranResults), two real
offenders are induct.f90 and kepler.f90 (I have confirmed this in an ifc/gfc
comparison that I will post tonight or tomorrow).  As mentioned there,
profiling indicates that the intrinsic dot_product is taking >50% of the
time.  I have since confirmed this by the simple expedient of adding
a repeat copy of the section of code that calls dot_product.  The resulting
increase in run time is of the same order as the difference between the gfc
and DF6.0 execution times.

It turns out that gfc is slow because it is making temporary array
descriptors for the actual arguments of dot_product.  Since these are only
of length 13, creating the temporaries slows gfc down a lot.  This can be
confirmed as follows:

real, dimension(12) :: x, y
real                :: z
do i = 1, 10000000
  z = dot_product(x,y)
end do
end

This takes 0.15s under DF6.0 and 45.5s under gfc!

When rewritten as

real, dimension(:), pointer :: x, y
real                :: z
allocate (x(12), y(12))
do i = 1, 10000000
  z = dot_product (x,y)
end do
end

the time increases slightly for DF6.0, to 0.27s.  gfc now comes in with a
creditable 0.39s.

The code within the loop for both versions appears below.  Apparently the
allocation of the descriptor structures and the assignments to them cause
the enormous slow-down.

I think that the lesson is that constant array references need to be taken
out of loops or their use should automatically generate a pointer.  I rather
like the latter because I suspect it to be more easily implementable.

Paul Thomas


Non_pointer version


  if (i <= 10000000)
    {
      while (1)
        {
          {
            logical4 D.573;

            {
              struct array1_real4 parm.1;
              struct array1_real4 parm.0;

              parm.0.dtype = 281;
              parm.0.dim[0].lbound = 1;
              parm.0.dim[0].ubound = 12;
              parm.0.dim[0].stride = 1;
              parm.0.data = (void *) (real4[0:] *) &x[0];
              parm.0.offset = 0;
              parm.1.dtype = 281;
              parm.1.dim[0].lbound = 1;
              parm.1.dim[0].ubound = 12;
              parm.1.dim[0].stride = 1;
              parm.1.data = (void *) (real4[0:] *) &y[0];
              parm.1.offset = 0;
              z = _gfortran_dot_product_r4 (&parm.0, &parm.1);
            }
            L.1:;
            D.573 = i == 10000000;
            i = i + 1;
            if (D.573) goto L.2; else (void) 0;
          }
        }
    }
  else
    {
      (void) 0;
    }
  L.2:;

and for the pointer version

  if (i <= 10000000)
    {
      while (1)
        {
          {
            logical4 D.573;

            z = _gfortran_dot_product_r4 (&x, &y);
            L.1:;
            D.573 = i == 10000000;
            i = i + 1;
            if (D.573) goto L.2; else (void) 0;
          }
        }
    }
  else
    {
      (void) 0;
    }
  L.2:;
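
For illustration only, the difference between the two dumps can be modeled in plain C. This is a rough sketch, not gfortran's actual code: the struct desc1_r4 and the functions make_desc, dot_r4, loop_rebuild and loop_hoisted are hypothetical names, the field layout is assumed from the field names in the dump, and dot_r4 is a stand-in rather than the real _gfortran_dot_product_r4. It shows the descriptors being rebuilt on every iteration versus being declared and filled once at the enclosing binding level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the rank-1 descriptor in the dump
   (struct array1_real4); field names follow the dump, layout is assumed. */
typedef struct {
    float *data;
    ptrdiff_t offset;
    int dtype;
    struct { ptrdiff_t stride, lbound, ubound; } dim[1];
} desc1_r4;

/* Fill a descriptor for a contiguous array of n reals with lower bound 1. */
void make_desc(desc1_r4 *d, float *base, ptrdiff_t n)
{
    d->data = base;
    d->offset = 0;
    d->dtype = 281;            /* value copied from the dump, not interpreted */
    d->dim[0].stride = 1;
    d->dim[0].lbound = 1;
    d->dim[0].ubound = n;
}

/* Stand-in for the library routine, not the real _gfortran_dot_product_r4. */
float dot_r4(const desc1_r4 *a, const desc1_r4 *b)
{
    ptrdiff_t n = a->dim[0].ubound - a->dim[0].lbound + 1;
    assert(n == b->dim[0].ubound - b->dim[0].lbound + 1);
    float sum = 0.0f;
    for (ptrdiff_t k = 0; k < n; k++)
        sum += a->data[k * a->dim[0].stride] * b->data[k * b->dim[0].stride];
    return sum;
}

/* Shape of the non-pointer dump: descriptors rebuilt on every iteration. */
float loop_rebuild(float *x, float *y, ptrdiff_t n, long iters)
{
    float z = 0.0f;
    for (long i = 0; i < iters; i++) {
        desc1_r4 parm0, parm1;
        make_desc(&parm0, x, n);
        make_desc(&parm1, y, n);
        z = dot_r4(&parm0, &parm1);
    }
    return z;
}

/* Alternative shape: descriptors declared and filled once, outside the loop. */
float loop_hoisted(float *x, float *y, ptrdiff_t n, long iters)
{
    desc1_r4 parm0, parm1;
    float z = 0.0f;
    make_desc(&parm0, x, n);
    make_desc(&parm1, y, n);
    for (long i = 0; i < iters; i++)
        z = dot_r4(&parm0, &parm1);
    return z;
}
```

Both loops compute the same result; they differ only in how often the descriptor stores are executed. The sketch says nothing about how much of the measured slow-down those stores account for.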
Comment 1 Andrew Pinski 2005-10-25 14:39:19 UTC
Confirmed  (note IE sucks).
Comment 2 Andrew Pinski 2006-01-19 16:11:49 UTC
I wonder if we could get the aliasing mechanism to say that this array descriptor is not changed and move the stores out of the loop.
Comment 3 Paul Thomas 2006-01-20 07:46:31 UTC
Subject: RE:  Temporary constant array descriptors being declared at wrong binding level.

Andrew,

It turns out that the real overhead is the function call.  I posted a patch to inline DOT_PRODUCT, which performs better than the library version.  Resubmitting this patch is on my list of things to do; it needs a changeover from inline to library at a vector length of ~16-32.  I need to study this as a function of platform and array reference type.

Best regards

Paul
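
As a rough illustration of the changeover described above, the following C sketch picks an open-coded loop for short vectors and an out-of-line routine for long ones. It is not the patch itself: INLINE_LIMIT, dot_library, uses_inline and dot_dispatch are hypothetical names, and the limit of 16 is simply one value from the ~16-32 range mentioned in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed changeover length; the comment only gives a range of ~16-32. */
#define INLINE_LIMIT 16

/* Stand-in for an out-of-line library dot product (hypothetical name). */
float dot_library(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t k = 0; k < n; k++)
        sum += a[k] * b[k];
    return sum;
}

/* Reports which path a given length would take: 1 = inline, 0 = library. */
int uses_inline(size_t n)
{
    return n < INLINE_LIMIT;
}

/* Short vectors get an open-coded loop; long ones go to the library. */
float dot_dispatch(const float *a, const float *b, size_t n)
{
    if (uses_inline(n)) {
        float sum = 0.0f;
        for (size_t k = 0; k < n; k++)
            sum += a[k] * b[k];
        return sum;
    }
    return dot_library(a, b, n);
}
```

With the assumed limit, a length-12 vector such as the one in the test case takes the inline path. In a compiler the choice would presumably be made at compile time when the array shape is known, rather than by a run-time test as in this sketch.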

Comment 4 Andrew Pinski 2006-03-15 15:38:10 UTC
You can expose the bug now with:
real, dimension(12) :: x, y
real                :: z
do i = 1, 10000000
  z = g(x,y)
end do
print *, x
contains
function g(x, y)
real, dimension(:) :: x, y
real g
x = x + y
end function
end
Comment 5 Paul Thomas 2006-03-16 08:38:27 UTC
Yes, it is not quite as spectacular as before but is present nonetheless.  By comparing pointer and non-pointer cases, I measure an overhead of 12 +/- 7 ns on a 2.4 GHz PIV.  I have no idea why the spread is so large, but the overhead bobs around according to the size of the array; e.g. for array size N = 1 it is 19 ns, for N = 16 it is 16 ns, whilst N = 4 is only hit for 6 ns.

In preparing the array TRANSFER intrinsic, I have learned more about parameter passing than I like to think about. *sigh*  I think it might be an easy matter to promote the case of a constant descriptor up to the procedure scope.  It has been pushed onto the TODO stack.

Paul 
Comment 6 Paul Thomas 2006-08-29 16:29:31 UTC
All the issues with dot product have been sorted, as far as I know.

Paul