This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[fortran PATCH] Implement a(:,:) = 0.0 using memset (take 2)


As pointed out by Steve Kargl, some of the tests in my recent patch were
overly conservative, and only handled pointers to arrays, rather than
local arrays.  The revised patch below addresses this limitation/oversight
and allows us to use __builtin_memset in more cases.

As an example, consider the following test case that's reduced from
polyhedron's fatigue benchmark.

module fatigue
integer, parameter :: LONGreal = selected_real_kind(15,90)

contains
function generalized_hookes_law (mu) result (stress_tensor)
  real (kind = LONGreal), intent(in) :: mu
  real (kind = LONGreal), dimension(3,3) :: stress_tensor
  real (kind = LONGreal), dimension(6,6) :: generalized_constitutive_tensor

  generalized_constitutive_tensor(:,:) = 0.0_LONGreal
end function

end module fatigue

The previous tests for POINTER_TYPE_P types in gfc_trans_zero_assign
didn't allow for local arrays, whose type is already an ARRAY_TYPE with
GFC_ARRAY_TYPE_P set.  We can handle those too, but must take their
address before calling __builtin_memset.
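For readers less familiar with the tree types involved, here is a rough C analogue of the two cases (a sketch only, not GCC internals): an array reached through a pointer can be passed to memset as is, while a local array object needs its address taken first. In C the array would decay to a pointer on its own; the explicit `&` mirrors the address the front end has to build, since GENERIC trees have no implicit array-to-pointer decay. The helper names and the fixed size of 36 elements are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_ELEMS 36  /* 6 x 6, like generalized_constitutive_tensor */

/* Pointer case (cf. the POINTER_TYPE_P test): the expression is already a
   pointer, so its value goes straight to memset. */
static void zero_through_pointer(double (*p)[N_ELEMS])
{
    memset(p, 0, sizeof *p);
}

/* Returns 1 if an array cleared through a pointer reads back as all zeros. */
static int pointer_target_cleared(void)
{
    double arr[N_ELEMS];
    for (size_t i = 0; i < N_ELEMS; i++)
        arr[i] = 2.0;                 /* make the clear observable */
    zero_through_pointer(&arr);
    for (size_t i = 0; i < N_ELEMS; i++)
        if (arr[i] != 0.0)
            return 0;
    return 1;
}

/* Local-array case: the object itself has array type, so its address is
   taken explicitly, mirroring "&generalized_constitutive_tensor" in the
   generated code. Returns 1 if the array reads back as all zeros. */
static int local_array_cleared(void)
{
    double local[N_ELEMS];
    assert(sizeof local == N_ELEMS * sizeof(double));
    for (size_t i = 0; i < N_ELEMS; i++)
        local[i] = 1.0;               /* make the clear observable */
    memset((void *) &local, 0, sizeof local);
    for (size_t i = 0; i < N_ELEMS; i++)
        if (local[i] != 0.0)
            return 0;
    return 1;
}
```

Either way memset receives the start of the storage and its byte size; only the way that address is obtained differs.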


Previously, for the code above we'd generate:

generalized_hookes_law (__result, mu)
{
  real8 generalized_constitutive_tensor[36];

  {
    int8 S.0;

    S.0 = 1;
    while (1)
      {
        if (S.0 > 6) goto L.2; else (void) 0;
        {
          int8 D.1344;
          int8 S.1;

          D.1344 = NON_LVALUE_EXPR <S.0> * 6 + -7;
          S.1 = 1;
          while (1)
            {
              if (S.1 > 6) goto L.1; else (void) 0;
              generalized_constitutive_tensor[S.1 + D.1344] = 0.0;
              S.1 = S.1 + 1;
            }
          L.1:;
        }
        S.0 = S.0 + 1;
      }
    L.2:;
  }
}

This is a doubly nested loop, which would stress Andrew Pinski/SuSE's
proposed memset identification patch.  Instead, we save compile time by
identifying this idiom in the front end and generating:

generalized_hookes_law (__result, mu)
{
  real8 generalized_constitutive_tensor[36];

  (void) __builtin_memset ((void *) &generalized_constitutive_tensor, 0, 288);
}
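To see why the single call is a faithful replacement, the loop nest from the first dump can be transcribed into C and compared against memset. This is an illustrative sketch with made-up function names: `s0` and `s1` play the roles of `S.0` and `S.1`, and the flat index `s1 + s0*6 - 7` is simply the column-major offset `(s1-1) + (s0-1)*6`, which visits each of the 36 elements exactly once. The byte count is `36 * sizeof(double)`, which gives the 288 in the call above when real8 is 8 bytes.

```c
#include <assert.h>
#include <string.h>

/* Transcription of the first dump: s0 is the (1-based) column, s1 the
   (1-based) row, and d corresponds to D.1344. */
static void clear_with_loops(double t[36])
{
    for (long s0 = 1; s0 <= 6; s0++) {
        long d = s0 * 6 + -7;
        for (long s1 = 1; s1 <= 6; s1++) {
            assert(s1 + d >= 0 && s1 + d < 36);   /* index stays in bounds */
            t[s1 + d] = 0.0;
        }
    }
}

/* The form the patched front end emits: one call over the whole block. */
static void clear_with_memset(double t[36])
{
    memset((void *) t, 0, 36 * sizeof(double));
}

/* Returns 1 if both strategies leave byte-for-byte identical contents. */
static int clears_agree(void)
{
    double a[36], b[36];
    for (int i = 0; i < 36; i++)
        a[i] = b[i] = (double) (i + 1);
    clear_with_loops(a);
    clear_with_memset(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Because the loops write 0.0 to every element and touch nothing else, both functions leave the same memory, which is what allows the loop nest to be dropped entirely.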


The use of __builtin_memset allows the compiler to generate the most
efficient idiom for clearing a block of memory.  It turns out that the
above assignment is in a critical function of fatigue, and optimizing
it using memset gives an observable speed-up on x86_64-unknown-linux-gnu.

Time Before:  21.76s  21.67s  21.73s
Time After:   20.30s  20.03s  20.04s

which is approximately a 7% reduction in run time (means of 21.72s
before and 20.12s after).
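The percentage can be checked directly from the three timings in each row. A small sketch of the arithmetic, with hypothetical helper names: the means are 21.72s and about 20.12s, so run time drops by (21.72 - 20.12) / 21.72, roughly 7.4%; read instead as a throughput speed-up (before / after) the same data gives roughly 7.9%.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timings; n must be positive. */
static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double) n;
}

/* Fractional reduction in run time: (before - after) / before. */
static double runtime_reduction(double before, double after)
{
    return (before - after) / before;
}

/* Speed-up expressed as a fraction: before / after - 1. */
static double speedup(double before, double after)
{
    return before / after - 1.0;
}
```

Both readings are consistent with the "approximately 7%" quoted above; which one to report is a matter of convention.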

The following revised patch has been tested on x86_64-unknown-linux-gnu
with a full "make bootstrap", including gfortran, and regression tested
with a top-level "make -k check" with no new failures.

Ok for mainline?

2006-12-18  Roger Sayle  <roger@eyesopen.com>

        * trans-expr.c (is_zero_initializer_p): Determine whether a given
        constant expression is a zero initializer.
        (gfc_trans_zero_assign): New function to attempt to optimize
        "a(:) = 0.0" as a call to __builtin_memset (a, 0, sizeof(a)).
        (gfc_trans_assignment): Special case array assignments to zero
        initializer constants, using gfc_trans_zero_assign.

        * gfortran.dg/array_memset_1.f90: New test case.
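One subtlety behind a zero-initializer test for REAL constants: memset with 0 writes the all-bits-zero pattern, which IEEE 754 defines as +0.0. A constant such as -0.0 compares equal to 0.0 yet has its sign bit set, so a check along these lines needs to look at the representation (or the sign) rather than only at equality. The following is a hypothetical C analogue for a real(kind=8) value, assuming IEEE doubles; it is not the patch's actual is_zero_initializer_p.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Accept c only if its bit pattern is all zeros, i.e. exactly what memset
   with 0 would produce. This rejects -0.0, whose sign bit is set. */
static int is_zero_real_initializer(double c)
{
    uint64_t bits;
    assert(sizeof bits == sizeof c);
    memcpy(&bits, &c, sizeof bits);
    return bits == 0;
}

/* Returns 1 if a double cleared by memset reads back as +0.0. */
static int memset_double_is_zero(void)
{
    double d = 1.0;
    memset(&d, 0, sizeof d);
    return d == 0.0 && is_zero_real_initializer(d);
}
```

Treating -0.0 as a zero initializer would silently turn a(:) = -0.0 into +0.0, so the conservative bit-pattern test is the safer choice.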


Roger
--

Attachment: patche2.txt
Description: Text document

