This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Re: [fortran PATCH] Implement a(:,:) = 0.0 using memset

From: Tim Prince <timothyprince at sbcglobal dot net>
To: roger at eyesopen dot com
Cc: gcc-patches at gcc dot gnu dot org, fortran at gcc dot gnu dot org
Date: Mon, 18 Dec 2006 10:55:32 -0800
Subject: Re: [fortran PATCH] Implement a(:,:) = 0.0 using memset
References: <4568.208.41.78.162.1166463384.squirrel@mail.eyesopen.com>
Reply-to: tprince at myrealbox dot com

roger@eyesopen.com wrote:

The following patch makes use of the recently added gfc_full_array_ref_p
function to provide the optimization of using memset when assigning zero
to an entire array.  For the source code below:

  integer :: a(20)
  a(:) = 0;

we currently generate the following with -fdump-tree-original:

int8 S.0;

  S.0 = 1;
  while (1)
    {
      if (S.0 > 20) goto L.1; else (void) 0;
      (*a)[NON_LVALUE_EXPR <S.0> + -1] = 0;
      S.0 = S.0 + 1;
    }
  L.1:;

With the patch below, we now generate this instead:

(void) __builtin_memset ((void *) a, 0, 80);
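
For readers less familiar with the dump format, the two forms above can be compared in plain C. This is only an illustrative sketch, not the code gfortran emits: the function names are made up for the example, and int32_t stands in for the default 4-byte Fortran integer, so 20 elements occupy 80 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define N 20

/* Element-by-element form, mirroring the scalarized loop in the dump. */
void zero_with_loop(int32_t a[N])
{
    for (size_t i = 0; i < N; i++)
        a[i] = 0;
}

/* Single-call form: N * sizeof(int32_t) = 80 bytes set to zero. */
void zero_with_memset(int32_t a[N])
{
    memset(a, 0, N * sizeof(int32_t));
}

/* Returns 1 if both forms leave identical, all-zero arrays. */
int forms_agree(void)
{
    int32_t x[N], y[N];
    for (size_t i = 0; i < N; i++) {
        x[i] = (int32_t)i + 1;      /* arbitrary non-zero fill */
        y[i] = -(int32_t)i - 1;
    }
    zero_with_loop(x);
    zero_with_memset(y);
    assert(sizeof x == 80);
    return memcmp(x, y, sizeof x) == 0 && x[0] == 0 && x[N - 1] == 0;
}
```

Calling forms_agree() checks that the loop and the memset call produce the same array contents.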


This can then take advantage of GCC's intrinsic expansion machinery,
including Jan's recent improvements for x86.  I'm keen to hear if there
are any corner cases that I've overlooked and that aren't covered by the
gfortran testsuite.  Perhaps someone could run NIST, polyhedron and the
usual suspects to confirm there are no issues?

Once this is in the tree, and there are no major issues, there are some
obvious extensions and improvements that can be made as follow-up patches:
[1] Avoid using memset for small array sizes, such that the tree-ssa
optimizers would unroll the loop and reveal the assignments via SRA.
[2] Allow reverse order initialization, such as a(20:1:-1) = 0.
[3] Extend the infrastructure to support sequentially consecutive
assignments that don't cover the entire array, such as a(20:40) = 0.0.
[4] Extend infrastructure for arbitrary (run-time) length expressions,
such as a(1:n) = 0.0.
[5] Generalize this optimization to use memcpy (or memmove?) for array
assignments, a(:) = b(:).
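
Items [3] to [5] can be pictured with a small C sketch. Nothing here is taken from the patch: the helper names are hypothetical, the array is assumed to be contiguous with lower bound 1, and zeroing reals with memset assumes that 0.0 is represented by all-zero bytes, as it is for IEEE 754 floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Items [3] and [4]: zero the 1-based, inclusive section a(lo:hi).
   The bounds may be run-time values. An empty section (hi < lo)
   assigns nothing, matching a zero-sized Fortran section. */
void zero_section(float *a, size_t lo, size_t hi)
{
    assert(lo >= 1);                 /* assumed lower bound of 1 */
    if (hi < lo)
        return;
    memset(a + (lo - 1), 0, (hi - lo + 1) * sizeof(float));
}

/* Item [5]: copy n elements. memcpy requires that the two regions
   do not overlap. */
void copy_disjoint(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof(float));
}

/* memmove gives the same result and is also well defined when the
   regions overlap, for example two sections of the same array. */
void copy_may_overlap(float *dst, const float *src, size_t n)
{
    memmove(dst, src, n * sizeof(float));
}
```

Here zero_section(a, 20, 40) corresponds to a(20:40) = 0.0, and zero_section(a, 1, n) to a(1:n) = 0.0. Which of memcpy and memmove fits item [5] depends on whether the compiler can prove the two sides do not alias.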

What can builtin_memset() do better than vectorized code, on platforms which matter to you? Non-temporal stores would be advantageous when the array is larger than the cache, or will not be referenced again while it is still in cache; but will builtin_memset() be able to make such decisions efficiently, if at all?
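
To make the non-temporal store question concrete, here is a rough C sketch of the kind of size-based decision being asked about. It makes no claim about what __builtin_memset actually does: the function name and the 1 MiB cutoff are invented for illustration, and the streaming path is only compiled where SSE2 is available, falling back to plain memset elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical cutoff; a real choice would depend on the cache sizes
   of the target and on whether the data is reused soon afterwards. */
#define STREAM_THRESHOLD_BYTES ((size_t)1 << 20)

void zero_bytes(void *dst, size_t n)
{
#if defined(__SSE2__)
    if (n >= STREAM_THRESHOLD_BYTES) {
        unsigned char *p = dst;
        /* Ordinary stores until p is 16-byte aligned (at most 15 bytes). */
        size_t head = (size_t)(-(uintptr_t)p & 15);
        memset(p, 0, head);
        p += head;
        n -= head;
        /* Non-temporal 16-byte stores, which bypass the cache. */
        const __m128i zero = _mm_setzero_si128();
        for (; n >= 16; p += 16, n -= 16)
            _mm_stream_si128((__m128i *)p, zero);
        _mm_sfence();                /* order the streaming stores */
        if (n > 0)
            memset(p, 0, n);         /* remaining tail bytes */
        return;
    }
#endif
    memset(dst, 0, n);               /* small sizes, or no SSE2 */
}
```

The run-time test on n is exactly the sort of decision the question is about: it costs a branch on every call, and the size alone says nothing about whether the array will be read again while it is still in cache.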

References:
- [fortran PATCH] Implement a(:,:) = 0.0 using memset
  - From: roger
