This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][RFC] Add an early loop unrolling pass, address PRs 18754 and 34223

From: "Richard Guenther" <richard dot guenther at gmail dot com>
To: "Dominique Dhumieres" <dominiq at lps dot ens dot fr>
Cc: gcc-patches at gcc dot gnu dot org
Date: Thu, 24 Apr 2008 14:26:01 +0200
Subject: Re: [PATCH][RFC] Add an early loop unrolling pass, address PRs 18754 and 34223
References: <20080423150449.B8D733BE95@mailhost.lps.ens.fr>

On Wed, Apr 23, 2008 at 5:04 PM, Dominique Dhumieres <dominiq@lps.ens.fr> wrote:
> I have applied the patch on i686-apple-darwin9 (Core2Duo 2.16 GHz). For the
>  Polyhedron tests, the gain is marginal for ac.f90:
>
>  before: 12.67s, after: 12.27s
>
>  almost a factor of 2 for induct.f90:
>
>  before: 60.94s, after: 35.84s
>
>  and slightly slower (within the upper bound of the noise):
>
>  capacita.f90, before: 55.05s, after: 55.42s
>
>  and
>
>  protein.f90, before: 46.05s, after: 46.46s.
>
>  There are still some problems with my hand-optimized variants of
>  induct.f90 (replacement of the dot-products by the sums of
>  their nonzero products):
>
>  induct_v2.f90, before: 33.54s, after: 58.41s

I see basically all dot products unrolled, which is good, as it results in
a fair amount of vectorization (-O3 -ffast-math, x86_64). Unfortunately,
it also produces slower code (either due to bad vectorization or increased
register pressure): 38.2s without the patch vs. 43.2s with it, for me.

There are 6 vectorized loops in induct_v2 without the patch and
29 vectorized loops with the patch.

Richard.

