This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][RFC] Add an early loop unrolling pass, address PRs 18754 and 34223
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "Dominique Dhumieres" <dominiq at lps dot ens dot fr>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 24 Apr 2008 14:26:01 +0200
- Subject: Re: [PATCH][RFC] Add an early loop unrolling pass, address PRs 18754 and 34223
- References: <20080423150449.B8D733BE95@mailhost.lps.ens.fr>
On Wed, Apr 23, 2008 at 5:04 PM, Dominique Dhumieres <dominiq@lps.ens.fr> wrote:
> I have applied the patch on i686-apple-darwin9 (Core2Duo 2.16GHz). For the
> polyhedron test, the gain is marginal for ac.f90:
>
> before: 12.67s, after: 12.27s
>
> almost a factor 2 for induct.f90:
>
> before: 60.94s, after: 35.84s
>
> and slightly slower (within the upper bound of the noise):
>
> capacita.f90, before: 55.05s, after: 55.42s
>
> and
>
> protein.f90, before: 46.05s, after: 46.46s.
>
> There is still a problem with my hand-optimized variants of
> induct.f90 (replacement of the dot-products by the sums of
> their nonzero products):
>
> induct_v2.f90, before: 33.54s, after: 58.41s
I see basically all dot-products unrolled, which is good, as it results in
quite a lot of vectorization (-O3 -ffast-math, x86_64). Unfortunately
this produces slower code (either due to bad vectorization or increased
register pressure): 38.2s without the patch vs. 43.2s with it for me.
There are 6 vectorized loops in induct_v2 without the patch and
29 vectorized loops with the patch.
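
To make the discussion concrete, here is a minimal C sketch of the kind of
loop involved. It is not taken from induct.f90 or induct_v2.f90; the
function names, the fixed length 3, and the assumption that b[1] is known
to be zero are all hypothetical. It shows a dot product with a
compile-time trip count, the straight-line form that complete unrolling
would be expected to produce, and a hand-optimized "sum of nonzero
products" variant.

```c
/* Hypothetical 3-element dot product with a constant trip count:
   a candidate for complete unrolling. */
static double dot3_loop(const double a[3], const double b[3])
{
    double sum = 0.0;
    for (int i = 0; i < 3; i++)
        sum += a[i] * b[i];
    return sum;
}

/* The same computation written as straight-line code, i.e. roughly
   what the loop looks like once it has been completely unrolled. */
static double dot3_unrolled(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Hand-optimized variant: ASSUMES b[1] is always zero, so that
   product is simply dropped from the sum. */
static double dot3_nonzero(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[2] * b[2];
}
```

With a = {1, 2, 3} and b = {4, 0, 6}, all three functions return 22. Once
such loops are flattened, the resulting straight-line code becomes
visible to later passes such as the vectorizer, which may or may not
pay off, as the timings above suggest.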
Richard.