This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][RFC] Add an early loop unrolling pass, address PRs 18754 and 34223


On Wed, Apr 23, 2008 at 5:04 PM, Dominique Dhumieres <dominiq@lps.ens.fr> wrote:
> I have applied the patch on i686-apple-darwin9 (Core2Duo 2.16GHz). For the
>  polyhedron test, the gain is marginal for ac.f90:
>
>  before: 12.67s, after: 12.27s
>
>  almost a factor of 2 for induct.f90:
>
>  before: 60.94s, after: 35.84s
>
>  and slightly slower (within the upper bound of the noise):
>
>  capacita.f90, before: 55.05s, after: 55.42s
>
>  and
>
>  protein.f90, before: 46.05s, after: 46.46s.
>
>  There is still a problem with my hand-optimized variants of
>  induct.f90 (replacement of the dot-products by the sums of
>  their nonzero products):
>
>  induct_v2.f90, before: 33.54s, after: 58.41s

I see basically all dot-products unrolled, which is good, as it results in
quite a bit of vectorization (-O3 -ffast-math, x86_64). Unfortunately,
this produces slower code (either due to bad vectorization or increased
register pressure): 38.2s before vs. 43.2s after for me.

There are 6 vectorized loops in induct_v2 without the patch and
29 vectorized loops with the patch.

Richard.
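
For readers following the thread, here is a minimal sketch of what
"unrolling a dot-product" means at the source level. It is an illustration
only: the benchmark is Fortran and the pass works on GCC's internal
representation, so the C language, the function names dot_rolled and
dot_unrolled, and the trip count N are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical fixed trip count. A small compile-time constant like this
   is the kind of loop bound that makes complete unrolling possible. */
#define N 3

/* Rolled form: one multiply-add per loop iteration. */
double dot_rolled(const double *a, const double *b)
{
    double sum = 0.0;
    for (size_t i = 0; i < N; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* The same computation with the loop completely unrolled by hand.
   The loop counter and branch are gone, leaving straight-line code
   that later passes can schedule or vectorize. All N products are
   visible at once, which can also increase register pressure. */
double dot_unrolled(const double *a, const double *b)
{
    double sum = 0.0;
    sum += a[0] * b[0];
    sum += a[1] * b[1];
    sum += a[2] * b[2];
    return sum;
}
```

Both functions add the products in the same order, so they return the same
value for the same inputs. Whether the unrolled form is faster depends on
what the later passes make of it, as the timings above show.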

