This is the mail archive of the
mailing list for the GCC project.
Re: [patch] Tree level array prefetching
- From: Michael Meissner <gcc-pat at the-meissners dot org>
- To: gcc-patches at gcc dot gnu dot org
- Date: Tue, 14 Jun 2005 09:57:18 -0400
- Subject: Re: [patch] Tree level array prefetching
- References: <20050612213935.GA14103@atrey.karlin.mff.cuni.cz>
On Sun, Jun 12, 2005 at 11:39:35PM +0200, Zdenek Dvorak wrote:
> this patch implements prefetching on tree level. It is the updated and
> upgraded version of the prefetching pass I have developed on lno branch
> about a year ago. I am not sure whether this type of patch is suitable
> at the current stage (most likely not), but anyway, comments are welcome.
> Description of how the pass works can be found at the beginning of
> tree-ssa-loop-prefetch.c. Basically we find memory references, check
> for reuses to determine those that do not need to be prefetched and
> those that do not need to be prefetched in every iteration, then
> we unroll the loop as necessary and insert the prefetch instructions
> (calls to builtin_prefetch).
> The patch does not remove the rtl profiling pass in order to keep it
> shorter. It also includes quite a few changes that are necessary or
> useful to make other optimizers handle loops after unrolling and with
> prefetch instructions (updating of frequencies after loop versioning and
> unrolling, making the order of blocks after unrolling more sensible,
> change to tree-outof-ssa to prevent TER from increasing register pressure
> too much in unrolled loops, nicer names for temporary variables created
> by store motion, etc.). I will submit those separately, as they are
> interesting regardless of this patch.
> The patch was bootstrapped & regtested on i686 and x86_64 with the pass
> enabled. Below are the results (compared with the old rtl prefetching
> pass) of spec2000 on athlon; it seems to be a clear win on specint
> (with the only noticeable regression on crafty), and performs reasonably
> on specfp (although there are significant regressions on a few tests;
> I tried to investigate a few of these, and they are caused by reasons
> that I was not able to fix, like the fact that the register allocator
> sometimes does not manage to assign registers in an unrolled loop
> as well as it does in a non-unrolled one).
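
For readers unfamiliar with the transformation described in the quoted message (find array references, skip prefetches that are not needed in every iteration, unroll, and insert calls to the prefetch builtin), here is a rough hand-written sketch in C. It is not taken from the patch or from tree-ssa-loop-prefetch.c. The function names, the unroll factor and the prefetch distance are hypothetical values chosen for illustration; the only real interface used is GCC's __builtin_prefetch.

```c
#include <stddef.h>

/* Hypothetical tuning values, picked only for illustration.  A real
   pass would derive them from target parameters. */
#define UNROLL          4    /* iterations per unrolled loop body */
#define PREFETCH_AHEAD  64   /* elements ahead of the current access */

/* The plain loop, i.e. the form the optimizer would start from. */
long sum_plain(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-written approximation of the loop after unrolling and prefetch
   insertion.  Consecutive elements share a cache line, so a prefetch
   in every iteration would be redundant; unrolling lets us issue one
   prefetch per UNROLL iterations instead. */
long sum_prefetched(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    for (; i + UNROLL <= n; i += UNROLL) {
        /* Read prefetch (rw = 0) with high temporal locality (3).
           The bounds check keeps the address expression valid. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);

        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }

    /* Remainder loop for the last n % UNROLL elements. */
    for (; i < n; i++)
        s += a[i];

    return s;
}
```

Both functions return the same sum; only the memory access schedule differs. The unrolled body also shows why register pressure can become an issue, as the quoted message notes: more values are live at once than in the original loop.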
I tried patching this on the mainline, and it didn't apply. Would it be
possible to port the patch to the mainline so it can be judged independently?
I don't know how much you depend on stuff from the branch you were using,
so I can't tell whether this would be simple or hard to do.
Employed by Advanced Micro Devices, but not speaking for them