This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [patch] Tree level array prefetching
- From: Michael Meissner <gcc-pat at the-meissners dot org>
- To: gcc-patches at gcc dot gnu dot org
- Date: Tue, 14 Jun 2005 09:57:18 -0400
- Subject: Re: [patch] Tree level array prefetching
- References: <20050612213935.GA14103@atrey.karlin.mff.cuni.cz>
On Sun, Jun 12, 2005 at 11:39:35PM +0200, Zdenek Dvorak wrote:
> Hello,
>
> this patch implements prefetching on tree level. It is the updated and
> upgraded version of the prefetching pass I have developed on lno branch
> about a year ago. I am not sure whether this type of patch is suitable
> at the current stage (most likely not), but anyway, comments are
> welcome.
>
> Description of how the pass works can be found at the beginning of
> tree-ssa-loop-prefetch.c. Basically we find memory references, check
> for reuses to determine those that do not need to be prefetched and
> those that do not need to be prefetched in every iteration, then
> we unroll the loop as necessary and insert the prefetch instructions
> (calls to __builtin_prefetch).
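For readers following along, here is a rough hand-written sketch of the kind of transformation described above. It is not taken from the patch. The cache line size, the prefetch distance, and the function names sum_plain and sum_prefetched are assumptions made up for illustration; the pass itself would presumably derive such values from target parameters. The idea is that a stride-one reference touches each cache line many times, so one prefetch per line suffices, and unrolling by the number of elements per line lets that single prefetch sit in the loop body.

```c
#include <stddef.h>

/* Assumed values, for illustration only.  */
#define CACHE_LINE_BYTES 64
#define PREFETCH_AHEAD_LINES 4

enum { ELEMS_PER_LINE = CACHE_LINE_BYTES / sizeof (int) };

/* The original loop: a[i] is a stride-one memory reference.  */
long
sum_plain (const int *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hand-written approximation of the loop after unrolling and
   prefetch insertion.  */
long
sum_prefetched (const int *a, size_t n)
{
  const size_t ahead = (size_t) PREFETCH_AHEAD_LINES * ELEMS_PER_LINE;
  long s = 0;
  size_t i = 0;

  /* Each outer iteration consumes one cache line worth of elements,
     so a single prefetch per iteration is enough.  */
  for (; i + ELEMS_PER_LINE <= n; i += ELEMS_PER_LINE)
    {
      /* Guard keeps the address expression inside the array.
         Arguments: address, 0 = read access, 3 = high temporal locality.  */
      if (i + ahead < n)
        __builtin_prefetch (&a[i + ahead], 0, 3);

      /* Stands in for the unrolled copies of the loop body.  */
      for (size_t j = 0; j < ELEMS_PER_LINE; j++)
        s += a[i + j];
    }

  /* Remaining iterations that do not fill a whole unrolled body.  */
  for (; i < n; i++)
    s += a[i];

  return s;
}
```

Both functions return the same sum; only the memory access schedule differs. Whether the prefetched form is actually faster depends on the target and on how the unrolled body is register-allocated.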
>
> The patch does not remove the rtl profiling pass in order to keep it
> shorter. It also includes quite a few changes that are necessary or
> useful to make other optimizers handle loops after unrolling and with
> prefetch instructions (updating of frequencies after loop versioning and
> unrolling, making the order of blocks after unrolling more sensible,
> change to tree-outof-ssa to prevent TER from increasing register pressure
> too much in unrolled loops, nicer names for temporary variables created
> by store motion, etc.). I will submit those separately, as they are
> interesting regardless of this patch.
>
> The patch was bootstrapped & regtested on i686 and x86_64 with the pass
> enabled. Below are the results (compared with the old rtl prefetching
> pass) of spec2000 on athlon; it seems to be a clear win on specint
> (with the only noticeable regression on crafty), and performs reasonably
> on specfp (although there are significant regressions on a few tests;
> I tried to investigate a few of these, and they are caused by reasons
> that I was not able to fix, like the fact that the register allocator
> sometimes does not manage to assign registers in an unrolled loop
> as well as it does in a non-unrolled one).
I tried applying this patch to the mainline, and it didn't apply. Would it be
possible to port the patch to the mainline so it can be judged independently?
I don't know how much you depend on changes from the branch you were using,
so I can't tell whether the port would be simple or hard to do.
--
Michael Meissner
email: gnu@the-meissners.org
http://www.the-meissners.org
Employed by Advanced Micro Devices, but not speaking for them