This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] Tree level array prefetching


On Sun, Jun 12, 2005 at 11:39:35PM +0200, Zdenek Dvorak wrote:
> Hello,
> 
> this patch implements prefetching on tree level.  It is the updated and
> upgraded version of the prefetching pass I have developed on lno branch
> about a year ago.  I am not sure whether this type of patch is suitable
> at the current stage (most likely not), but anyway, comments are
> welcome.
> 
> A description of how the pass works can be found at the beginning of
> tree-ssa-loop-prefetch.c.  Basically we find memory references, check
> for reuses to determine those that do not need to be prefetched and
> those that do not need to be prefetched in every iteration, then
> we unroll the loop as necessary and insert the prefetch instructions
> (calls to __builtin_prefetch).
> 
> The patch does not remove the rtl profiling pass in order to keep it
> shorter.  It also includes quite a few changes that are necessary or
> useful to make other optimizers handle loops after unrolling and with
> prefetch instructions (updating of frequencies after loop versioning and
> unrolling, making the order of blocks after unrolling more sensible,
> change to tree-outof-ssa to prevent TER from increasing register pressure
> too much in unrolled loops, nicer names for temporary variables created
> by store motion, etc.).  I will submit those separately, as they are
> interesting regardless of this patch.
> 
> The patch was bootstrapped & regtested on i686 and x86_64 with the pass
> enabled.  Below are the results (compared with the old rtl prefetching
> pass) of spec2000 on athlon; it seems to be a clear win on specint
> (with the only noticeable regression on crafty), and performs reasonably
> on specfp (although there are significant regressions on a few tests;
> I tried to investigate a few of these, and they are caused by reasons
> that I was not able to fix, like the fact that the register allocator
> sometimes does not manage to assign registers in an unrolled loop
> as well as it does in a non-unrolled one).
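As a rough illustration of the transformation described in the quoted message (find the reuse within a cache line, unroll so that one prefetch covers a whole line, then insert __builtin_prefetch calls), here is a hand-written C sketch of what a simple summation loop might look like afterwards. It is not output of the patch; the 64-byte cache line, the unroll factor of 8 and the prefetch distance of 4 lines are hypothetical values chosen only for the example.

```c
#include <stddef.h>

/* Hypothetical parameters, for illustration only. */
#define UNROLL 8        /* assumes 64-byte lines / 8-byte doubles = 8 per line */
#define AHEAD_LINES 4   /* hypothetical prefetch distance, in cache lines */
#define AHEAD (AHEAD_LINES * UNROLL)  /* prefetch distance in elements */

/* Sum an array after manual unrolling and prefetch insertion.  Because
   the eight accesses in one unrolled iteration share a cache line (if
   the array is line-aligned), a single prefetch per iteration suffices. */
double sum_prefetched(const double *a, size_t n)
{
    double s = 0.0;
    size_t i = 0;

    for (; i + UNROLL <= n; i += UNROLL) {
        /* Only form addresses inside the array; args: read (0),
           high temporal locality (3). */
        if (i + AHEAD < n)
            __builtin_prefetch(&a[i + AHEAD], 0, 3);

        /* Unrolled body; additions kept in the original order. */
        s += a[i + 0]; s += a[i + 1]; s += a[i + 2]; s += a[i + 3];
        s += a[i + 4]; s += a[i + 5]; s += a[i + 6]; s += a[i + 7];
    }

    /* Remainder loop for the last n % UNROLL elements. */
    for (; i < n; i++)
        s += a[i];

    return s;
}
```

The prefetch is guarded so the sketch never forms a pointer past the end of the array; an actual compiler pass may handle the loop tail differently.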

I tried applying this patch to mainline, and it didn't apply.  Would it be
possible to port the patch to mainline so it can be judged independently?
I don't know how much you depend on changes from the branch you were using,
or whether porting it would be simple or hard to do.

-- 
Michael Meissner
email: gnu@the-meissners.org
http://www.the-meissners.org
Employed by Advanced Micro Devices, but not speaking for them

