This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] Improve loop array prefetch for IA-64


--- Steven Bosscher <stevenb.gcc@gmail.com>:

> On 6/2/06, Canqun Yang <canqun@yahoo.com.cn> wrote:
> > This patch results in a performance increase of 4% for SPECfp2000 and 13% for the NAS benchmark
> > suite, respectively, on an Itanium-2 system. Further gains may be possible by tuning the
> > parameters and improving the prefetch algorithm at tree level.
> 
> Bravo.
> 
> > --- ia64.h (revision 114307)
> > +++ ia64.h (working copy)
> > @@ -1985,13 +1985,18 @@
> >    ??? This number is bogus and needs to be replaced before the value is
> >    actually used in optimizations.  */
> >
> > -#define SIMULTANEOUS_PREFETCHES 6
> > +#define SIMULTANEOUS_PREFETCHES 18
> 
> Is the number still bogus as the comment suggests, or is there a
> rationale for 18?  It looks quite high.
> 

The number is still bogus, but the original value of 6 is too small. For most of the SPECfp2000 and
NAS benchmarks, 12 is enough. Only the SPECfp2000 program 171.swim needs more prefetches; the best
value for 171.swim is 20. I have attached my ACSAC05 paper to this mail. It describes this more
clearly than the paper in the proceedings of the GCC Summit 2005.

> > +/* A number that should roughly correspond to the number of instructions
> > +   executed before the prefetch is completed.  */
> > +
> > +#define PREFETCH_LATENCY 400
> 
> Likewise.  Is 400 cycles the memory latency on itanium-2?
> 
> Gr.
> Steven
> 

It is not the memory latency on Itanium-2. The default value of PREFETCH_LATENCY is 200. It is
roughly equal to the number of instructions executed before the prefetch completes. Itanium-2
is a multi-issue architecture and may issue more than one instruction per cycle, so I roughly
estimate the average IPC (instructions per cycle) at about 2. Doubling PREFETCH_LATENCY ensures
that the prefetches are issued early enough.

The prefetch algorithm cannot currently determine the exact number of execution cycles of the
loop, so 400 is still bogus.
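
As a rough sketch of the estimate above (not code from the patch): the default of 200 scaled by
an assumed average IPC of 2 gives 400. The helper names below are hypothetical, and the second
helper, which turns the latency into a number of loop iterations to prefetch ahead by rounding up,
is only an assumption about how such a value might be used.

```c
#include <assert.h>

/* Hypothetical helper: scale a baseline latency by an estimated average IPC
   to get a latency expressed in instructions. */
static unsigned latency_in_insns(unsigned base_latency, unsigned est_ipc)
{
    return base_latency * est_ipc;
}

/* Hypothetical helper (an assumption, not taken from the patch): number of
   loop iterations to prefetch ahead so that roughly `latency_insns`
   instructions execute between the prefetch and the use.  Rounds up. */
static unsigned prefetch_ahead(unsigned latency_insns, unsigned insns_per_iter)
{
    assert(insns_per_iter > 0);  /* avoid division by zero */
    return (latency_insns + insns_per_iter - 1) / insns_per_iter;
}
```

With these helpers, latency_in_insns(200, 2) yields 400, and a loop body of 50 instructions would
prefetch 8 iterations ahead. Since the real instruction count and IPC of a loop are not known to
the algorithm, the result is only as good as those two guesses.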

Canqun Yang


Attachment: acsac05-yang.pdf
Description: 2769953213-acsac05-yang.pdf

