This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Enhance Reuse Analysis of Long-stride Memory References in the Prefetcher


Hi,

> Joseph S. Myers:
> > On Fri, 7 Aug 2009, Shobaki, Ghassan wrote:
> 
> > 2. Using floating point arithmetic for a more precise computation of the 
> > miss probability.
> 
> Can you prove that this cannot lead to results depending on the host 
> system (for example, being different depending on whether float arithmetic 
> has excess precision)?
> 
> > +  miss_rate = (float) delta / (float) prefetch_block;
> > +  acceptable_miss_rate = (float) ACCEPTABLE_MISS_RATE / 1000.0;
> 
> Is there some reason you can't simply use appropriately scaled integer 
> arithmetic (miss_rate = 1000 * delta / prefetch_block)?  (This supposes 
> that 1000 * delta can't overflow; I don't know if there are already checks 
> elsewhere in the code that avoid that possibility, but if not you'd need 
> to add them.)

As delta < prefetch_block (which is the cache line size, i.e., usually something
like 64), overflow is not an issue, and I also believe that using only
integer arithmetic would be better.  Also,

> In particular, if the offset difference (delta) between the two memory
> references in question is less than the greatest common divisor (GCD) of the
> step and the cache_line_size, the current analysis will give a miss probability
> of zero, thus causing the pruning of one of the two memory references. To see
> why this can be a significant under-estimate, consider, for example, two memory
> references with a step of 96, a cache_line_size of 64 and a delta of 30.  In
> this case the GCD is 32. Dividing by this GCD gives a step of 3, a
> cache_line_size of 2 and a delta of zero. This leads to a miss probability of
> zero.  The actual miss probability in this case depends on the alignment of the
> two memory references. If the first memory reference is aligned on a cache line
> boundary, the miss probability will be 0% (both memory references will hit the
> same cache line in all iterations), but if the memory reference falls somewhere
> in the middle of a cache line (which is very likely), the miss probability will
> be 50% (the two memory references will hit different cache lines in every other
> iteration). Obviously, when the miss probability is 50%, we would like to issue
> prefetches for both memory references.

As your discussion shows, the "miss rate" you compute only makes sense if you
average over all possible alignments (and consider them all to be equally
likely).  I think it would be preferable to explicitly compute the miss rates
that are actually possible, and their probabilities depending on the alignment.
That is, in your example the miss rate is

0% with probability 4/64 (for the alignments 0, 1, 32 and 33 of the first memory reference) and
50% with probability 60/64 (the remaining cases)

This should make the validity and the assumptions of the computation more obvious, as well
as make further changes (e.g., taking the actual alignment of the references into account
or altering it) easier.  The patch is OK with these two changes,

Zdenek
