This is the mail archive of the
mailing list for the GCC project.
RE: Patch to Avoid Bad Prefetching
I meant the time estimate.
For example, consider a loop with 2 cache missing memory ops and 4 CPU
ops. Assuming that we set the min_insn_to_mem_ratio to 3, the current
*imprecise* heuristic will calculate an insn-to-mem ratio of (2+4)/2=3
and conclude that this loop will benefit from prfetching whether the 4
CPU ops are integer adds or FP divides. A more precise calculation will
go like this:
Time for memory ops = 200 cycle (assuming a cache-miss latency of 200
and that the machine can do 2 mem ops simultaneously)
If the 4 CPU ops are integer arithmetic:
Time for CPU ops = 4*1=4 (assuming that int arith ops take one cycle
Max potential benefit from prefetching = 4/204 = 2% (probably not
significant enough to pay off for the prefething cost)
On the other hand,
If the 4 CPU ops are FP divides:
Time for CPU ops = 4*20=80 (assuming that FP divides take 20 cycles
Max potential benefit from prefetching = 80/(200+80) = 29% (most likely
significant enough to pay off for the prefething cost and deliver a good
As you can see, this is much more precise than simply looking at the
ratio, but it requires a good time estimate for each operation. I assume
that the function tree_num_loop_insns() internally computes such time
estimates if we pass it time weights. Of course, I know that middle-end
time estimates will not be very precise compared to backend estimates.
BTW, are the middle-end time estimates machine dependent?
Are you OK with checking in the patch with the ratio for now?
From: Richard Guenther [mailto:email@example.com]
Sent: Thursday, June 04, 2009 1:57 PM
To: Shobaki, Ghassan
Cc: Zdenek Dvorak; firstname.lastname@example.org
Subject: Re: Patch to Avoid Bad Prefetching
On Thu, Jun 4, 2009 at 9:24 PM, Shobaki, Ghassan
> I agree with Zdenek that the first heuristic is imprecise and needs
improvement. At least it has to distinguish between simple integer
arithmetic ops and expensive FP ops. I am planning on doing this very
soon. As I explained in the previous email, that will require computing
an instruction-by-instruction time estimate using the estimates that GCC
already computes but adapting them to properly account for the costs of
cache missing memory ops.
GCC has both size and time estimates for code - I guess this is where
the size estimate but really ment the time estimate, no?