This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [patch] Improve prefetch heuristics
- From: "Fang, Changpeng" <Changpeng dot Fang at amd dot com>
- To: Christian Borntraeger <borntraeger at de dot ibm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: "rguenther at suse dot de" <rguenther at suse dot de>, "sebpop at gmail dot com" <sebpop at gmail dot com>
- Date: Thu, 29 Apr 2010 11:59:00 -0700
- Subject: RE: [patch] Improve prefetch heuristics
- References: <1C13CD442679CE45A2E80AE9251D7EF9220E4107@SAUSEXMBP01.amd.com> <201004291908.23330.borntraeger@de.ibm.com>
Good point, Christian:
Yes, this is the alignment issue. We cannot say they are on the same
cache line no matter how small the delta is. Just as you suggested, we need
a parameter to take into account the alignment issue. The default of 2 is
reasonable for a generic architecture, and we may need some experiments
to verify this.
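
To make the alignment point concrete, here is a minimal C sketch. It is not
GCC code; same_cache_line and alignments_sharing_line are hypothetical
helpers written only for illustration. It counts, for a given distance, how
many start alignments within a line put a reference and the address that
many bytes after it on the same cache line.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: true when byte addresses A and B
   fall on the same cache line of LINE_SIZE bytes.  */
static bool
same_cache_line (unsigned long a, unsigned long b, unsigned long line_size)
{
  return a / line_size == b / line_size;
}

/* Count how many of the LINE_SIZE possible start alignments put a
   reference and the address DELTA bytes after it on the same line.  */
static unsigned long
alignments_sharing_line (unsigned long delta, unsigned long line_size)
{
  unsigned long count = 0;

  for (unsigned long off = 0; off < line_size; off++)
    if (same_cache_line (off, off + delta, line_size))
      count++;
  return count;
}
```

With a distance of 160 bytes and 256-byte lines,
alignments_sharing_line (160, 256) gives 96, so only 96 of the 256 possible
alignments keep both addresses on one line.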
I will add the debug log entry for the prefetches that are not generated.
Thanks,
Changpeng
-----Original Message-----
From: Christian Borntraeger [mailto:borntraeger@de.ibm.com]
Sent: Thursday, April 29, 2010 10:08 AM
To: gcc-patches@gcc.gnu.org
Cc: Fang, Changpeng; rguenther@suse.de; sebpop@gmail.com
Subject: Re: [patch] Improve prefetch heuristics
> Patch1: 0001-Do-not-insert-prefetches-if-they-would-hit-the-same-.patch
> This patch modifies the prefetch generation logic. We don't issue a prefetch
> if it would fall on the same cache line with an existing memory reference or
> prefetch. This patch improves the following benchmarks: 416.gamess (~7%),
> 434.zeusmp (~4%), 454.calculix (~2%) and 445.gobmk (~2%).
>+ /* Don't issue a prefetch if its address falls on the same cache line
>+ with a previous memory reference (prefetch/load/store). */
>+ if (abs (delta - start_offset) < L1_CACHE_LINE_SIZE)
>+ /* Drop the prefetch. */
This might need a debug log entry about a dropped prefetch.
>+ continue;
>+ else
I looked a bit further into these patches.
This patch causes a regression with lbm on s390.
This is caused by the fact that ahead is 1 and step is 160, and with
cache-line-size=256 this patch drops all prefetches on s390 in
lbm's hot loop.
Thinking more about the whole logic, wouldn't it make sense to check for
    abs (delta - start_offset) < L1_CACHE_LINE_SIZE / 2
                                                    ^^^
because we cannot assume that we were starting at the beginning of a cache line?
Or maybe have something like
abs (delta - start_offset) < L1_CACHE_LINE_SIZE / CACHE_LINE_AGAIN_FACTOR
which can be defined by the architecture.
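
A minimal sketch of that parameterized check, as standalone C rather than
the actual patch. The function name drop_prefetch_p and the value 2 for
CACHE_LINE_AGAIN_FACTOR are assumptions made for illustration; the
condition itself is the one written above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed per-architecture parameter.
   The value 2 is an assumption, not a measured default.  */
#define CACHE_LINE_AGAIN_FACTOR 2

/* Return true if the prefetch should be dropped because it is assumed
   to land on the same cache line as an earlier memory reference.  */
static bool
drop_prefetch_p (int delta, int start_offset, int l1_cache_line_size)
{
  return abs (delta - start_offset)
         < l1_cache_line_size / CACHE_LINE_AGAIN_FACTOR;
}
```

Assuming the distance in the lbm loop is the 160-byte step, a 256-byte line
with a factor of 2 gives a threshold of 128, so
drop_prefetch_p (160, 0, 256) is false and the prefetch is kept. With a
factor of 1 the threshold is 256 and the same prefetch would be dropped.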
Christian