This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: Patch to Avoid Bad Prefetching


I am still working on the prefetcher and plan on checking in a few more
enhancements in 4.5. However, with this patch that has been checked in
already, I managed to get a significant improvement from prefetching on
the AMD platform by tuning the prefetching parameters as well as other
GCC flags. If you are interested, I can work with you on doing a similar
tuning on PowerPC. 

On a Shanghai machine, prefetching currently improves the INT2006 score
by 1.78% and the FP2006 score by 0.75% relative to no prefetching. These
gains come from the following benchmarks:
Libquantum (27%)
Bwaves (5%)
Soplex (3%)
Lbm (13%). 

On all other benchmarks, the difference between prefetching and no
prefetching is within plus or minus 2%.

The big question is: why aren't you seeing the big performance gains on
libquantum and lbm? One possibility is that the PowerPC HW does the
trick and SW prefetching is not needed, but I don't think that's the
case. My guess is that you are not using the right GCC flags and the
right prefetching parameters. In particular, I suspect that prefetching
on libquantum and lbm is suppressed by a small setting of the
"simultaneous-prefetches" parameter. Can you please rerun libquantum and
lbm with -O3 and --param simultaneous-prefetches=100?

Note that enabling prefetching on libquantum requires strict aliasing,
which is enabled by default if you use -O3.

As for the big degradations that you are still seeing, I am currently
working on enhancing the cost model to hopefully eliminate all the
degradations, especially on the Fortran benchmarks. For the time being
though, I believe you can eliminate these degradations by using a large
enough value for the new parameter prefetch-min-insn-to-mem-ratio. In my
environment, I am setting this parameter to 5 for the Fortran benchmarks
and to 4 for the C programs in INT2006. The default value of 3 works
fine on all other CPU2006 benchmarks. As mentioned above, these settings
reduce all degradations to less than 2%. Along with the significant
gains on the above benchmarks, that should give you a net positive gain
from prefetching. So, I suggest that you try adjusting this parameter as
well.  
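
To make the suggested settings concrete, here is a minimal sketch of how
the per-benchmark options could be assembled. The helper name
prefetch_flags and its arguments are hypothetical and only for
illustration. Passing -fprefetch-loop-arrays explicitly is an assumption
(the message only mentions -O3); the two --param names and the values 5,
4, 3 and 100 are the ones given above.

```python
# Hypothetical helper: the function name and its arguments are illustrative.
# Only the GCC options and the parameter values come from the message.
def prefetch_flags(language, suite, simultaneous_prefetches=None):
    """Return a list of GCC options for one CPU2006 benchmark."""
    # Assumption: request the prefetching pass explicitly rather than
    # relying on the target's defaults at -O3.
    flags = ["-O3", "-fprefetch-loop-arrays"]

    if language == "fortran":
        ratio = 5  # value used for the Fortran benchmarks
    elif language == "c" and suite == "INT2006":
        ratio = 4  # value used for the C programs in INT2006
    else:
        ratio = 3  # default value, used everywhere else
    flags += ["--param", f"prefetch-min-insn-to-mem-ratio={ratio}"]

    # Optional override, e.g. 100 for the libquantum and lbm reruns.
    if simultaneous_prefetches is not None:
        flags += ["--param",
                  f"simultaneous-prefetches={simultaneous_prefetches}"]
    return flags


# Example: options for a C benchmark in INT2006 such as libquantum.
print(" ".join(prefetch_flags("c", "INT2006", simultaneous_prefetches=100)))
```

The resulting list can be appended to whatever compiler command line the
benchmark harness already uses.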
 
Thanks
-Ghassan

-----Original Message-----
From: Pat Haugen [mailto:pthaugen@us.ibm.com] 
Sent: Thursday, June 18, 2009 11:41 AM
To: Shobaki, Ghassan
Cc: gcc-patches@gcc.gnu.org; Zdenek Dvorak; Richard Guenther
Subject: RE: Patch to Avoid Bad Prefetching

> FP2006:
> No prefetching: 15.3
> Current prefetching: 14.0 (-8.5%)
> Patched prefetching: 15.2 (-0.5%)
> So, the patch gives an improvement of 8.7% relative to the existing
> code.
>
> INT2006:
> No prefetching: 14.6
> Current prefetching: 14.3 (-2.3%)
> Patched prefetching: 14.8 (1.22%)
> So, the patch gives an improvement of 3.6% relative to the existing
> code.
>

I tried the patch on PowerPC and also saw improvements over the existing
prefetching, although in general it looks like prefetching needs to be
tuned for PowerPC, since the clear winner is no prefetching.

FP2006:
Current prefetching relative to no prefetching: -9.9%
Patched prefetching relative to no prefetching: -5.2%
The patched prefetching gives a 5.2% improvement relative to existing
code.

INT2006:
Current prefetching relative to no prefetching: -3.2%
Patched prefetching relative to no prefetching: -2.1%
The patched prefetching gives a 1.2% improvement relative to existing
code.
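
For reference, the "relative to existing code" figures follow from the
two "relative to no prefetching" figures by dividing the corresponding
score ratios. The sketch below shows that arithmetic; the function name
relative_gain is hypothetical, and the inputs are the rounded percentages
quoted above, so the last digit of the result can differ from a figure
computed from unrounded scores.

```python
def relative_gain(patched_vs_base_pct, current_vs_base_pct):
    """Gain of patched over current prefetching, in percent.

    Both arguments are percentage changes relative to the same
    no-prefetching baseline.
    """
    patched = 1.0 + patched_vs_base_pct / 100.0  # patched score / baseline
    current = 1.0 + current_vs_base_pct / 100.0  # current score / baseline
    return (patched / current - 1.0) * 100.0


fp_gain = relative_gain(-5.2, -9.9)
int_gain = relative_gain(-2.1, -3.2)
print(f"FP2006:  {fp_gain:.1f}%")   # prints "FP2006:  5.2%"
print(f"INT2006: {int_gain:.1f}%")  # prints "INT2006: 1.1%"
```

The FP2006 result matches the 5.2% above. The INT2006 result comes out
at about 1.1% from the rounded inputs, versus the 1.2% reported, a gap
that is presumably due to rounding.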


433.milc was the only benchmark where prefetching gave a measurable
improvement over no prefetching, 5.5%. And even with the patched
prefetching, there were still 5 benchmarks that degraded by a
double-digit percentage with prefetching enabled (libquantum, zeusmp,
cactusADM, leslie3d, calculix). There were two benchmarks that failed to
build with prefetching enabled, with both the existing and patched
versions, which I'll open bugzillas for (h264ref and xalancbmk).

-Pat



