This is the mail archive of the gcc-patches at gcc dot gnu dot org
mailing list for the GCC project.
Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Vladimir Makarov <vmakarov at redhat dot com>, Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>
- Date: Mon, 20 Oct 2014 22:37:02 -0700
- Subject: Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler
- References: <1EA5C9A0-7E86-4D1E-B1D1-171E7DC1650F at linaro dot org>
On Mon, Oct 20, 2014 at 9:06 PM, Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org> wrote:
> This patch adds auto-prefetcher modeling to the GCC scheduler. The auto-prefetcher model is currently enabled only for ARM Cortex-A15, since this is the only CPU that I know of that has a hardware auto-prefetcher unit.
That might be the only ARM processor, but I know the PowerPC 970 and
POWER4 have hardware auto-prefetchers. They differ slightly in how
many streams can be active. The 970 has some streams reserved for
user streams. The PowerPC Cell also has something similar.
> The documentation on the auto-prefetcher is very sparse, and all I have are my empirical studies and a short note in the Cortex-A15 manual (search for "L2 cache auto-prefetcher"). This patch, therefore, implements a very abstract model that makes the scheduler prefer "mem_op (base+8); mem_op (base+12)" over "mem_op (base+12); mem_op (base+8)". In other words, the scheduler tries to issue memory operations in order of increasing memory offsets.
> The auto-prefetcher model implementation is based on max_issue multipass lookahead scheduling and its "guard" hook. The guard hook examines the contents of the ready list and the queue and, if it finds instructions with lower memory offsets, marks instructions with higher memory offsets as unavailable for immediate scheduling.
> This patch has been in the works since the beginning of the year, and many of my previous scheduler cleanup patches were to prepare the infrastructure for this feature.
> Ramana, this change requires benchmarking, which I can't easily do at the moment. I would appreciate any benchmarking results that you can share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be tuned/confirmed for Cortex-A15.
> At the moment the parameter is set to "2", which means that the autopref model will look through the ready list and the 1-stall queue in search of relevant instructions. Values of -1 (disable autopref), 0 (use autopref only in rank_for_schedule), 1 (look through ready list), 2 (look through ready list and 1-stall queue), and 3 (look through ready list and 2-stall queue) should be considered and benchmarked.
> Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and aarch64-linux-gnu. OK to apply, provided there are no performance or correctness regressions?
> [ChangeLog is part of the git patch]
> Thank you,
> Maxim Kuvyrkov
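
For readers following the quoted description, the guard behaviour and the effect of the queue-depth parameter can be sketched roughly as below. This is only an illustration of the model as described in the quoted message, not code from the patch: the MemOp record, its fields, and guard_holds_back are hypothetical names, and the assumption that only operations sharing a base are compared is mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical, simplified view of a pending memory operation."""
    base: str    # base register of the address (assumed representation)
    offset: int  # constant offset from the base
    stalls: int  # 0 = on the ready list, k = in the k-stall queue


def guard_holds_back(cand, pending, queue_depth):
    """Return True if `cand` should be marked unavailable for now.

    queue_depth follows the values listed in the quoted message:
    -1 disables the model, 0 uses it for ranking only (not modeled
    here), 1 looks at the ready list, 2 adds the 1-stall queue, and
    3 adds the 2-stall queue.
    """
    if queue_depth < 1:
        return False
    max_stalls = queue_depth - 1
    # Hold the candidate back if a visible operation on the same base
    # has a lower offset and should therefore be issued first.
    return any(
        op.stalls <= max_stalls
        and op.base == cand.base
        and op.offset < cand.offset
        for op in pending
    )


# mem_op (base+12) is ready now; mem_op (base+8) is one stall away.
pending = [MemOp("base", 12, 0), MemOp("base", 8, 1)]
decisions = {d: guard_holds_back(pending[0], pending, d) for d in (-1, 0, 1, 2, 3)}
print(decisions)
```

In this sketch, mem_op (base+12) is held back only at depths 2 and 3, because at depth 1 the guard does not see the 1-stall queue where mem_op (base+8) is waiting.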