This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler


On Oct 21, 2014, at 8:06 AM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:

> Hi,
> 
> This patch adds auto-prefetcher modeling to GCC scheduler.  The auto-prefetcher model is currently enabled only for ARM Cortex-A15, since this is the only CPU that I know of to have the hardware auto-prefetcher unit.
> 
> The documentation on the auto-prefetcher is very sparse, and all I have are my empirical studies and a short note in the Cortex-A15 manual (search for "L2 cache auto-prefetcher").  This patch therefore implements a very abstract model that makes the scheduler prefer "mem_op (base+8); mem_op (base+12)" over "mem_op (base+12); mem_op (base+8)".  In other words, the scheduler tries to issue memory operations in order of increasing memory offset.
> 
> The auto-prefetcher model implementation is based on max_issue multipass lookahead scheduling and its "guard" hook.  The guard hook examines the contents of the ready list and the queue, and, if it finds instructions with lower memory offsets, marks instructions with higher memory offsets as unavailable for immediate scheduling.
> 
> This patch has been in the works since the beginning of the year, and many of my previous scheduler cleanup patches were to prepare the infrastructure for this feature.
> 
> Ramana, this change requires benchmarking, which I can't easily do at the moment.  I would appreciate any benchmarking results that you can share.  In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be tuned/confirmed for Cortex-A15.
> 
> At the moment the parameter is set to "2", which means that the autopref model will look through the ready list and the 1-stall queue in search of relevant instructions.  Values of -1 (disable autopref), 0 (use autopref only in rank_for_schedule), 1 (look through the ready list), 2 (look through the ready list and the 1-stall queue), and 3 (look through the ready list and the 2-stall queue) should be considered and benchmarked.
> 
> Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and aarch64-linux-gnu.  OK to apply, provided there are no performance or correctness regressions?
> 
> [ChangeLog is part of the git patch]

Ping?

All prerequisite patches for this one are now approved and [mostly] checked in.  This is the last outstanding item from my patch series to improve scheduling.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

