This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c


On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>Hi,
>
>This patch tweaks autoprefetcher heuristic in scheduler to better group
>memory loads and stores together.
>
>From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>
>There are two separate changes, both related to instruction scheduler,
>that cause the regression.  The first change in r253235 is responsible
>for 70% of the regression.
>===
>    haifa-sched: fix autopref_rank_for_schedule qsort comparator
>    
> * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>            first, always call autopref_rank_data otherwise.
>    
>    
>    
>git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235
>138bc75d-0d04-0410-961f-82ee72b054a4
>===
>
>After this change instead of
>r1 = [rb + 0]
>r2 = [rb + 8]
>r3 = [rb + 16]
>r4 = <math with r1>
>r5 = <math with r2>
>r6 = <math with r3>
>
>we get
>r1 = [rb + 0]
><math with r1>
>r2 = [rb + 8]
><math with r2>
>r3 = [rb + 16]
><math with r3>
>
>which, apparently, cortex-a53 autoprefetcher doesn't recognize.  This
>schedule happens because r2= load gets lower priority than the
>"irrelevant" <math with r1> due to the above patch.
>
>If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
>means that true dependencies of all similar base+offset loads are
>resolved.  Therefore, for autoprefetcher-friendly schedule we should
>prioritize memory reads before "irrelevant" instructions.

But isn't there also max number of load issues in a fetch window to consider? 
So interleaving arithmetic with loads might be profitable. 

>On the other hand, following similar logic, we want to delay memory
>stores as much as possible to start scheduling them only after all
>potential producers are scheduled.  I.e., for autoprefetcher-friendly
>schedule we should prioritize "irrelevant" instructions before memory
>writes.
>
>Obvious patch to implement the above is attached.  It brings 70% of
>regressed performance on this testcase back.
>
>OK to commit?
>
>Regards,
>
>--
>Maxim Kuvyrkov
>www.linaro.org


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]