This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Mike Stump <mikestump at comcast dot net>, Bin Cheng <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 31 Oct 2014 13:36:06 +0800
- Subject: Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- Authentication-results: sourceware.org; auth=none
- References: <000001cfdc90$1d95c670$58c15350$ at arm dot com> <54384C12 dot 6060401 at redhat dot com> <80EFD85E-49B5-4F71-9401-40F8FA85BD65 at comcast dot net> <CAHFci2__KJST3yCzz4sKge9pN37CZwUZ8BvFL4++SuCJAsAtxw at mail dot gmail dot com> <545294D2 dot 2020502 at redhat dot com>
Thanks for giving it a try.
On Fri, Oct 31, 2014 at 3:43 AM, Jeff Law <law at redhat dot com> wrote:
> On 10/10/14 21:32, Bin.Cheng wrote:
>> Mike already gave great answers, here are just some of my thoughts on
>> the specific questions. See embedded below.
> Thanks to both of you for your answers.
> Fundamentally, what I see is this scheme requires us to be able to come up
> with a key based solely on information in a particular insn. To get fusion
> another insn has to have the same or a closely related key.
> This implies that the two candidates for fusion are related, even if
> there isn't a data dependency between them. The canonical example would be
> two loads with reg+d addressing modes. If they use the same base register
> and the displacements differ by a word, then we don't have a data dependency
> between the insns, but the insns are closely related by their address
> computations and we can compute a key to ensure those two related insns end
> up consecutive. At any given call to the hook, the only context we can
> directly see is the current insn.
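As a rough illustration of the reg+d example above, here is a minimal sketch in C. It is not GCC code: `toy_load`, `toy_key`, `toy_compute_key` and `toy_keys_adjacent` are hypothetical names, and a real implementation would work on RTL rather than on a plain struct. The point it tries to show is that the key is computed from one insn alone, yet two related loads end up with keys that can be recognized as neighbours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a load with reg+d addressing.
   This is an illustration only, not GCC's insn representation.  */
struct toy_load {
    int base_reg;   /* base register number */
    int64_t disp;   /* displacement in bytes */
    int size;       /* access size in bytes */
};

/* Key derived solely from information in one insn.  */
struct toy_key {
    int group;      /* equal for insns that may fuse: the base register */
    int64_t order;  /* position inside the group: the displacement */
};

/* Compute the key while looking at a single insn only.  */
static struct toy_key toy_compute_key(const struct toy_load *ld)
{
    struct toy_key k = { ld->base_reg, ld->disp };
    return k;
}

/* Two keys are "closely related" when they share a group and their
   displacements differ by exactly one access of SIZE bytes.  */
static bool toy_keys_adjacent(struct toy_key a, struct toy_key b, int size)
{
    int64_t delta = a.order > b.order ? a.order - b.order : b.order - a.order;
    return a.group == b.group && delta == size;
}
```

With this sketch, loads from `[r1 + 8]` and `[r1 + 12]` (4-byte accesses) produce adjacent keys, while a load from `[r2 + 12]` does not pair with either, even though no data dependency exists in any of the three cases.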
> I'm pretty sure if I were to tweak the ARM bits ever-so-slightly it could
> easily model the load-load or store-store special case on the PA7xxx[LC]
> processors. Normally a pair of loads or stores can't dual issue. But if
> the two loads (or two stores) hit the upper and lower half of a double-word
> object, then the instructions can dual issue.
> I'd forgotten about that special case scheduling opportunity until I started
> looking at some unrelated enhancement for prefetching.
> Your code would also appear to allow significant cleanup of the old
> caller-save code that had a fair amount of bookkeeping added to issue
> double-word memory loads/stores rather than single word operations. This
> *greatly* improved performance on the old sparc processors which had no
> call-saved FP registers.
> However, your new code doesn't handle fusing instructions which are totally
> independent and of different static types. There just isn't a good way to
> compute a key that I can see. And this is OK -- that case, if we cared to
> improve it, would be best handled by the SCHED_REORDER hooks.
>>>> I guess another way to ask the question, are fusion priorities static
>>>> based on the insn/alternative, or can they vary? And if they can vary, can
>>>> they vary each tick of the scheduler?
>> Though this pass works on predefined fusion types and priorities now,
>> there might be two possible fixes for this specific problem.
>> 1) Introduce another exclusive_pri, now it's like "fusion_pri,
>> priority, exclusive_pri". The first one is assigned to mark
>> instructions belonging to same fusion type. The second is assigned to
>> fuse each pair of consecutive instructions together. The last one is
>> assigned to prevent a specific pair of instructions from being fused,
>> just like "BC" mentioned.
>> 2) Extend the idea by using hook function
>> TARGET_SCHED_REORDER/TARGET_SCHED_REORDER2. Now we can assign
>> fusion_pri in the first place, making sure instructions in the same fusion
>> type will be adjacent to each other, then we can change priority (thus
>> reorder the ready list) at back-end's wish even per each tick of the
>> scheduler.
> #2 would be the best solution for the case I was pondering, but I don't
> think solving that case is terribly important given the processors for which
> it was profitable haven't been made for a very long time.
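To make the "fusion_pri, priority" pair discussed above concrete, here is a minimal sketch in C of how a two-level key can bring fusion candidates next to each other. It is an assumption-laden toy, not the pass itself: `toy_insn`, `toy_cmp` and `toy_order_for_fusion` are hypothetical names, and the ascending sort order is chosen only for readability and is not meant to reflect how the scheduler ranks priorities.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a scheduled insn; illustration only.  */
struct toy_insn {
    int id;          /* original position, kept to observe the reordering */
    int fusion_pri;  /* equal values mark insns of the same fusion group */
    int pri;         /* orders insns inside one fusion group */
};

/* Compare by fusion_pri first, then by pri inside a group.  */
static int toy_cmp(const void *pa, const void *pb)
{
    const struct toy_insn *a = (const struct toy_insn *) pa;
    const struct toy_insn *b = (const struct toy_insn *) pb;

    if (a->fusion_pri != b->fusion_pri)
        return (a->fusion_pri > b->fusion_pri) - (a->fusion_pri < b->fusion_pri);
    return (a->pri > b->pri) - (a->pri < b->pri);
}

/* Reorder INSNS so that members of one fusion group are consecutive
   and sorted by their in-group priority.  */
static void toy_order_for_fusion(struct toy_insn *insns, size_t n)
{
    qsort(insns, n, sizeof insns[0], toy_cmp);
}
```

In this toy model, taking fusion_pri from the base register and pri from the displacement would place two loads from the same base at neighbouring offsets back to back. A third field such as the proposed exclusive_pri would be one more tie-breaker in `toy_cmp`; that extension is not shown here.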
I am wondering whether it's possible to introduce pattern-directed
fusion, something like a define_fusion construct, and to adapt the
haifa scheduler for it. I agree there are two kinds of fusion (between
related and between unrelated instructions), and it's not trivial to
support both in one scheme. Do you have a specific example that I can
try?
This is just a preliminary idea and definitely can't make it into GCC
5.0. Moreover, so far I haven't seen such a requirement on ARM/AARCH64
or any other target that I know of, so I would like to continue with
this version if that's fine.
Later I will send a patch, based on this one, that pairs different
kinds of ldp/stp for aarch64.