This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass


Thanks for giving it a try.

On Fri, Oct 31, 2014 at 3:43 AM, Jeff Law <law@redhat.com> wrote:
> On 10/10/14 21:32, Bin.Cheng wrote:
>>
>> Mike already gave great answers, here are just some of my thoughts on
>> the specific questions.  See embedded below.
>
> Thanks to both of you for your answers.
>
> Fundamentally, what I see is this scheme requires us to be able to come up
> with a key based solely on information in a particular insn.  To get fusion
> another insn has to have the same or a closely related key.
>
> This implies that the two candidates for fusion are related, even if
> there isn't a data dependency between them.  The canonical example would be
> two loads with reg+d addressing modes.  If they use the same base register
> and the displacements differ by a word, then we don't have a data dependency
> between the insns, but the insns are closely related by their address
> computations and we can compute a key to ensure those two related insns end
> up consecutive.  At any given call to the hook, the only context we can
> directly see is the current insn.
>
> I'm pretty sure if I were to tweak the ARM bits ever-so-slightly it could
> easily model the load-load or store-store special case on the PA7xxx[LC]
> processors.  Normally a pair of loads or stores can't dual issue.  But if
> the two loads (or two stores) hit the upper and lower halves of a double-word
> object, then the instructions can dual issue.
>
> I'd forgotten about that special case scheduling opportunity until I started
> looking at some unrelated enhancement for prefetching.
>
> Your code would also appear to allow significant cleanup of the old
> caller-save code that had a fair amount of bookkeeping added to issue
> double-word memory loads/stores rather than single word operations. This
> *greatly* improved performance on the old sparc processors which had no
> call-saved FP registers.
>
> However, your new code doesn't handle fusing instructions which are totally
> independent and of different static types.  There just isn't a good way to
> compute a key that I can see.  And this is OK -- that case, if we cared to
> improve it, would be best handled by the SCHED_REORDER hooks.
>
>>>>
>>>> I guess another way to ask the question, are fusion priorities static
>>>> based on the insn/alternative, or can they vary?  And if they can vary, can
>>>> they vary each tick of the scheduler?
>>
>>
>> Though this pass works on predefined fusion types and priorities now,
>> there might be two possible fixes for this specific problem.
>> 1) Introduce another exclusive_pri, so now it's like "fusion_pri,
>> priority, exclusive_pri".  The first one is assigned to mark
>> instructions belonging to the same fusion type.  The second is assigned to
>> fuse each pair of consecutive instructions together.  The last one is
>> assigned to prevent a specific pair of instructions from being fused,
>> just like "BC" mentioned.
>> 2) Extend the idea by using the hook functions
>> TARGET_SCHED_REORDER/TARGET_SCHED_REORDER2.  Now we can assign
>> fusion_pri in the first place, making sure instructions of the same fusion
>> type will be adjacent to each other, then we can change priority (thus
>> reordering the ready list) at the back end's wish, even on each tick of the
>> scheduler.
>
> #2 would be the best solution for the case I was pondering, but I don't
> think solving that case is terribly important given the processors for which
> it was profitable haven't been made for a very long time.
I am wondering whether it's possible to introduce pattern-directed fusion,
something like define_fusion, and adapt the haifa scheduler for it.  I
agree there are two kinds of fusion (between related and between unrelated
instructions), and it's not trivial to support both in one scheme.  Do you
have a specific example that I can try?

This is just a preliminary idea and definitely can't make it into GCC
5.0.  Moreover, so far I haven't seen such a requirement on ARM/AArch64 or
any other target that I know of, so I would like to continue with this
version if that's fine.

Later I will send a patch pairing different kinds of ldp/stp based on
this for aarch64.

Thanks,
bin

>
> Jeff
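
The key-based scheme discussed in this thread (each insn gets a fusion_pri
and a priority computed only from the insn itself, so that related loads end
up consecutive) can be illustrated with a small standalone model.  This is
only a sketch under stated assumptions, not code from the patch: struct
mem_insn, compute_keys, sort_for_fusion and can_pair are hypothetical names,
the key formulas are one possible choice, and a plain sort stands in for the
scheduler's ready-list ordering.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a reg+offset load; not a GCC data structure.  */
struct mem_insn {
  int id;          /* original program order */
  int base_reg;    /* base register number */
  int offset;      /* displacement in bytes */
  int fusion_pri;  /* same value for all insns that may fuse together */
  int pri;         /* orders insns within one fusion group */
};

/* Compute both keys from the insn alone, with no other context.
   Assumption: register numbers and offsets are in [0, max_pri).  */
static void compute_keys(struct mem_insn *insn, int max_pri)
{
  assert(insn->base_reg >= 0 && insn->base_reg < max_pri);
  assert(insn->offset >= 0 && insn->offset < max_pri);
  insn->fusion_pri = max_pri - insn->base_reg; /* group by base register */
  insn->pri = max_pri - insn->offset;          /* lower offset goes first */
}

/* Higher fusion_pri first, then higher pri, then program order.  */
static int by_keys(const void *pa, const void *pb)
{
  const struct mem_insn *a = pa, *b = pb;
  if (a->fusion_pri != b->fusion_pri)
    return (a->fusion_pri < b->fusion_pri) - (a->fusion_pri > b->fusion_pri);
  if (a->pri != b->pri)
    return (a->pri < b->pri) - (a->pri > b->pri);
  return (a->id > b->id) - (a->id < b->id);
}

/* Stand-in for the scheduler: assign keys, then order by them.  */
static void sort_for_fusion(struct mem_insn *insns, size_t n, int max_pri)
{
  for (size_t i = 0; i < n; i++)
    compute_keys(&insns[i], max_pri);
  qsort(insns, n, sizeof insns[0], by_keys);
}

/* Two adjacent insns can pair if they share a base register and their
   displacements differ by exactly one word.  */
static int can_pair(const struct mem_insn *a, const struct mem_insn *b,
                    int word_size)
{
  return a->base_reg == b->base_reg && b->offset - a->offset == word_size;
}
```

After sort_for_fusion, loads with the same base register sit next to each
other in ascending offset order, so a later pass only needs to check
neighbouring entries with can_pair.  Insns that are independent and of
different static types get no useful shared key in this model, which matches
the limitation noted above.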

