This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

From: "Bin.Cheng" <amker dot cheng at gmail dot com>
To: Jeff Law <law at redhat dot com>
Cc: Mike Stump <mikestump at comcast dot net>, "bin.cheng" <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
Date: Mon, 19 May 2014 14:38:41 +0800
Subject: Re: [GCC RFC]A new and simple pass merging paired load store instructions
References: <004d01cf700e$ef1e30e0$cd5a92a0$ at arm dot com> <32B4330F-1D0F-4D4E-BF7A-2E5B2148B893 at comcast dot net> <5374F59D dot 3030101 at redhat dot com> <CAHFci2-9mtfvrE20+jsLvS8ySxVhVNAt3Nh4jG37aZP9BgqAmQ at mail dot gmail dot com> <53763A39 dot 30800 at redhat dot com>

On Sat, May 17, 2014 at 12:18 AM, Jeff Law <law@redhat.com> wrote:
> On 05/16/14 04:07, Bin.Cheng wrote:
>>
>> On Fri, May 16, 2014 at 1:13 AM, Jeff Law <law@redhat.com> wrote:
>>>
>>> On 05/15/14 10:51, Mike Stump wrote:
>>>>
>>>>
>>>> On May 15, 2014, at 12:26 AM, bin.cheng <bin.cheng@arm.com> wrote:
>>>>>
>>>>>
>>>>> Here is a new GCC pass that looks through each basic block and
>>>>> merges paired loads/stores even when they are not adjacent to each
>>>>> other.
>>>>
>>>>
>>>>
>>>> So I have a target that has load and store multiple support that
>>>> supports a large number of registers (2-n registers), and I added a
>>>> sched0 pass that is a light copy of the regular scheduling pass that
>>>> uses a different cost function which arranges all loads first, then
>>>> all stores, then everything else.  Within a group of loads or stores
>>>> the secondary key is the base register, the next key is the offset.
>>>> The net result: all loads off the same register are sorted in
>>>> increasing offset order.
>>>
>>>
>>> Glad to see someone else stumble on (ab)using the scheduler to do this.
>>
>> Emm, if it's (ab)using, should we still do it then?
>
> I think it'd still be fine.  There's even been a comment about doing this
> kind of thing in the scheduler that's been around since the early 90s...
>
> The scheduler is a bit interesting in that it has a wealth of dependency
> information and the ability to reorganize the insn stream in relatively
> arbitrary ways.  That seems to make it a natural place to think about
> transformations of this nature.  We just haven't had a good infrastructure
> for doing that.
>
> In theory we're a lot closer now to being able to plug in different
> costing/sorting models and let the scheduler do its thing.  Those models
> might rewrite for register pressure, or encourage certain independent insns
> to issue back-to-back to encourage combining, or to build candidate insns
> for delay slot scheduling, etc.
>
>
>> As Mike stated, merging of consecutive memory accesses is all about
>> the base register and the offset.  I am thinking of another method:
>> collect all memory accesses with the same base register, then do the
>> merge work.  This way we should be able to merge more than 2
>> instructions, and it would also be possible to remove redundant load
>> instructions in the same pass.
>>
>> My question is: how many such redundant loads could there be?  Is any
>> RTL pass responsible for this now?
>
> I suspect it's a lot less important now than it used to be.  But there are
> probably some cases where it'd be useful.  Combining sub-word accesses into
> full-word accesses comes immediately to mind.
>
> I'm not aware of any pass which does these kinds of changes in a general
> form.  Some passes (caller-save) do a fair amount of work to track when they
> can generate multi-object loads/stores (and it was a huge win back on the
> old SPARC processors).
>
I'm glad this RFC has attracted some attention, and thanks for all the
comments.  I can see four major concerns:
1) Should we do it in a separate pass, or just along with the scheduler?

2) When should we run the new pass, before or after RA?  There are
both advantages and disadvantages, and it depends very much on the
target for which we are compiling.
I have no simple answer to this.  Maybe we can run the pass twice, or
follow Oleg's suggestion.  I think it would be a new strategy for GCC to
let the backend decide when to run a pass.

3) Do we need a new target hook interface?
I answered this in other messages, and I still think it's target-dependent.

4) The optimization should be able to handle cases with more than 2
consecutive load/store instructions.
The current implementation can't handle such cases and needs further extension.

Points 3) and 4) are just implementation questions, but I am not sure
about 1) and 2).  Are there any more comments, so that we can make some
decisions and carry on with this optimization?

Thanks,
bin

-- 
Best Regards.

Follow-Ups:
- Re: [GCC RFC]A new and simple pass merging paired load store instructions
  - From: Jeff Law

References:
- [GCC RFC]A new and simple pass merging paired load store instructions
  - From: bin.cheng
- Re: [GCC RFC]A new and simple pass merging paired load store instructions
  - From: Mike Stump
- Re: [GCC RFC]A new and simple pass merging paired load store instructions
  - From: Jeff Law
- Re: [GCC RFC]A new and simple pass merging paired load store instructions
  - From: Bin.Cheng
- Re: [GCC RFC]A new and simple pass merging paired load store instructions
  - From: Jeff Law
