This is the mail archive of the
mailing list for the GCC project.
Re: [GCC RFC]A new and simple pass merging paired load store instructions
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Mike Stump <mikestump at comcast dot net>, "bin.cheng" <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 16 May 2014 18:07:32 +0800
- References: <004d01cf700e$ef1e30e0$cd5a92a0$ at arm dot com> <32B4330F-1D0F-4D4E-BF7A-2E5B2148B893 at comcast dot net> <5374F59D dot 3030101 at redhat dot com>
On Fri, May 16, 2014 at 1:13 AM, Jeff Law <law at redhat dot com> wrote:
> On 05/15/14 10:51, Mike Stump wrote:
>> On May 15, 2014, at 12:26 AM, bin.cheng <bin dot cheng at arm dot com> wrote:
>>> Here is a new GCC pass that looks through each basic block
>>> and merges paired load/store instructions even when they are not
>>> adjacent to each other.
>> So I have a target that has load and store multiple support that
>> supports a large number of registers (2-n registers), and I added a
>> sched0 pass that is a light copy of the regular scheduling pass that
>> uses a different cost function which arranges all loads first, then
>> all stores then everything else. Within a group of loads or stores
>> the secondary key is the base register, the next key is the offset.
>> The net result, all loads off the same register are sorted in
>> increasing order.
> Glad to see someone else stumble on (ab)using the scheduler to do this.
Hmm, if it's (ab)using, should we still do it then?
> I've poked at the scheduler several times to do similar stuff, but was never
> really satisfied with the results and never tried to polish those prototypes
> into something worth submitting.
> One example I've poked at was discovery of stores which then feed into a
> load from the same location. Which obviously we'd prefer to turn into a
> store + copy (subject to a mess of constraints). There's a handful of these
> kinds of transformations that seem to naturally drop out of this kind of
As Mike stated, merging consecutive memory accesses is all about
the base register and the offset. I am thinking of another method:
collect all memory accesses with the same base register, then do the
merge work. This way, we should be able to merge more than two
instructions, and it would also be possible to remove redundant load
instructions in one pass.
My question is how many of these redundant loads there could be. Is
there any RTL pass responsible for this now?
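
To make the grouping idea concrete, here is a rough sketch in Python rather
than GCC's RTL. It is a toy model only: MemAccess and find_mergeable_runs
are hypothetical names, accesses are plain records, and it ignores
everything a real pass must check (dependencies between instructions,
aliasing, the base register being modified between accesses, and register
constraints on the merged instruction). It groups accesses by kind and base
register, sorts each group by offset, and reports runs of two or more
consecutive addresses.

```python
from collections import defaultdict
from typing import NamedTuple


class MemAccess(NamedTuple):
    """Hypothetical record for one memory access in a basic block."""
    kind: str      # "load" or "store"
    reg: str       # destination (load) or source (store) register
    base: str      # base address register
    offset: int    # byte offset from the base register
    size: int = 4  # access width in bytes (assumed word-sized)


def find_mergeable_runs(block):
    """Return runs of 2+ accesses with the same kind and base register
    whose addresses are consecutive once sorted by offset."""
    # Primary key: (kind, base register); loads and stores never mix.
    groups = defaultdict(list)
    for acc in block:
        groups[(acc.kind, acc.base)].append(acc)

    runs = []
    for accesses in groups.values():
        # Secondary key: the offset, in increasing order.
        accesses.sort(key=lambda a: a.offset)
        run = [accesses[0]]
        for acc in accesses[1:]:
            prev = run[-1]
            if acc.offset == prev.offset + prev.size:
                run.append(acc)      # extends the consecutive run
            else:
                if len(run) > 1:
                    runs.append(run)
                run = [acc]          # gap or repeated offset: new run
        if len(run) > 1:
            runs.append(run)
    return runs
```

Because a run can hold any number of accesses, this shape covers both the
paired case and a load/store-multiple target. Two accesses in one group
with the same offset simply break the run here; a real pass would need
extra checks before treating the second one as a redundant load.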
> Similarly a post-reload pass could be used to promote single word
> loads/stores to double-word operations.
> If anyone cared about PA 1.1 code generation, it'd be a much cleaner way to
> support the non-fused fmpyadd fmpysub insns.
> Anyway, if you want to move forward with the idea, I'd certainly support
> doing so.