This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: Mike Stump <mikestump at comcast dot net>, Bin Cheng <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>, Jeff Law <law at redhat dot com>
- Date: Mon, 6 Oct 2014 13:32:28 +0200
- Subject: Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- References: <000001cfdc90$1d95c670$58c15350$ at arm dot com> <781A4573-F7ED-4ABD-B222-76F2044E641A at comcast dot net> <CAHFci29=8H_qQSCwUUMWgdyX-DQziE8hSagM7CDNBEMkrXo+3w at mail dot gmail dot com>
On Mon, Oct 6, 2014 at 11:57 AM, Bin.Cheng <firstname.lastname@example.org> wrote:
> On Wed, Oct 1, 2014 at 5:06 AM, Mike Stump <email@example.com> wrote:
>> On Sep 30, 2014, at 2:22 AM, Bin Cheng <firstname.lastname@example.org> wrote:
>>> Then I decided to take one step forward and introduce a generic
>>> instruction fusion infrastructure in GCC, because in essence, load/store
>>> pairing is no different from other instruction fusion; all these optimizations
>>> want is to push instructions together in the instruction flow.
>> I like the step you took. I had exactly this in mind when I wrote the original.
>>> N0 ~= 1300
>>> N1/N2 ~= 5000
>>> N3 ~= 7500
>> Nice. Would be nice to see metrics for time to ensure that the code isn't actually worse (CSiBE and/or spec and/or some other). I didn't have any large scale benchmark runs with my code and I did worry about extending lifetimes and register pressure.
> Hi Mike,
> I did collect spec2k performance numbers after pairing load/store with
> this patch on both aarch64 and cortex-a15. Performance clearly
> improves, especially on cortex-a57. A few benchmarks regress
> slightly. There is no register pressure problem here because this
> pass runs between register allocation and sched2; I guess sched2
> should resolve most pipeline hazards introduced by this pass.
How many merging opportunities does sched2 undo again? ISTR it
has a tendency to push stores down and loads up.
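
One way to put a number on that question is a before/after count of adjacent pairable accesses. The following is only a toy sketch in Python, not GCC code: the `Insn` tuple, `can_pair`, `count_pairs`, the 8-byte access size and the two instruction sequences are all made up for illustration, and the real pass works on RTL with target-specific pairing rules.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    """Hypothetical, heavily simplified instruction record (not RTL)."""
    kind: str    # "load", "store", or "other"
    base: str    # base register name, "" for non-memory instructions
    offset: int  # byte offset from the base register


def can_pair(a: Insn, b: Insn, size: int = 8) -> bool:
    """True if a and b are same-kind accesses to adjacent slots off one base."""
    return (
        a.kind == b.kind
        and a.kind in ("load", "store")
        and a.base == b.base
        and abs(a.offset - b.offset) == size
    )


def count_pairs(insns: List[Insn], size: int = 8) -> int:
    """Greedily count non-overlapping pairs of neighbouring instructions."""
    pairs = 0
    i = 0
    while i + 1 < len(insns):
        if can_pair(insns[i], insns[i + 1], size):
            pairs += 1
            i += 2  # both instructions are consumed by this pair
        else:
            i += 1
    return pairs


# Assumed order right after the fusion pass: both pairs are adjacent.
fused = [
    Insn("load", "x0", 0),
    Insn("load", "x0", 8),
    Insn("store", "x1", 0),
    Insn("store", "x1", 8),
    Insn("other", "", 0),
]

# Assumed order after a later scheduler pushed one store down.
rescheduled = [
    Insn("load", "x0", 0),
    Insn("load", "x0", 8),
    Insn("store", "x1", 0),
    Insn("other", "", 0),
    Insn("store", "x1", 8),
]

undone = count_pairs(fused) - count_pairs(rescheduled)
print(f"pairs after fusion: {count_pairs(fused)}, undone by rescheduling: {undone}")
```

Running the same count over real dumps taken before and after sched2 would give the number of pairs that do not survive.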
>>> I cleaned up Mike's patch and fixed some implementation bugs in it.
>> So, I'm wondering what the bugs or missed opportunities were? And were they the kind of problem that generated incorrect code, or merely missed opportunities?
> Just missed opportunity issues.