[PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
Richard Sandiford
richard.sandiford@arm.com
Thu Dec 7 12:20:53 GMT 2023
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>>
>> On Wed, 6 Dec 2023 at 23:32, Richard Biener <richard.guenther@gmail.com> wrote:
>> >
>> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>> > <manos.anagnostakis@vrull.eu> wrote:
>> > >
>> > > This is an RTL pass that detects store forwarding from stores to larger loads (load pairs).
>> > >
>> > > This optimization is SPEC2017-driven and was found to be beneficial for some benchmarks,
>> > > through testing on ampere1/ampere1a machines.
>> > >
>> > > For example, it can transform cases like
>> > >
>> > > str d5, [sp, #320]
>> > > fmul d5, d31, d29
>> > > ldp d31, d17, [sp, #312] # Large load from small store
>> > >
>> > > to
>> > >
>> > > str d5, [sp, #320]
>> > > fmul d5, d31, d29
>> > > ldr d31, [sp, #312]
>> > > ldr d17, [sp, #320]
>> > >
>> > > Currently, the pass is disabled by default on all architectures and enabled by a target-specific option.
>> > >
>> > > If deemed beneficial enough for a default, it will be enabled on ampere1/ampere1a,
>> > > or other architectures as well, without needing to be turned on by this option.
>> >
>> > What is aarch64-specific about the pass?
>> >
>> > I see an increasingly large number of target specific passes pop up (probably
>> > for the excuse we can generalize them if necessary). But GCC isn't LLVM
>> > and this feels like getting out of hand?
>>
>> We had an OK from Richard Sandiford on the earlier (v5) version with
>> v6 just fixing an obvious bug... so I was about to merge this earlier
>> just when you commented.
>>
>> Given that this had months of test exposure on our end, I would prefer
>> to move this forward for GCC14 in its current form.
>> The project of replacing architecture-specific store-forwarding passes
>> with a generalized infrastructure could then be addressed in the GCC15
>> timeframe (or beyond)?
>
> It's up to target maintainers, I just picked this pass (randomly) to make this
> comment (of course also knowing that STLF fails are a common issue on
> pipelined uarchs).

I agree there's scope for making some of this target-independent.
One vague thing I've been wondering about is whether, for some passes
like these, we should use inheritance rather than target hooks. So in
this case, the target-independent code would provide a framework for
iterating over the function and testing for forwarding, but the target
would ultimately decide what to do with that information. This would
also make it easier for targets to add genuinely target-specific
information to the bookkeeping structures.

In case it sounds otherwise, that's supposed to be more than
just a structural C++-vs-C thing. The idea is that we'd have
a pass for "resolving store forwarding-related problems",
but the specific goals would be mostly (or at least partially)
target-specific rather than target-independent.
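
To make the inheritance idea a bit more concrete, here is a rough,
self-contained sketch. It is an illustration only and not code from the
patch: every name in it (mem_access, store_forwarding_pass,
handle_forwarding, example_target_pass) is hypothetical, and memory
accesses are modelled as plain offset/size pairs rather than RTL. The
base class plays the role of the target-independent framework that
iterates over the function and tests for forwarding; the derived class
decides what to do with each candidate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a memory access (not a GCC type).
struct mem_access {
  bool is_store;
  int64_t offset;  // byte offset from a common base register
  int64_t size;    // access width in bytes
};

// Hypothetical target-independent framework: walks the accesses in
// program order and finds loads that are wider than an earlier
// overlapping store, then asks the target what to do about each one.
class store_forwarding_pass {
public:
  virtual ~store_forwarding_pass() = default;

  // Returns the number of candidates the target chose to handle.
  // Simplification: recorded stores are never invalidated.
  int run(const std::vector<mem_access> &insns) {
    int handled = 0;
    std::vector<mem_access> stores;
    for (const mem_access &insn : insns) {
      if (insn.is_store) {
        stores.push_back(insn);
        continue;
      }
      for (const mem_access &st : stores)
        if (is_candidate(st, insn) && handle_forwarding(st, insn)) {
          ++handled;
          break;
        }
    }
    return handled;
  }

protected:
  // The target ultimately decides what to do with the information.
  virtual bool handle_forwarding(const mem_access &store,
                                 const mem_access &load) = 0;

private:
  // True if the load overlaps the store and is wider than it.
  static bool is_candidate(const mem_access &st, const mem_access &ld) {
    assert(st.size > 0 && ld.size > 0);
    bool overlap = st.offset < ld.offset + ld.size
                   && ld.offset < st.offset + st.size;
    return overlap && ld.size > st.size;
  }
};

// Hypothetical target: only cares about 16-byte (pair-sized) loads and
// keeps its own target-specific bookkeeping.
class example_target_pass : public store_forwarding_pass {
public:
  int loads_split = 0;

protected:
  bool handle_forwarding(const mem_access &, const mem_access &load) override {
    if (load.size != 16)
      return false;
    ++loads_split;  // a real pass would rewrite the load here
    return true;
  }
};
```

With the str/ldp example from the patch description modelled as an
8-byte store at offset 320 followed by a 16-byte load at offset 312,
run() reports one candidate and the example target counts one split.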

I'd wondered the same thing about the early-ra pass that we're
adding for SME. Some of the framework could be generalised and
made target-independent, but the main purpose of the pass (using
strided registers with certain patterns and constraints) is highly
target-specific.

Thanks,
Richard