[PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

Andrew Pinski pinskia@gmail.com
Mon Aug 16 18:56:05 GMT 2021


On Mon, Aug 16, 2021 at 10:10 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Mon, 16 Aug 2021 09:29:16 PDT (-0700), Kito Cheng wrote:
> >> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> >> > testcase from v2 and add a few more comments to describe the field?
> >> >
> >> > And add an -mtune=ultra-size to make it able to test without change
> >> > other behavior?
> >> >
> >> > Hi Palmer:
> >> >
> >> > Are you OK with that?
> >>
> >> I'm still not convinced on the performance: like Andrew and I pointed
> >> out, this is a difficult case for pipelines of this flavor to handle.
> >> Nobody here knows anything about this pipeline deeply enough to say
> >> anything difinitive, though, so this is really just a guess.
> >
> > So with an extra field to indicate should resolve that?
> > I believe people should only set overlap_op_by_pieces
> > to true only if they are sure it has benefits.
>
> My only issue there is that we'd have no way to turn it on, but see
> below...
>
> >> As I'm not convinced this is an obvious performance win I'm not going to
> >> merge it without a benchmark.  If you're convinced and want to merge it
> >> that's fine, I don't really care about the performance fo the C906 and
> >> if someone complains we can always just revert it later.
> >
> > I suppose Christoph has tried with their internal processor, and it's
> > benefit on performance,
> > but it can't be open-source yet, so v2 patch set using C906 to demo
> > and test that since that is
> > the only processor with slow_unaligned_access=False.
>
> Well, that's a very different discussion.  The C906 tuning model should
> be for the C906, not a proxy for some internal-only processor.  If the
> goal here is to allow this pass to be flipped on by an out-of-tree
> pipeline model then we can talk about it.
>
> > I agree on the C906 part, we never know it's benefit or not, so I propose
> > adding one -mtune=ultra-size to make this test-able rather than changing C906.
>
> That's essentially the same conclusion we came to last time this came
> up, except that we were calling it "-Oz" (because LLVM does).  I guess
> we never got around having the broader GCC discussion about "-Oz".  IIRC
> we had some other "-Oz" candidates we never got around to dealing with,
> but that was a while ago so I'm not sure if any of that panned out.

-Oz was a bad idea that Apple came up because GCC decided to start
emitting store multiple on PowerPC around 13 years ago.
I don't think we should repeat that mistake for GCC and especially for RISCV.
If people want to optimize for size, they get the performance issues.

Thanks,
Andrew Pinski


More information about the Gcc-patches mailing list