[PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

Kito Cheng kito.cheng@gmail.com
Mon Aug 16 16:29:16 GMT 2021

> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> > testcase from v2 and add a few more comments to describe the field?
> >
> > And add an -mtune=ultra-size to make it testable without changing
> > other behavior?
> >
> > Hi Palmer:
> >
> > Are you OK with that?
> I'm still not convinced on the performance: like Andrew and I pointed
> out, this is a difficult case for pipelines of this flavor to handle.
> Nobody here knows anything about this pipeline deeply enough to say
> anything definitive, though, so this is really just a guess.

So would an extra field to indicate this resolve that?
I believe people should set overlap_op_by_pieces to true
only if they are sure it has benefits.
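
To make the idea concrete, here is a minimal sketch in plain C, not the
actual GCC code. Only the names slow_unaligned_access and
overlap_op_by_pieces come from this thread; struct tune_sketch and clear7
are hypothetical and exist only for illustration. It shows how an opt-in
flag could select between two overlapping word stores and the usual
4+2+1 sequence when clearing 7 bytes:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning record; not the real GCC structure.  */
struct tune_sketch {
  bool slow_unaligned_access;
  bool overlap_op_by_pieces;   /* opt-in, false unless known to help */
};

/* Zero 7 bytes at P and return the number of stores issued.
   memcpy stands in for a possibly unaligned store.  */
static int clear7(unsigned char *p, const struct tune_sketch *t)
{
  const uint32_t z32 = 0;
  const uint16_t z16 = 0;

  if (!t->slow_unaligned_access && t->overlap_op_by_pieces) {
    memcpy(p, &z32, 4);       /* bytes 0..3 */
    memcpy(p + 3, &z32, 4);   /* bytes 3..6, byte 3 is written twice */
    return 2;
  }

  /* Non-overlapping pieces (assumes these accesses are acceptable
     for the target, which a real compiler would check).  */
  memcpy(p, &z32, 4);         /* bytes 0..3 */
  memcpy(p + 4, &z16, 2);     /* bytes 4..5 */
  p[6] = 0;                   /* byte 6 */
  return 3;
}
```

With the flag left false, nothing changes for existing tunings; only a
tuning that sets it gets the shorter, overlapping sequence.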

> As I'm not convinced this is an obvious performance win I'm not going to
> merge it without a benchmark.  If you're convinced and want to merge it
> that's fine, I don't really care about the performance of the C906 and
> if someone complains we can always just revert it later.

I suppose Christoph has tried this with their internal processor and it
benefits performance, but it can't be open-sourced yet, so the v2 patch
set uses the C906 to demo and test this, since that is the only processor
with slow_unaligned_access=False.

I agree on the C906 part: we don't know whether it is a benefit there or
not, so I propose adding an -mtune=ultra-size option to make this testable
rather than changing the C906.
