[RFC] Introduce -finline-memset-loops

Mon Jan 16 07:26:16 GMT 2023

On Sat, Jan 14, 2023 at 2:55 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> Hello, Richard,
>
> Thank you for the feedback.
>
> On Jan 12, 2023, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > On Tue, Dec 27, 2022 at 5:12 AM Alexandre Oliva via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
>
> >> This patch extends the memset expansion to start with a loop, so as to
> >> still take advantage of known alignment even with long lengths, but
> >> without necessarily adding store blocks for every power of two.
>
> > I wonder if that isn't better handled by targets via the setmem pattern,
>
> That was indeed where I started, but then I found myself duplicating the
> logic in try_store_by_multiple_pieces on a per-target basis.
>
> Target-specific code is great for tight optimizations, but the main
> purpose of this feature is not an optimization.  AFAICT it actually
> slows things down in general (due to code growth, and to conservative
> assumptions about alignment), except perhaps for some microbenchmarks.
> It's rather a means to avoid depending on the C runtime, particularly
> due to compiler-introduced memset calls.

OK, that's what I guessed but you didn't spell it out.  So does it make sense
to mention -ffreestanding in the docs at least?  My fear is that we'd get
complaints that -O3 -finline-memset-loops turns nicely optimized memset
loops into dumb ones (via loop distribution and then stupid re-expansion).
So does it also make sense to turn off -floop-distribute-patterns[-memset]
with -finline-memset-loops?
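
To make the concern concrete, here is a rough sketch in C.  The function
names and the BLOCK size are made up for illustration, and the second
function is only a hand-written approximation of the "loop first, then
power-of-two store blocks" shape described in the patch, not what
try_store_by_multiple_pieces actually emits.  At -O3, loop distribution
(spelled -ftree-loop-distribute-patterns in GCC) may recognize the first
loop as a memset; re-expanding that memset inline could then yield
something shaped like the second function.

```c
#include <stddef.h>

/* Illustrative block size; a real expansion would pick this per target. */
#define BLOCK ((size_t)8)

/* Plain user-written fill loop.  With loop distribution enabled, GCC
   may replace this with a call to memset.  */
void set_loop(unsigned char *buf, unsigned char c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = c;
}

/* Stands in for one fixed-size store of LEN bytes.  A compiler would
   emit a single wide store here when alignment allows.  */
static inline void store_block(unsigned char *p, unsigned char c, size_t len)
{
    for (size_t k = 0; k < len; k++)
        p[k] = c;
}

/* Hypothetical shape of an inline expansion: a loop over BLOCK-sized
   stores, then one store per set bit of the remaining length.  */
void set_expanded(unsigned char *buf, unsigned char c, size_t n)
{
    while (n >= BLOCK) {
        store_block(buf, c, BLOCK);
        buf += BLOCK;
        n -= BLOCK;
    }
    /* Now n < BLOCK: handle the tail with stores of 4, 2 and 1 bytes. */
    for (size_t piece = BLOCK / 2; piece > 0; piece /= 2) {
        if (n & piece) {
            store_block(buf, c, piece);
            buf += piece;
        }
    }
}
```

Both functions fill the same bytes; the difference is only in code shape
and size, which is why the result of such a round trip need not be
faster than the original loop.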

> My initial goal was to be able to show that inline expansion would NOT
> bring about performance improvements, but performance was not the
> concern that led to the request.
>
> If the approach seems generally acceptable, I may even end up extending
> it to other such builtins.  I have a vague recollection that memcmp is
> also an issue for us.

For C/C++ the compiler introduces calls to at least memmove, memcpy and memcmp as well.
In this respect -finline-memset-loops is too specific; to avoid an explosion
in the number of command-line options, should we try to come up with something
better, e.g. -finline-all-stringops[={memset,memcpy,...}] (just like x86 has
-minline-all-stringops)?

> > like x86 has the stringop inline strategy.  What is considered acceptable
> > in terms of size or performance will vary and I don't think there's much
> > room for improvements on this generic code support?
>
> *nod* x86 is quite finely tuned already; I suppose other targets may
> have some room for additional tuning, both for performance and for code
> size, but we don't have much affordance for avoiding builtin calls to
> the C runtime, which is what this is about.
>
> Sometimes disabling loop distribution is enough to accomplish that, but
> in some cases GNAT itself resorts to builtin memset calls, in ways that
> are not so easy to avoid, and that would ultimately amount to expanding
> memset inline, so I figured we might as well offer that as a general
> feature, for users to whom this matters.
>
> Is (optionally) tending to this (uncommon, I suppose) need (or
> preference?) not something GCC would like to do?

Sure, I think for the specific intended purpose that would be fine.  It should
also only apply to __builtin_memset calls, not to memset calls from user code?
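
As an illustration of that distinction (struct and function names are
hypothetical), the first function below contains no memset in the source,
yet GCC may still emit a memset call to zero the aggregate; the second
calls memset explicitly, which GCC normally treats as __builtin_memset
unless -fno-builtin or -ffreestanding is in effect.

```c
#include <string.h>

struct packet {
    unsigned char payload[256];
    int id;
};

/* No memset appears here, but zero-initializing a large aggregate is a
   typical place where the compiler may introduce a memset call.  */
void init_implicit(struct packet *out)
{
    struct packet p = {0};
    p.id = 7;
    *out = p;
}

/* Explicit memset call written by the user.  */
void init_explicit(struct packet *out)
{
    memset(out, 0, sizeof *out);
    out->id = 7;
}
```

Both leave the packet in the same state; what differs is whether the
memset originates in user code or in the compiler.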

Thanks,
Richard.

> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>    Free Software Activist                       GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>