[libc-coord] Add new ABI '__memcmpeq()' to libc
Noah Goldstein
goldstein.w.n@gmail.com
Thu Sep 16 23:24:02 GMT 2021
On Thu, Sep 16, 2021 at 5:25 PM Chris Kennelly via Libc-alpha <libc-alpha@sourceware.org> wrote:
> On Thu, Sep 16, 2021 at 5:50 PM enh <enh@google.com> wrote:
>
> > plus testing for _equality_ can (as mentioned earlier) have slightly
> > different properties from the three-way comparator behavior of
> > bcmp()/memcmp().
> >
>
> llvm-libc's implementation only returns the boolean, though.
>
> The mem* functions are extremely sensitive to instruction cache effects, so
> having 3 unique implementations (__memcmpeq, bcmp, memcmp) that do similar,
> but subtly different things can be a hidden performance cost--one that is
> hard to demonstrate with a microbenchmark. Our experience developing
> optimized mem* routines ended up showing better performance in actual
> applications when we accepted seemingly worse microbenchmark performance by
> optimizing for code footprint instead (more extensive notes for mem* in
> general
> <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/4f7c3da72d557ed418828823a8e59942859d677f.pdf>
> and memcmp specifically (section 4.4)
> <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/e52f61fd2c51e8962305120548581efacbc06ffc.pdf>).
>
Regarding the code bloat found in memcmp in the paper, I think that is
largely exclusive to the sse4 implementation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-sse4.S;h=b82adcd5fab5b60a0327819f6041a689a276916a;hb=HEAD
And I think there is a fair argument to not include a __memcmpeq() based on
that implementation.
The older versions:
sse2:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/memcmp.S;h=870e15c5a080162b336b13bac24cf7afbac6874b;hb=HEAD
avx2:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S;h=2621ec907aedb781fcf0444e831c801f18fa68ba;hb=HEAD
evex:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-evex-movbe.S;h=654dc7ac8ccb9445b2c7107a7cf2d9f6ce4b1010;hb=HEAD
have a much more reasonable code size footprint.
Also, the __memcmpeq() code will itself have a smaller code size footprint
than memcmp().
With the implementations from my patch, the code size shrinks by the
following amounts:
sse2: -66
avx2: -436
evex: -500
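
As a rough illustration of why an equality-only routine can be smaller, here
is a minimal scalar sketch in C. It is not the glibc code (the real
implementations are hand-written assembly), and the names memcmpeq_sketch and
memcmp_sketch are hypothetical. The equality-only version only has to report
zero or nonzero, so it never needs to locate the first differing byte or work
out which side is larger:

```c
#include <stddef.h>

/* Hypothetical equality-only compare: returns 0 if the buffers match and
   some nonzero value otherwise. The sign and magnitude of the nonzero
   result carry no meaning. */
static int memcmpeq_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    unsigned char diff = 0;

    /* Accumulate all differences; no need to find where they occur. */
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(pa[i] ^ pb[i]);
    return diff;
}

/* Hypothetical three-way compare with memcmp()-style semantics: the sign
   of the result reflects the first differing byte, compared as
   unsigned char. */
static int memcmp_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return pa[i] < pb[i] ? -1 : 1;  /* extra work the boolean form skips */
    }
    return 0;
}
```

In a vectorized implementation, the same difference shows up as the extra
code needed to find the mismatching byte inside a vector and compute an
ordered result, which the equality-only path can drop.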
> The alternative would be to alias (as the NOTES suggest as a possible
> implementation), but I think that raises James' question of why not just
> use bcmp? Dependencies on non-boolean implementations of bcmp seem
> rare--namely, I haven't actually seen one.
>
>
> > On Thu, Sep 16, 2021 at 2:43 PM Joseph Myers <joseph@codesourcery.com>
> > wrote:
> >
> >> On Thu, 16 Sep 2021, James Y Knight wrote:
> >>
> >> > Wouldn't it be far simpler to just un-deprecate bcmp?
> >>
> >> The aim is to have something to which calls can be generated in all
> >> standards modes. bcmp has never been part of ISO C; there's nothing to
> >> undeprecate there.
> >
> >
> >> --
> >> Joseph S. Myers
> >> joseph@codesourcery.com
> >>
> >
>