This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Jakub Jelinek <jakub at redhat dot com>, Richard Biener <richard dot guenther at gmail dot com>
- Cc: nd <nd at arm dot com>, "mliska at suse dot cz" <mliska at suse dot cz>, "ubizjak at gmail dot com" <ubizjak at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, "marc dot glisse at inria dot fr" <marc dot glisse at inria dot fr>, "H.J. Lu" <hjl dot tools at gmail dot com>, Jan Hubicka <hubicka at ucw dot cz>
- Date: Thu, 12 Apr 2018 15:53:13 +0000
- Subject: Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
>> Not sure if I missed some important part of the discussion but
>> for the testcase we want to preserve the tailcall, right? So
>> it would be enough to set avoid_libcall to
>> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
>> stpcpy)?
The tailcall issue is just a distraction. Historically the handling of mempcpy
has been horribly inefficient in both GCC and GLIBC for practically all targets.
This is why it was decided to defer to memcpy.
For example, a mempcpy with a small constant size was not expanded inline
like memcpy until PR70140 was fixed. Except for a few targets which have added
an optimized mempcpy, the default mempcpy implementation in almost all
released GLIBCs is much slower than memcpy (because it uses a badly written
C implementation).
In recent GLIBCs the generic mempcpy calls the optimized memcpy - this is
better but still adds extra call/return overhead. To avoid that, the GLIBC
headers have an inline that changes any call to mempcpy into memcpy (this is
the default but can be disabled on a per-target basis).
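The rewrite described above can be sketched in C roughly as follows. This is
only a minimal illustration of the idea, not the actual GLIBC header code; the
name sketch_mempcpy is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the inline rewrite: copy with memcpy, then
   compute the end pointer that mempcpy would have returned. */
static inline void *sketch_mempcpy(void *dst, const void *src, size_t n)
{
    /* memcpy returns dst, so the end pointer is that result plus n. */
    return (char *)memcpy(dst, src, n) + n;
}
```

The only extra work compared to a plain memcpy call is the pointer addition
after the call returns.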
Obviously it is best to do this optimization in GCC, which is what we finally do
in GCC 8. Inlining mempcpy means you sometimes miss a tailcall, but this is
not common - in all of GLIBC the inlining on AArch64 adds 166 extra instructions
and 12 callee-save registers. This is a small code size cost to avoid the
overhead of calling the generic C version.
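To illustrate where the missed tailcall comes from, here is a small sketch. It
assumes a hypothetical out-of-line lib_mempcpy standing in for the library
function; whether a given compiler really emits a tailcall or keeps a value in
a callee-save register depends on the target and optimization level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an out-of-line library mempcpy. */
void *lib_mempcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return (char *)dst + n;
}

/* The callee's result is returned unchanged, so a compiler may emit
   this call as a tailcall (a plain jump, no frame needed). */
void *append_via_mempcpy(void *dst, const void *src, size_t n)
{
    return lib_mempcpy(dst, src, n);
}

/* After the rewrite there is still work left once the call returns
   (adding n), so the call is no longer in tail position and n has to
   stay live across it. */
void *append_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}
```

Both functions return the same pointer; they differ only in what the caller has
to keep around while the copy runs.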
> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.
The question is who is going to write the 30+ mempcpy implementations for all
those targets which don't have one? And who says doing this is actually going to
improve performance? Having both mempcpy and memcpy typically means more
I-cache misses in code that uses both.
So generally it's a good idea to change mempcpy into memcpy by default. It's
not slower than calling mempcpy even if you have a fast implementation, it's
faster if you use an up-to-date GLIBC whose mempcpy calls memcpy, and it's
significantly better when using an old GLIBC.
Wilco