This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Martin Liška <mliska at suse dot cz>, Uros Bizjak <ubizjak at gmail dot com>, gcc-patches at gcc dot gnu dot org, Marc Glisse <marc dot glisse at inria dot fr>, "H.J. Lu" <hjl dot tools at gmail dot com>, Jan Hubicka <hubicka at ucw dot cz>
- Date: Thu, 12 Apr 2018 16:05:49 +0200
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> Not sure if I missed some important part of the discussion but
> for the testcase we want to preserve the tailcall, right? So
> it would be enough to set avoid_libcall to
> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> stpcpy)?
For the testcase, yes. The question there is whether some targets have such
a lame mempcpy that a tailcall to mempcpy is slower than avoiding the
tailcall (and on aarch64 it looked like the maintainer's choice to have a
lame mempcpy and hope the compiler will avoid it at all costs). On the other
hand, that change has been forced onto all targets, even those that don't
have a lame mempcpy.
So, the tailcall is one issue, and we can either use mempcpy if endp
and CALL_EXPR_TAILCALL, or only do that at -Os.
Another issue is mempcpy used in other contexts. Here again I think x86
has a good enough mempcpy that if I write
foo (mempcpy (x, y, z)) it is better to call mempcpy than memcpy,
but not so on targets with a lame mempcpy.
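
To make the two call shapes concrete, here is a rough sketch of what the
caller looks like in each case. Everything named here is hypothetical:
my_mempcpy is a portable stand-in for the GNU mempcpy extension (so the
snippet builds without glibc), and foo is just some consumer of the end
pointer. It is not what the expander emits, only the source-level equivalent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GNU mempcpy: copy n bytes and return the
   address one past the last byte written.  */
void *my_mempcpy(void *dst, const void *src, size_t n)
{
  return (char *) memcpy(dst, src, n) + n;
}

/* Hypothetical consumer of the end pointer; it only records its argument.  */
char *last_end;
void foo(void *end)
{
  last_end = end;
}

/* Shape 1: the libcall already returns the end pointer, so the result
   is passed straight on.  */
void with_mempcpy(char *x, const char *y, size_t z)
{
  foo(my_mempcpy(x, y, z));
}

/* Shape 2: memcpy returns the start, so the caller has to keep z
   around across the call and add it afterwards.  */
void with_memcpy(char *x, const char *y, size_t z)
{
  foo((char *) memcpy(x, y, z) + z);
}
```

Both shapes hand foo the same pointer; the difference is only whether the
addition happens in the library or in the caller, which is why the quality
of the target's mempcpy decides which one is cheaper.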
My preference would be to have non-lame mempcpy etc. on all targets, but the
aarch64 folks disagree.
So I wonder, e.g., about Martin's patch, which would use mempcpy if endp and
either FAST_SPEED for mempcpy (regardless of the context), or not
SLOW_SPEED and CALL_EXPR_TAILCALL. That way, targets could signal that their
mempcpy is so lame that they never want to use it (return SLOW_SPEED), or ask
for it to be used whenever it makes sense from the caller's POV, with the
default being something in between (only use it in tail calls).
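
Written out as a predicate, the rule above might look roughly like the
sketch below. This is not code from Martin's patch: the enum, the function
name use_mempcpy_p and the middle value DEFAULT_SPEED are made up for
illustration (only FAST_SPEED and SLOW_SPEED are named in this thread), and
endp and the tailcall flag are passed as plain booleans rather than taken
from the call expression.

```c
#include <stdbool.h>

/* Hypothetical classification a target would return for mempcpy.
   DEFAULT_SPEED is an assumed name for the "in between" case.  */
enum libcall_speed { SLOW_SPEED, DEFAULT_SPEED, FAST_SPEED };

/* Return true if the call should stay a mempcpy call instead of being
   turned into memcpy plus an addition.
   endp:     the caller wants the end pointer (endp != 0).
   speed:    what the target says about its mempcpy.
   tailcall: the call is marked as a tail call (CALL_EXPR_TAILCALL).  */
bool use_mempcpy_p(bool endp, enum libcall_speed speed, bool tailcall)
{
  if (!endp)
    return false;                 /* the end pointer is not needed at all */
  if (speed == FAST_SPEED)
    return true;                  /* good mempcpy: use it in any context */
  /* Otherwise only keep it to preserve a tail call, and never on
     targets that report a slow mempcpy.  */
  return speed != SLOW_SPEED && tailcall;
}
```

With this shape a SLOW_SPEED target never gets a mempcpy call, a FAST_SPEED
target gets one whenever the end pointer is wanted, and everything else
gets it only when that keeps the tail call.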
Jakub