This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, nd <nd at arm dot com>, "mliska at suse dot cz" <mliska at suse dot cz>, "ubizjak at gmail dot com" <ubizjak at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, "marc dot glisse at inria dot fr" <marc dot glisse at inria dot fr>, "H.J. Lu" <hjl dot tools at gmail dot com>, Jan Hubicka <hubicka at ucw dot cz>
- Date: Thu, 12 Apr 2018 17:29:35 +0000
- Subject: Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
- References: <DB6PR0801MB205332C2713586CB5120F0AE83BC0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <20180412160306.GN8577@tucnak> <DB6PR0801MB2053E09490188FF5007227B083BC0@DB6PR0801MB2053.eurprd08.prod.outlook.com>,<20180412164917.GO8577@tucnak>
Jakub Jelinek wrote:
>On Thu, Apr 12, 2018 at 04:30:07PM +0000, Wilco Dijkstra wrote:
>> Jakub Jelinek wrote:
>> Frankly I don't see why it is a P1 regression. Do you have a benchmark that
>
>That is how regression priorities are defined.
How can one justify considering this a release blocker without hard numbers?
A 1% regression on a large body of code would be very serious; a 0.01%
regression, not so much.
>> >> So generally it's a good idea to change mempcpy into memcpy by default. It's
>> >> not slower than calling mempcpy even if you have a fast implementation, it's faster
>> >> if you use an up to date GLIBC which calls memcpy, and it's significantly better
>> >> when using an old GLIBC.
>> >
>> > mempcpy is quite good on many targets even in old GLIBCs.
>>
>> Only true if with "many" you mean x86, x86_64 and IIRC sparc.
>
> Depending on what you mean by old, I see e.g. in 2010 power7 mempcpy got added,
> in 2013 other power versions, in 2016 s390*, etc. Doing a decent mempcpy
> isn't hard if you have an asm version of memcpy and one spare register.
More mempcpy implementations have indeed been added in recent years, but almost all
of them add an extra copy of the memcpy code rather than sharing a single combined
implementation. That means it is still better to call memcpy (which is used frequently
and is thus likely to be in L1/L2) than mempcpy (which is more likely to be cold and
thus not cached).
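
For illustration, the rewrite being discussed can be sketched in C. The helper names below (mempcpy_via_memcpy, join2) are hypothetical and are not GLIBC or GCC functions; the sketch only shows that the mempcpy result can be recovered from a memcpy call plus one pointer addition, so the only copy routine executed is the frequently used memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GLIBC or GCC: mempcpy expressed
   through memcpy. Returns a pointer just past the last byte written. */
static void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    /* memcpy returns dst, so the mempcpy result is simply dst + n. */
    return (char *)memcpy(dst, src, n) + n;
}

/* Hypothetical usage: concatenate two byte ranges into out and
   NUL-terminate. out must have room for alen + blen + 1 bytes. */
static char *join2(char *out, const char *a, size_t alen,
                   const char *b, size_t blen)
{
    char *p = out;
    p = mempcpy_via_memcpy(p, a, alen);
    p = mempcpy_via_memcpy(p, b, blen);
    *p = '\0';
    return out;
}
```

Whether the extra addition is cheaper than calling a separate mempcpy entry point depends on the target and the GLIBC version, which is the point under dispute in this thread.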
Wilco