This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/68261] New: GCC needs to use optimized version of memcpy
- From: "geir at cray dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 09 Nov 2015 21:56:11 +0000
- Subject: [Bug target/68261] New: GCC needs to use optimized version of memcpy
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68261
Bug ID: 68261
Summary: GCC needs to use optimized version of memcpy
Product: gcc
Version: 5.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: geir at cray dot com
Target Milestone: ---
GCC should generate faster code for memcpy. The following test case shows
that the Intel compiler's implementation of memcpy is nearly twice as fast as
the one GCC uses (about 1.87x in the runs below). I realize that memcpy is
part of GLIBC, but GCC should take advantage of the targeting information it
is given (here -march=corei7-avx) and of the context of the memcpy call to
generate faster code:
$ cat test_memcpy.cpp
#include <stdio.h>
#include <string.h>
#include <omp.h>
#include <vector>
extern "C" void memcpy_custom(double* out, double* in, int length);
int main(int argn, char** argv)
{
    int repeat = 200;
    int N = (1 << 20);
    std::vector<double> inp(N, 1);
    std::vector<double> out(N, 2);
    double t = -omp_get_wtime();
    if (argn == 1)
    {
        for (int i = 0; i < repeat; i++)
        {
            memcpy(&out[0], &inp[0], N * sizeof(double));
        }
    }
    else
    {
        for (int i = 0; i < repeat; i++)
        {
            memcpy_custom(&out[0], &inp[0], N);
        }
    }
    t += omp_get_wtime();
    printf("performance: %.4f MB/sec.\n", repeat * N * sizeof(double) / t / (1 << 20));
}
$ cat memcpy_custom.cpp
extern "C" void memcpy_custom(double* out, double* in, int length)
{
    for (int i = 0; i < length; i++) out[i] = in[i];
}
$
GCC g++ performance:
$ g++ --version
g++ (GCC) 5.1.0 20150422 (Cray Inc.)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ g++ -march=corei7-avx -o gcc.out -O3 -fopenmp memcpy_custom.cpp test_memcpy.cpp
$ ./gcc.out
performance: 6977.5857 MB/sec.
$
Intel icpc performance:
$ icpc --version
icpc (ICC) 15.0.3 20150407
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
$ icpc -mavx -o intel.out -O3 -qopenmp memcpy_custom.cpp test_memcpy.cpp
$ ./intel.out
performance: 13055.0563 MB/sec.
$
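GCC's performance improves by about 1.67x when a simple "custom" version of
memcpy is used instead (the loop in memcpy_custom.cpp, which the test selects
when it is run with any command-line argument):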
$ ./gcc.out 1
performance: 11619.4630 MB/sec.
$ ./intel.out 1
performance: 13068.3777 MB/sec.
$
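For anyone who wants to reproduce the measurement without the OpenMP runtime, here is a minimal sketch of the same timing written with std::chrono. It is not part of the original test case: the helper names mib_per_sec and time_memcpy are hypothetical, and the only thing taken from the report is the throughput formula used in its printf (bytes / t / 2^20).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: converts bytes moved and elapsed seconds into the
// unit the report prints (bytes / seconds / 2^20).
double mib_per_sec(std::size_t bytes, double seconds)
{
    assert(seconds > 0.0);  // a zero interval would divide by zero
    return static_cast<double>(bytes) / seconds / (1 << 20);
}

// Hypothetical helper: copies src into dst `repeat` times with std::memcpy
// and returns the elapsed wall-clock time in seconds.
double time_memcpy(std::vector<double>& dst, const std::vector<double>& src,
                   int repeat)
{
    assert(dst.size() == src.size());
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeat; i++)
    {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(double));
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the report's sizes, one would call time_memcpy(out, inp, 200) on two vectors of 1 << 20 doubles and pass 200 * inp.size() * sizeof(double) together with the returned time to mib_per_sec. The figures will depend on the machine, compiler and flags, so none are quoted here.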