This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/68261] New: GCC needs to use optimized version of memcpy

From: "geir at cray dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 09 Nov 2015 21:56:11 +0000
Subject: [Bug target/68261] New: GCC needs to use optimized version of memcpy
Auto-submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68261

            Bug ID: 68261
           Summary: GCC needs to use optimized version of memcpy
           Product: gcc
           Version: 5.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: geir at cray dot com
  Target Milestone: ---

The memcpy routine used by GCC needs to be faster.  The following test case
shows that the Intel compiler's implementation of memcpy is nearly twice as
fast as GCC's (about 1.9x in the runs below).  I realize that memcpy is a part
of GLIBC, but the GCC compiler should take advantage of the targeting
information being provided and the context of the memcpy call in order to
generate better code:

$ cat test_memcpy.cpp                                         
#include <stdio.h>                                                              
#include <string.h>                                                             
#include <omp.h>                                                                
#include <vector>                                                               

extern "C" void memcpy_custom(double* out, double* in, int length);

int main(int argn, char** argv)
{                              
    int repeat = 200;          
    int N = (1 << 20);         
    std::vector<double> inp(N, 1);
    std::vector<double> out(N, 2);

    double t = -omp_get_wtime();
    if (argn == 1)              
    {                           
        for (int i = 0; i < repeat; i++)
        {                               
            memcpy(&out[0], &inp[0], N * sizeof(double));
        }                                                
    }                                                    
    else                                                 
    {                                                    
        for (int i = 0; i < repeat; i++)                 
        {                                                
            memcpy_custom(&out[0], &inp[0], N);          
        }                                                
    }                                                    
    t += omp_get_wtime();                                

    printf("performance: %.4f MB/sec.\n", repeat * N * sizeof(double) / t / (1 << 20));
}                                                                               
$ cat memcpy_custom.cpp                                
extern "C" void memcpy_custom(double* out, double* in, int length)              
{                                                                               
    for (int i = 0; i < length; i++) out[i] = in[i];                            
} 
$

      GCC g++ performance:

$ g++ --version
g++ (GCC) 5.1.0 20150422 (Cray Inc.)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -march=corei7-avx -o gcc.out -O3 -fopenmp memcpy_custom.cpp test_memcpy.cpp
$  ./gcc.out
performance: 6977.5857 MB/sec.
$


     Intel icpc performance:

$ icpc --version
icpc (ICC) 15.0.3 20150407
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

$ icpc -mavx -o intel.out -O3 -qopenmp memcpy_custom.cpp test_memcpy.cpp
$ ./intel.out                                   
performance: 13055.0563 MB/sec.
$


Performance of GCC can be improved by replacing memcpy with a simple "custom"
copy loop, which the test program selects when it is given any command-line
argument:

$ ./gcc.out 1                                   
performance: 11619.4630 MB/sec. 
$ ./intel.out 1 
performance: 13068.3777 MB/sec. 
$

Follow-Ups:
- [Bug target/68261] GCC needs to use optimized version of memcpy
  - From: pinskia at gcc dot gnu.org
- [Bug target/68261] GCC needs to use optimized version of memcpy
  - From: rguenth at gcc dot gnu.org
- [Bug target/68261] GCC needs to use optimized version of memcpy
  - From: manu at gcc dot gnu.org
- [Bug target/68261] GCC needs to use optimized version of memcpy
  - From: geir at cray dot com
- [Bug target/68261] GCC needs to use optimized version of memcpy
  - From: pinskia at gcc dot gnu.org
- [Bug target/68261] GCC needs to use optimized version of memcpy
  - From: geir at cray dot com
