In the code below, gcc fails to optimize a hand-crafted memcpy as a call to the memcpy function (or perhaps as "rep;movsb" when compiling with "-Os"). I've tried "-O2", "-O3", and "-Os". There are probably a great number of similar optimizations that could be done.

This optimization is not unheard of; it is done by Visual Studio 2005. IMHO it is not really any more insane than autovectorization.

junk 0 $ cat foo.c
void *foo(void *restrict, const void *restrict, unsigned long);

void *foo(void *restrict dst, const void *restrict src, unsigned long n)
{
    const char *p = src;
    char *q = dst;
    while (n--) {
        *q++ = *p++;
    }
    return dst;
}
junk 0 $ gcc -m32 -std=gnu99 -fomit-frame-pointer -Os -S foo.c
junk 0 $ cat foo.s
	.file	"foo.c"
	.text
.globl foo
	.type	foo, @function
foo:
	pushl	%ebx
	movl	16(%esp), %ebx
	movl	8(%esp), %ecx
	movl	12(%esp), %edx
	jmp	.L2
.L3:
	movb	-1(%edx), %al
	movb	%al, -1(%ecx)
.L2:
	decl	%ebx
	incl	%ecx
	incl	%edx
	cmpl	$-1, %ebx
	jne	.L3
	movl	8(%esp), %eax
	popl	%ebx
	ret
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.1.1 20060828 (Red Hat 4.1.1-20)"
	.section	.note.GNU-stack,"",@progbits
junk 0 $ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.1 20060828 (Red Hat 4.1.1-20)
junk 0 $
Yes, this is a known issue. http://www.gccsummit.org/2006/view_abstract.php?content_key=27
It is also mentioned, with results, in the summit proceedings: http://www.gccsummit.org/2006/2006-GCC-Summit-Proceedings.pdf
Fixed for GCC 4.8 with -O3 or -ftree-loop-distribute-patterns. See also PR53081.
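For anyone who wants to check the fix locally, here is a minimal sketch. It reuses the byte-copy loop from the report under the hypothetical name copy_bytes, and adds a small helper (copy_matches_memcpy, also hypothetical) that compares the loop's result against memcpy so the transformed code can be sanity-checked. The buffer size and fill pattern are arbitrary choices, not anything from the original report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same byte-copy loop as foo() in the report; the name is hypothetical. */
void *copy_bytes(void *restrict dst, const void *restrict src, size_t n)
{
    const char *p = src;
    char *q = dst;

    while (n--) {
        *q++ = *p++;
    }
    return dst;
}

/* Returns 1 if copy_bytes() and memcpy() give identical results for n bytes.
   The 256-byte buffers and the fill pattern are arbitrary assumptions. */
int copy_matches_memcpy(size_t n)
{
    char src[256], a[256], b[256];

    assert(n <= sizeof src);
    for (size_t i = 0; i < sizeof src; i++) {
        src[i] = (char)(i * 7 + 1);
    }
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    copy_bytes(a, src, n);   /* hand-written loop */
    memcpy(b, src, n);       /* library reference */
    return memcmp(a, b, sizeof a) == 0;
}
```

Compiling this with GCC 4.8 or later using "-O3 -S", or "-O2 -ftree-loop-distribute-patterns -S", and looking for a call to memcpy inside copy_bytes in the generated assembly should show whether the loop was recognized. The exact output depends on the target and compiler version.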