While testing the fix for pr95189 I noticed that the memcpy expansion into copy-by-pieces is less than optimal for sequences containing embedded null bytes. For example, in the test case below, the memcpy call in f() is expanded into what looks like a more efficient sequence than the equivalent memcpy call in g(). The only difference between the two is that the former copies a sequence of non-zero bytes while among the bytes copied by the latter is a null byte. Clang emits the same code for g() as GCC does for f(). $ cat z.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -o/dev/stdout z.c const char a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }; const char b[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 }; void f (void *d) { __builtin_memcpy (d, a, 9); // optimal } void g (void *d) { __builtin_memcpy (d, b, 9); // suboptimal } .file "z.c" .text ;; Function f (f, funcdef_no=0, decl_uid=1932, cgraph_uid=1, symbol_order=2) f (void * d) { <bb 2> [local count: 1073741824]: __builtin_memcpy (d_2(D), &a, 9); [tail call] return; } .p2align 4 .globl f .type f, @function f: .LFB0: .cfi_startproc movabsq $578437695752307201, %rax movb $9, 8(%rdi) movq %rax, (%rdi) ret .cfi_endproc .LFE0: .size f, .-f ;; Function g (g, funcdef_no=1, decl_uid=1935, cgraph_uid=2, symbol_order=3) g (void * d) { <bb 2> [local count: 1073741824]: __builtin_memcpy (d_2(D), &b, 9); [tail call] return; } .p2align 4 .globl g .type g, @function g: .LFB1: .cfi_startproc movq b(%rip), %rax movq %rax, (%rdi) movzbl b+8(%rip), %eax movb %al, 8(%rdi) ret .cfi_endproc .LFE1: .size g, .-g .globl b .section .rodata .align 8 .type b, @object .size b, 10 b: .string "" .string "\001\002\003\004\005\006\007\b" .globl a .align 8 .type a, @object .size a, 10 a: .string "\001\002\003\004\005\006\007\b\t" .ident "GCC: (GNU) 10.1.1 20200527" .section .note.GNU-stack,"",@progbits
I'm testing a fix.
See also pr95887 for a related problem with memcmp.
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549225.html
The master branch has been updated by Martin Sebor <msebor@gcc.gnu.org>: https://gcc.gnu.org/g:d5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6 commit r11-2231-gd5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6 Author: Martin Sebor <msebor@redhat.com> Date: Mon Jul 20 12:06:18 2020 -0600 Correct handling of constant representations containing embedded nuls. Resolves: PR middle-end/95189 - memcmp being wrongly stripped like strcm PR middle-end/95886 - suboptimal memcpy with embedded zero bytes gcc/ChangeLog: PR middle-end/95189 PR middle-end/95886 * builtins.c (inline_expand_builtin_string_cmp): Rename... (inline_expand_builtin_bytecmp): ...to this. (builtin_memcpy_read_str): Don't expect data to be nul-terminated. (expand_builtin_memory_copy_args): Handle object representations with embedded nul bytes. (expand_builtin_memcmp): Same. (expand_builtin_strcmp): Adjust call to naming change. (expand_builtin_strncmp): Same. * expr.c (string_constant): Create empty strings with nonzero size. * fold-const.c (c_getstr): Rename locals and update comments. * tree.c (build_string): Accept null pointer argument. (build_string_literal): Same. * tree.h (build_string): Provide a default. (build_string_literal): Same. gcc/testsuite/ChangeLog: PR middle-end/95189 PR middle-end/95886 * gcc.dg/memcmp-pr95189.c: New test. * gcc.dg/strncmp-3.c: New test. * gcc.target/i386/memcpy-pr95886.c: New test.
Fixed in r11-2231.
The master branch has been updated by Martin Sebor <msebor@gcc.gnu.org>: https://gcc.gnu.org/g:8598657c607500512075f6c4ee3b10460c94903d commit r11-2298-g8598657c607500512075f6c4ee3b10460c94903d Author: Martin Sebor <msebor@redhat.com> Date: Thu Jul 23 14:08:22 2020 -0600 Restrict test to LP64. gcc/testsuite/ChangeLog: PR testsuite/95886 * gcc.target/i386/memcpy-pr95886.c: Restrict test to LP64.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:b0e5ec934e7a7473275326e2aee58eaf252cdff1 commit r11-2301-gb0e5ec934e7a7473275326e2aee58eaf252cdff1 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Jul 23 19:14:06 2020 -0700 Restrict PR middle-end/95886 x86 test to !ia32 Since gcc.target/i386/memcpy-pr95886.c requires 64-bit register, restrict it to !ia32. PR middle-end/95886 * gcc.target/i386/memcpy-pr95886.c: Restrict test to !ia32.
The releases/gcc-10 branch has been updated by Martin Sebor <msebor@gcc.gnu.org>: https://gcc.gnu.org/g:e4c9aac98611f63847ef6c57916808d9a2d7abcb commit r10-8871-ge4c9aac98611f63847ef6c57916808d9a2d7abcb Author: Martin Sebor <msebor@redhat.com> Date: Thu Oct 8 12:35:01 2020 -0600 Correct handling of constant representations containing embedded nuls (backport from trunk). Resolves: PR middle-end/95189 - memcmp being wrongly stripped like strcm PR middle-end/95886 - suboptimal memcpy with embedded zero bytes gcc/ChangeLog: PR middle-end/95189 PR middle-end/95886 * builtins.c (inline_expand_builtin_string_cmp): Rename... (inline_expand_builtin_bytecmp): ...to this. (builtin_memcpy_read_str): Don't expect data to be nul-terminated. (expand_builtin_memory_copy_args): Handle object representations with embedded nul bytes. (expand_builtin_memcmp): Same. (expand_builtin_strcmp): Adjust call to naming change. (expand_builtin_strncmp): Same. * expr.c (string_constant): Create empty strings with nonzero size. * fold-const.c (c_getstr): Rename locals and update comments. * tree.c (build_string): Accept null pointer argument. (build_string_literal): Same. * tree.h (build_string): Provide a default. (build_string_literal): Same. gcc/testsuite/ChangeLog: PR middle-end/95189 PR middle-end/95886 * gcc.dg/memcmp-pr95189.c: New test. * gcc.dg/strncmp-3.c: New test. * gcc.target/i386/memcpy-pr95886.c: New test.
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:75a016e24a335db9c93ccd2478433adfe773d898 commit r10-8903-g75a016e24a335db9c93ccd2478433adfe773d898 Author: Martin Sebor <msebor@redhat.com> Date: Thu Jul 23 14:08:22 2020 -0600 Restrict test to LP64. gcc/testsuite/ChangeLog: PR testsuite/95886 * gcc.target/i386/memcpy-pr95886.c: Restrict test to LP64. (cherry picked from commit 8598657c607500512075f6c4ee3b10460c94903d)
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:0cf293649b6c495249854133159ef66bf9c43904 commit r10-8904-g0cf293649b6c495249854133159ef66bf9c43904 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Jul 23 19:14:06 2020 -0700 Restrict PR middle-end/95886 x86 test to !ia32 Since gcc.target/i386/memcpy-pr95886.c requires 64-bit register, restrict it to !ia32. PR middle-end/95886 * gcc.target/i386/memcpy-pr95886.c: Restrict test to !ia32. (cherry picked from commit b0e5ec934e7a7473275326e2aee58eaf252cdff1)