Bug 95886 - suboptimal memcpy with embedded zero bytes
Summary: suboptimal memcpy with embedded zero bytes
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: ---
Assignee: Martin Sebor
URL:
Keywords: missed-optimization, patch
Depends on:
Blocks:
 
Reported: 2020-06-24 21:14 UTC by Martin Sebor
Modified: 2020-10-16 11:37 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-06-24 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Sebor 2020-06-24 21:14:20 UTC
While testing the fix for pr95189 I noticed that the memcpy expansion into copy-by-pieces is less than optimal for sequences containing embedded null bytes.  For example, in the test case below, the memcpy call in f() is expanded into what looks like a more efficient sequence than the equivalent memcpy call in g().  The only difference between the two is that the former copies a sequence of non-zero bytes while among the bytes copied by the latter is a null byte.  Clang emits the same code for g() as GCC does for f().

$ cat z.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -o/dev/stdout z.c
const char a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
const char b[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

void f (void *d)
{
  __builtin_memcpy (d, a, 9);   // optimal
}

void g (void *d)
{
  __builtin_memcpy (d, b, 9);   // suboptimal
}


	.file	"z.c"
	.text

;; Function f (f, funcdef_no=0, decl_uid=1932, cgraph_uid=1, symbol_order=2)

f (void * d)
{
  <bb 2> [local count: 1073741824]:
  __builtin_memcpy (d_2(D), &a, 9); [tail call]
  return;

}


	.p2align 4
	.globl	f
	.type	f, @function
f:
.LFB0:
	.cfi_startproc
	movabsq	$578437695752307201, %rax
	movb	$9, 8(%rdi)
	movq	%rax, (%rdi)
	ret
	.cfi_endproc
.LFE0:
	.size	f, .-f

;; Function g (g, funcdef_no=1, decl_uid=1935, cgraph_uid=2, symbol_order=3)

g (void * d)
{
  <bb 2> [local count: 1073741824]:
  __builtin_memcpy (d_2(D), &b, 9); [tail call]
  return;

}


	.p2align 4
	.globl	g
	.type	g, @function
g:
.LFB1:
	.cfi_startproc
	movq	b(%rip), %rax
	movq	%rax, (%rdi)
	movzbl	b+8(%rip), %eax
	movb	%al, 8(%rdi)
	ret
	.cfi_endproc
.LFE1:
	.size	g, .-g
	.globl	b
	.section	.rodata
	.align 8
	.type	b, @object
	.size	b, 10
b:
	.string	""
	.string	"\001\002\003\004\005\006\007\b"
	.globl	a
	.align 8
	.type	a, @object
	.size	a, 10
a:
	.string	"\001\002\003\004\005\006\007\b\t"
	.ident	"GCC: (GNU) 10.1.1 20200527"
	.section	.note.GNU-stack,"",@progbits
Comment 1 Martin Sebor 2020-06-24 23:32:11 UTC
I'm testing a fix.
Comment 2 Martin Sebor 2020-06-24 23:36:18 UTC
See also pr95887 for a related problem with memcmp.
Comment 4 CVS Commits 2020-07-20 18:09:21 UTC
The master branch has been updated by Martin Sebor <msebor@gcc.gnu.org>:

https://gcc.gnu.org/g:d5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6

commit r11-2231-gd5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6
Author: Martin Sebor <msebor@redhat.com>
Date:   Mon Jul 20 12:06:18 2020 -0600

    Correct handling of constant representations containing embedded nuls.
    
    Resolves:
    PR middle-end/95189 - memcmp being wrongly stripped like strcm
    PR middle-end/95886 - suboptimal memcpy with embedded zero bytes
    
    gcc/ChangeLog:
    
            PR middle-end/95189
            PR middle-end/95886
            * builtins.c (inline_expand_builtin_string_cmp): Rename...
            (inline_expand_builtin_bytecmp): ...to this.
            (builtin_memcpy_read_str): Don't expect data to be nul-terminated.
            (expand_builtin_memory_copy_args): Handle object representations
            with embedded nul bytes.
            (expand_builtin_memcmp): Same.
            (expand_builtin_strcmp): Adjust call to naming change.
            (expand_builtin_strncmp): Same.
            * expr.c (string_constant): Create empty strings with nonzero size.
            * fold-const.c (c_getstr): Rename locals and update comments.
            * tree.c (build_string): Accept null pointer argument.
            (build_string_literal): Same.
            * tree.h (build_string): Provide a default.
            (build_string_literal): Same.
    
    gcc/testsuite/ChangeLog:
    
            PR middle-end/95189
            PR middle-end/95886
            * gcc.dg/memcmp-pr95189.c: New test.
            * gcc.dg/strncmp-3.c: New test.
            * gcc.target/i386/memcpy-pr95886.c: New test.
Comment 5 Martin Sebor 2020-07-20 18:10:55 UTC
Fixed in r11-2231.
Comment 6 CVS Commits 2020-07-23 20:10:25 UTC
The master branch has been updated by Martin Sebor <msebor@gcc.gnu.org>:

https://gcc.gnu.org/g:8598657c607500512075f6c4ee3b10460c94903d

commit r11-2298-g8598657c607500512075f6c4ee3b10460c94903d
Author: Martin Sebor <msebor@redhat.com>
Date:   Thu Jul 23 14:08:22 2020 -0600

    Restrict test to LP64.
    
    gcc/testsuite/ChangeLog:
    
            PR testsuite/95886
            * gcc.target/i386/memcpy-pr95886.c: Restrict test to LP64.
Comment 7 CVS Commits 2020-07-24 02:28:11 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:b0e5ec934e7a7473275326e2aee58eaf252cdff1

commit r11-2301-gb0e5ec934e7a7473275326e2aee58eaf252cdff1
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Jul 23 19:14:06 2020 -0700

    Restrict PR middle-end/95886 x86 test to !ia32
    
    Since gcc.target/i386/memcpy-pr95886.c requires 64-bit register, restrict
    it to !ia32.
    
            PR middle-end/95886
            * gcc.target/i386/memcpy-pr95886.c: Restrict test to !ia32.
Comment 8 CVS Commits 2020-10-08 18:38:54 UTC
The releases/gcc-10 branch has been updated by Martin Sebor <msebor@gcc.gnu.org>:

https://gcc.gnu.org/g:e4c9aac98611f63847ef6c57916808d9a2d7abcb

commit r10-8871-ge4c9aac98611f63847ef6c57916808d9a2d7abcb
Author: Martin Sebor <msebor@redhat.com>
Date:   Thu Oct 8 12:35:01 2020 -0600

    Correct handling of constant representations containing embedded nuls (backport from trunk).
    
    Resolves:
    PR middle-end/95189 - memcmp being wrongly stripped like strcm
    PR middle-end/95886 - suboptimal memcpy with embedded zero bytes
    
    gcc/ChangeLog:
    
            PR middle-end/95189
            PR middle-end/95886
            * builtins.c (inline_expand_builtin_string_cmp): Rename...
            (inline_expand_builtin_bytecmp): ...to this.
            (builtin_memcpy_read_str): Don't expect data to be nul-terminated.
            (expand_builtin_memory_copy_args): Handle object representations
            with embedded nul bytes.
            (expand_builtin_memcmp): Same.
            (expand_builtin_strcmp): Adjust call to naming change.
            (expand_builtin_strncmp): Same.
            * expr.c (string_constant): Create empty strings with nonzero size.
            * fold-const.c (c_getstr): Rename locals and update comments.
            * tree.c (build_string): Accept null pointer argument.
            (build_string_literal): Same.
            * tree.h (build_string): Provide a default.
            (build_string_literal): Same.
    
    gcc/testsuite/ChangeLog:
    
            PR middle-end/95189
            PR middle-end/95886
            * gcc.dg/memcmp-pr95189.c: New test.
            * gcc.dg/strncmp-3.c: New test.
            * gcc.target/i386/memcpy-pr95886.c: New test.
Comment 9 CVS Commits 2020-10-16 11:37:45 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:75a016e24a335db9c93ccd2478433adfe773d898

commit r10-8903-g75a016e24a335db9c93ccd2478433adfe773d898
Author: Martin Sebor <msebor@redhat.com>
Date:   Thu Jul 23 14:08:22 2020 -0600

    Restrict test to LP64.
    
    gcc/testsuite/ChangeLog:
    
            PR testsuite/95886
            * gcc.target/i386/memcpy-pr95886.c: Restrict test to LP64.
    
    (cherry picked from commit 8598657c607500512075f6c4ee3b10460c94903d)
Comment 10 CVS Commits 2020-10-16 11:37:50 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:0cf293649b6c495249854133159ef66bf9c43904

commit r10-8904-g0cf293649b6c495249854133159ef66bf9c43904
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Jul 23 19:14:06 2020 -0700

    Restrict PR middle-end/95886 x86 test to !ia32
    
    Since gcc.target/i386/memcpy-pr95886.c requires 64-bit register, restrict
    it to !ia32.
    
            PR middle-end/95886
            * gcc.target/i386/memcpy-pr95886.c: Restrict test to !ia32.
    
    (cherry picked from commit b0e5ec934e7a7473275326e2aee58eaf252cdff1)