Bug 70140 - Inefficient expansion of __builtin_mempcpy
Summary: Inefficient expansion of __builtin_mempcpy
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: ---
Assignee: Martin Liška
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2016-03-08 12:56 UTC by Wilco
Modified: 2017-09-13 16:16 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-04-18 00:00:00


Attachments
Patch candidate (3.74 KB, patch)
2017-07-17 13:43 UTC, Martin Liška
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Wilco 2016-03-08 12:56:28 UTC
The expansion of __builtin_mempcpy is inefficient on many targets (eg. AArch64, ARM, PPC). The issue is due to not using the same expansion options that memcpy uses in builtins.c. As a result GCC6 produces for __builtin_mempcpy(x, y, 32):

PPC:
   0:	38 a0 00 20 	li      r5,32
   4:	48 00 00 00 	b       4 <foo+0x4>
			4: R_PPC_REL24	mempcpy
   8:	60 00 00 00 	nop
   c:	60 42 00 00 	ori     r2,r2,0

AArch64:
        mov     x2, 32
        b       mempcpy

A second issue is that GCC always calls mempcpy. mempcpy is not supported or implemented efficiently in many (if not most) library/target combinations. GLIBC only has 3 targets which implement an optimized mempcpy, so GLIBC currently inlines mempcpy into memcpy by default unless a target explicitly disables this. It seems better to do this in GCC so it works for all libraries.
Comment 1 Martin Liška 2017-04-18 13:33:37 UTC
I'll do that for GCC8.
Comment 2 Martin Liška 2017-07-14 11:52:18 UTC
I've just taken look at that and please confirm that I understand that correctly:

1) we want to ideally a same function for expansion of memcpy and mempcpy, where for later one we'll append calculation of return value (dest + n)?

2) I'm bit confused with 'GLIBC currently inlines mempcpy into memcpy'. Do is mean that you basically do not want to emit any call to mempcpy and prefer rather:

int my_mempcpy(void)
{
  return __builtin_mempcpy (a, b, SIZE);
}

$ ./xgcc -B. /tmp/mem.c -O2 -S  -DSIZE=100000 -o /dev/stdout

my_mempcpy:
.LFB1:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movq	b(%rip), %rsi
	movq	a(%rip), %rdi
	movl	$100000, %edx
	call	memcpy
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	addq	$100000, %rax
	ret
	.cfi_endproc

?
Thanks
Comment 3 Wilco 2017-07-14 12:52:41 UTC
(In reply to Martin Liška from comment #2)
> I've just taken look at that and please confirm that I understand that
> correctly:
> 
> 1) we want to ideally a same function for expansion of memcpy and mempcpy,
> where for later one we'll append calculation of return value (dest + n)?

Yes, I think the current movmem expander does not deal with the return value so could be reused for mempcpy expansion.

> 2) I'm bit confused with 'GLIBC currently inlines mempcpy into memcpy'. Do
> is mean that you basically do not want to emit any call to mempcpy and
> prefer rather:

Yes. Ignoring GLIBC internals, mempcpy is used quite infrequently. As a result not many targets have added highly optimized implementations. The targets that do have the issue that mempcpy is typically not in the cache when called, while memcpy is more likely already cached.

If there are targets that do not want this, we could add a target define.

Note that I'm not concerned about the corner case of a mempcpy being tailcalled which isn't possible anymore after we optimize mempcpy into memcpy. In almost all targets mempcpy is a C function that calls memcpy, ie. you already have extra overhead on every mempcpy call. See glibc/string/mempcpy.c:

void *
MEMPCPY (void *dest, const void *src, size_t len)
{
  return memcpy (dest, src, len) + len;
}
Comment 4 H.J. Lu 2017-07-14 13:01:02 UTC
(In reply to Wilco from comment #3)
> Yes. Ignoring GLIBC internals, mempcpy is used quite infrequently. As a
> result not many targets have added highly optimized implementations. The
> targets that do have the issue that mempcpy is typically not in the cache
> when called, while memcpy is more likely already cached.
> 

FWIW, in x86-64 glibc, memcpy and mempcpy share the same function
body:

0000000000000010 <__mempcpy>:
  10:	48 89 f8             	mov    %rdi,%rax
  13:	48 01 d0             	add    %rdx,%rax
  16:	eb 1b                	jmp    33 <__memcpy+0x3>
Comment 5 Martin Sebor 2017-07-14 15:21:13 UTC
In r249278 GCC was changed to transform early on calls to bcmp, bcopy and bzero to memcmp, memcpy, and memset.  The main motivation was to simplify their handling and ensure they are optimized equivalently by the rest of the compiler.  (This was suggested here: https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00114.html).  It might make sense to take the same approach with mempcpy.

(The second part of r249278, i.e., the removal of the bcmp, bcopy and bzero handling, is yet to be done.)
Comment 6 Martin Liška 2017-07-17 13:43:10 UTC
Created attachment 41772 [details]
Patch candidate

I'm going to prepare some test-cases for that. Does it look good?
Comment 7 Wilco 2017-07-17 16:19:08 UTC
(In reply to Martin Liška from comment #6)
> Created attachment 41772 [details]
> Patch candidate
> 
> I'm going to prepare some test-cases for that. Does it look good?

Yes, it now inlines small constant sizes. However large and variable sized copies have the wrong return value:

void *f1(void *p, void *q) { return __builtin_mempcpy(p, q, 256); }

f1:
	mov	x2, 256
	b	memcpy
Comment 8 Martin Liška 2017-07-18 07:17:58 UTC
(In reply to Wilco from comment #7)
> (In reply to Martin Liška from comment #6)
> > Created attachment 41772 [details]
> > Patch candidate
> > 
> > I'm going to prepare some test-cases for that. Does it look good?
> 
> Yes, it now inlines small constant sizes. However large and variable sized
> copies have the wrong return value:
> 
> void *f1(void *p, void *q) { return __builtin_mempcpy(p, q, 256); }
> 
> f1:
> 	mov	x2, 256
> 	b	memcpy

Yep, I've noticed. It's strange for me why it's not working. I've just asked at GCC ML: https://gcc.gnu.org/ml/gcc/2017-07/msg00144.html
Comment 9 Wilco 2017-07-18 10:29:26 UTC
(In reply to Martin Liška from comment #8)
> (In reply to Wilco from comment #7)
> > (In reply to Martin Liška from comment #6)
> > > Created attachment 41772 [details]
> > > Patch candidate
> > > 
> > > I'm going to prepare some test-cases for that. Does it look good?
> > 
> > Yes, it now inlines small constant sizes. However large and variable sized
> > copies have the wrong return value:
> > 
> > void *f1(void *p, void *q) { return __builtin_mempcpy(p, q, 256); }
> > 
> > f1:
> > 	mov	x2, 256
> > 	b	memcpy
> 
> Yep, I've noticed. It's strange for me why it's not working. I've just asked
> at GCC ML: https://gcc.gnu.org/ml/gcc/2017-07/msg00144.html

It's marked as a tailcall so anything you generate afterwards will be ignored:

(call_insn/j 13 12 14 2 (parallel [
            (set (reg:DI 0 x0)
                (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0xffffb7acc700 __builtin_memcpy>) [0 __builtin_memcpy S8 A8])
                    (const_int 0 [0])))
            (return)
        ]) "mempcpy.c":3 -1

Also check this case:

void f4(void *p, void *q, int i) { __builtin_mempcpy(p, q, i); }
Comment 10 Martin Liška 2017-07-18 11:29:13 UTC
> > 
> > Yep, I've noticed. It's strange for me why it's not working. I've just asked
> > at GCC ML: https://gcc.gnu.org/ml/gcc/2017-07/msg00144.html
> 
> It's marked as a tailcall so anything you generate afterwards will be
> ignored:

Thanks, knowing that it was easy to fix ;) I'm going to test the patch and will send it to mailing list.
Comment 11 Wilco 2017-07-21 16:22:44 UTC
(In reply to Martin Liška from comment #10)
> > > 
> > > Yep, I've noticed. It's strange for me why it's not working. I've just asked
> > > at GCC ML: https://gcc.gnu.org/ml/gcc/2017-07/msg00144.html
> > 
> > It's marked as a tailcall so anything you generate afterwards will be
> > ignored:
> 
> Thanks, knowing that it was easy to fix ;) I'm going to test the patch and
> will send it to mailing list.

Well the new patch looks great - I tried it on GLIBC and it ends up generating better code than inlining mempcpy in the string header. In total 111 calls to mempcpy are turned into memcpy, and although there are 166 extra instructions (most add/mov to deal with the return value), there are just 12 extra callee-saves.
Comment 12 Martin Liška 2017-08-01 11:59:59 UTC
Author: marxin
Date: Tue Aug  1 11:59:27 2017
New Revision: 250771

URL: https://gcc.gnu.org/viewcvs?rev=250771&root=gcc&view=rev
Log:
Make mempcpy more optimal (PR middle-end/70140).

2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* builtins.c (expand_builtin_memcpy_args): Remove.
	(expand_builtin_memcpy): Call newly added function
	expand_builtin_memory_copy_args.
	(expand_builtin_memcpy_with_bounds): Likewise.
	(expand_builtin_mempcpy): Remove last argument.
	(expand_builtin_mempcpy_with_bounds): Likewise.
	(expand_builtin_memory_copy_args): New function created from
	expand_builtin_mempcpy_args with small modifications.
	(expand_builtin_mempcpy_args): Remove.
	(expand_builtin_stpcpy): Remove unused argument.
	(expand_builtin): Likewise.
	(expand_builtin_with_bounds): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/string-opt-1.c
Comment 13 Martin Liška 2017-08-01 12:03:01 UTC
Fixed.
Comment 14 H.J. Lu 2017-08-01 15:41:55 UTC
On Fedora 26/x86-64, I got

FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O0
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -Os
FAIL: gcc.c-torture/execute/builtins/mempcpy-2.c execution,  -Os
FAIL: gcc.c-torture/execute/builtins/mempcpy.c compilation,  -O1  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/mempcpy.c compilation,  -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/mempcpy.c compilation,  -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/mempcpy.c compilation,  -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/mempcpy.c compilation,  -Og -g  (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -Os
FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -Os
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -Os
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -Os
FAIL: gcc.c-torture/execute/builtins/strpcpy-2.c execution,  -Os
FAIL: gcc.dg/builtin-stringop-chk-4.c (internal compiler error)
FAIL: gcc.dg/builtin-stringop-chk-4.c (test for excess errors)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 196)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 211)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 213)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 215)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 216)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 217)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 222)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 237)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 239)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 241)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 242)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 243)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 248)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 263)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 302)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 305)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 319)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 320)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 321)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 322)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 323)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 324)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 326)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 328)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 346)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 347)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 353)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 354)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 361)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 362)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 363)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 369)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 370)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 395)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 401)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 403)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 405)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 421)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 425)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 427)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 428)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 429)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 430)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 436)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 437)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 471)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 472)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 473)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 475)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 479)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 480)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 493)
FAIL: gcc.dg/builtin-stringop-chk-4.c  (test for warnings, line 503)
FAIL: gcc.dg/string-opt-1.c scan-assembler memcpy
FAIL: gcc.dg/strlenopt-14g.c execution test
FAIL: gcc.dg/strlenopt-14gf.c execution test
FAIL: gcc.dg/tree-prof/val-prof-7.c compilation,  -fprofile-generate -D_PROFILE_GENERATE (internal compiler error)
FAIL: gcc.dg/tree-prof/val-prof-7.c compilation,  -fprofile-generate -D_PROFILE_GENERATE (internal compiler error)
FAIL: gcc.target/i386/chkp-stropt-16.c (internal compiler error)
FAIL: gcc.target/i386/chkp-stropt-16.c (test for excess errors)
Comment 15 Martin Liška 2017-08-01 16:01:40 UTC
Sorry for the breakage, I'm going to take a look.
Comment 16 Martin Liška 2017-08-01 17:11:15 UTC
So I accidentally installed an old version of patch, reverted in r250788.
Comment 17 Martin Liška 2017-08-01 17:22:01 UTC
Author: marxin
Date: Tue Aug  1 17:21:29 2017
New Revision: 250789

URL: https://gcc.gnu.org/viewcvs?rev=250789&root=gcc&view=rev
Log:
Make mempcpy more optimal (PR middle-end/70140).

2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* builtins.c (expand_builtin_memcpy_args): Remove.
	(expand_builtin_memcpy): Call newly added function
	expand_builtin_memory_copy_args.
	(expand_builtin_memcpy_with_bounds): Likewise.
	(expand_builtin_mempcpy): Remove last argument.
	(expand_builtin_mempcpy_with_bounds): Likewise.
	(expand_builtin_memory_copy_args): New function created from
	expand_builtin_mempcpy_args with small modifications.
	(expand_builtin_mempcpy_args): Remove.
	(expand_builtin_stpcpy): Remove unused argument.
	(expand_builtin): Likewise.
	(expand_builtin_with_bounds): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/string-opt-1.c
Comment 18 Martin Liška 2017-08-01 17:23:19 UTC
Now it should be fine.
Comment 19 Aldy Hernandez 2017-09-13 16:13:36 UTC
Author: aldyh
Date: Wed Sep 13 16:13:04 2017
New Revision: 252219

URL: https://gcc.gnu.org/viewcvs?rev=252219&root=gcc&view=rev
Log:
Make mempcpy more optimal (PR middle-end/70140).

2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* builtins.c (expand_builtin_memcpy_args): Remove.
	(expand_builtin_memcpy): Call newly added function
	expand_builtin_memory_copy_args.
	(expand_builtin_memcpy_with_bounds): Likewise.
	(expand_builtin_mempcpy): Remove last argument.
	(expand_builtin_mempcpy_with_bounds): Likewise.
	(expand_builtin_memory_copy_args): New function created from
	expand_builtin_mempcpy_args with small modifications.
	(expand_builtin_mempcpy_args): Remove.
	(expand_builtin_stpcpy): Remove unused argument.
	(expand_builtin): Likewise.
	(expand_builtin_with_bounds): Likewise.

Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/builtins.c
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/testsuite/gcc.dg/string-opt-1.c
Comment 20 Aldy Hernandez 2017-09-13 16:16:53 UTC
Author: aldyh
Date: Wed Sep 13 16:16:21 2017
New Revision: 252233

URL: https://gcc.gnu.org/viewcvs?rev=252233&root=gcc&view=rev
Log:
Make mempcpy more optimal (PR middle-end/70140).

2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
2017-08-01  Martin Liska  <mliska@suse.cz>

	PR middle-end/70140
	* builtins.c (expand_builtin_memcpy_args): Remove.
	(expand_builtin_memcpy): Call newly added function
	expand_builtin_memory_copy_args.
	(expand_builtin_memcpy_with_bounds): Likewise.
	(expand_builtin_mempcpy): Remove last argument.
	(expand_builtin_mempcpy_with_bounds): Likewise.
	(expand_builtin_memory_copy_args): New function created from
	expand_builtin_mempcpy_args with small modifications.
	(expand_builtin_mempcpy_args): Remove.
	(expand_builtin_stpcpy): Remove unused argument.
	(expand_builtin): Likewise.
	(expand_builtin_with_bounds): Likewise.

Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/builtins.c
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/testsuite/gcc.dg/string-opt-1.c