This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts
- From: Richard Biener <rguenther at suse dot de>
- To: Jeff Law <law at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 27 Jun 2014 13:49:38 +0200 (CEST)
- Subject: Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts
- Authentication-results: sourceware.org; auth=none
- References: <alpine dot LSU dot 2 dot 11 dot 1406121206150 dot 32404 at zhemvz dot fhfr dot qr> <539A0E6B dot 3080905 at redhat dot com>
On Thu, 12 Jun 2014, Jeff Law wrote:
> On 06/12/14 04:12, Richard Biener wrote:
> >
> > This implements the requested inlining of memmove for possibly
> > overlapping arguments by doing first all loads and then all stores.
> > The easiest place is to do this in memory op folding where we already
> > perform inlining of some memcpy cases (but fail to do the equivalent
> > memcpy optimization - though RTL expansion later does it).
> >
> > The following patch restricts us to max. word-mode size. Ideally
> > we'd have a way to check for the number of real instructions needed
> > to load an (aligned) value of size N. But maybe we don't care
> > and are fine with doing multiple loads / stores?
> >
> > Anyway, the following is conservative (but maybe not enough).
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > These transforms don't really belong to GENERIC folding (they
> > also run at -O0 ...), similar to most builtin foldings. But this
> > patch is not to change that.
> >
> > Any comments on the size/cost issue?
> I recall seeing something in one of the BZ databases that asked for
> double-word to be expanded inline. Presumably the reporter's code did
> lots of double-word things of this nature.
>
> Obviously someone else might want quad-word and so-on. However, double
> words seem like a very reasonable request.
Hmm, yeah. Meanwhile I found the MOVE_MAX target macro which gives
me what I was looking for (max memory-reg move size with a single
instruction). Eventually increasing the maximum number of
loads/stores to two also allows handling of (some) non-power-of-two sizes
and some more cases of unaligned accesses on SLOW_UNALIGNED_ACCESS
targets. But then we're re-implementing move_by_pieces on GIMPLE ... ;)
(maybe not totally unreasonable?).
I'm going to go for a single load/store and MOVE_MAX for now - I
have quite some fallout to deal with anyway (analyzed strlenopt-1.c
FAIL only, the other strlenopt cases are probably similar)
FAIL: gcc.dg/strlenopt-1.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-1.c scan-tree-dump-times strlen "memcpy \\\\(" 4
we generate (note unfolded read from "/"!)
_15 = MEM[(char * {ref-all})"/"];
MEM[(char * {ref-all})_14] = _15;
where strlenopt doesn't catch (short *)_14 = (short)"/\0" (maybe too
much asked, given endianess and so ...).
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "memcpy \\\\(" 8
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "strchr \\\\(" 0
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "memcpy
\\\\([^\\n\\r]*,
1\\\\)" 1
FAIL: gcc.dg/strlenopt-18g.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-19.c scan-tree-dump-times strlen "memcpy \\\\(" 6
FAIL: gcc.dg/strlenopt-1f.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-1f.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-2.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-2.c scan-tree-dump-times strlen "memcpy \\\\(" 5
FAIL: gcc.dg/strlenopt-20.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-21.c scan-tree-dump-times strlen "memcpy \\\\(" 3
FAIL: gcc.dg/strlenopt-6.c scan-tree-dump-times strlen "memcpy \\\\(" 7
FAIL: gcc.dg/strlenopt-6.c scan-tree-dump-times strlen "strchr \\\\(" 0
FAIL: gcc.dg/strlenopt-7.c scan-tree-dump-times strlen "strlen \\\\(" 0
FAIL: gcc.dg/strlenopt-7.c scan-tree-dump-times strlen "memcpy \\\\(" 2
FAIL: gcc.dg/strlenopt-7.c scan-tree-dump-times optimized "return 3;" 1
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "strlen \\\\(" 0
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-9.c scan-tree-dump-times strlen "memcpy \\\\(" 6
FAIL: gfortran.dg/coarray_lib_realloc_1.f90 -O scan-tree-dump-times
original
"__builtin_memcpy" 2
Both memcpy calls are now plain assignments. The testcase needs
changing to test what it really wanted to test ...
FAIL: gfortran.dg/pr45636.f90 -O scan-tree-dump-times forwprop2
"memset" 0
This shows that funny DSE patterns in forwprop no longer work
when the store is not a memcpy but a plain assignment. Of course
DSE also doesn't remove dead memset()s. The testcase also shows
that we could/should handle memset() in the same way if it only
requires a single store from a constant.
Thanks,
Richard.