This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts


On Thu, 12 Jun 2014, Jeff Law wrote:

> On 06/12/14 04:12, Richard Biener wrote:
> > 
> > This implements the requested inlining of memmove for possibly
> > overlapping arguments by doing first all loads and then all stores.
> > The easiest place is to do this in memory op folding where we already
> > perform inlining of some memcpy cases (but fail to do the equivalent
> > memcpy optimization - though RTL expansion later does it).
> > 
> > The following patch restricts us to max. word-mode size.  Ideally
> > we'd have a way to check for the number of real instructions needed
> > to load an (aligned) value of size N.  But maybe we don't care
> > and are fine with doing multiple loads / stores?
> > 
> > Anyway, the following is conservative (but maybe not enough).
> > 
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > 
> > These transforms don't really belong to GENERIC folding (they
> > also run at -O0 ...), similar to most builtin foldings.  But this
> > patch is not to change that.
> > 
> > Any comments on the size/cost issue?
> I recall seeing something in one of the BZ databases that asked for 
> double-word to be expanded inline.  Presumably the reporter's code did 
> lots of double-word things of this nature.
> 
> Obviously someone else might want quad-word and so-on.  However, double 
> words seem like a very reasonable request.

Hmm, yeah.  Meanwhile I found the MOVE_MAX target macro which gives
me what I was looking for (max memory-reg move size with a single
instruction).  Eventually increasing the maximum number of
loads/stores to two also allows handling of (some) non-power-of-two sizes
and some more cases of unaligned accesses on SLOW_UNALIGNED_ACCESS
targets.  But then we're re-implementing move_by_pieces on GIMPLE ... ;)
(maybe not totally unreasonable?).

I'm going to go for a single load/store and MOVE_MAX for now - I
have quite some fallout to deal with anyway (analyzed strlenopt-1.c
FAIL only, the other strlenopt cases are probably similar)

FAIL: gcc.dg/strlenopt-1.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-1.c scan-tree-dump-times strlen "memcpy \\\\(" 4

we generate (note unfolded read from "/"!)

  _15 = MEM[(char * {ref-all})"/"];
  MEM[(char * {ref-all})_14] = _15;

where strlenopt doesn't catch (short *)_14 = (short)"/\0" (maybe too
much asked, given endianess and so ...).

FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "memcpy \\\\(" 8
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "strchr \\\\(" 0
FAIL: gcc.dg/strlenopt-10.c scan-tree-dump-times strlen "memcpy 
\\\\([^\\n\\r]*,
 1\\\\)" 1
FAIL: gcc.dg/strlenopt-18g.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-19.c scan-tree-dump-times strlen "memcpy \\\\(" 6
FAIL: gcc.dg/strlenopt-1f.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-1f.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-2.c scan-tree-dump-times strlen "strlen \\\\(" 2
FAIL: gcc.dg/strlenopt-2.c scan-tree-dump-times strlen "memcpy \\\\(" 5
FAIL: gcc.dg/strlenopt-20.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-21.c scan-tree-dump-times strlen "memcpy \\\\(" 3
FAIL: gcc.dg/strlenopt-6.c scan-tree-dump-times strlen "memcpy \\\\(" 7
FAIL: gcc.dg/strlenopt-6.c scan-tree-dump-times strlen "strchr \\\\(" 0
FAIL: gcc.dg/strlenopt-7.c scan-tree-dump-times strlen "strlen \\\\(" 0
FAIL: gcc.dg/strlenopt-7.c scan-tree-dump-times strlen "memcpy \\\\(" 2
FAIL: gcc.dg/strlenopt-7.c scan-tree-dump-times optimized "return 3;" 1
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "strlen \\\\(" 0
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "memcpy \\\\(" 4
FAIL: gcc.dg/strlenopt-9.c scan-tree-dump-times strlen "memcpy \\\\(" 6

FAIL: gfortran.dg/coarray_lib_realloc_1.f90  -O   scan-tree-dump-times 
original
"__builtin_memcpy" 2

Both memcpy calls are now plain assignments.  The testcase needs
changing to test what it really wanted to test ...

FAIL: gfortran.dg/pr45636.f90  -O   scan-tree-dump-times forwprop2 
"memset" 0

This shows that funny DSE patterns in forwprop no longer work
when the store is not a memcpy but a plain assignment.  Of course
DSE also doesn't remove dead memset()s.  The testcase also shows
that we could/should handle memset() in the same way if it only
requires a single store from a constant.

Thanks,
Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]