[PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
H.J. Lu
hjl.tools@gmail.com
Wed Mar 31 17:54:15 GMT 2021
On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > > Reading through the optimization manual it seems that movsb is fast for
> > > small blocks no matter if the size is hard wired. In that case you
> > > probably want to check whether max_size or expected_size is known to be
> > > small rather than max_size == min_size and both being small.
> > >
> > > But it depends on what CPU really does.
> > > Honza
> >
> > For small data size, rep movsb is faster only under certain conditions. We
> > can continue fine tuning rep movsb.
>
> OK, however I wonder why you need the condition maxsize=minsize.
> - If the CPU is looking for movl $cst, %rcx then we probably want to be
>   sure that it is not moved away from rep movsb by adding a fused pattern
> - If rep movsb is slower than a loop for very small blocks then you want
>   to set a lower bound on minsize & expected size, but you do not need
>   to require maxsize=minsize
> - If rep movsb is slower than a sequence of moves for small blocks then
>   one needs to tweak move by pieces
> - If rep movsb is slower for larger blocks then you want to test
>   maxsize and expected size
> So in none of those scenarios does testing maxsize=minsize alone make
> much sense to me... What was the original motivation for treating a
> precisely known size differently?
>
> I am mostly curious because it is not that uncommon to have a small maxsize
> because we are able to track the object size, and using a short sequence
> for those would be nice.
>
> Having minsize non-trivial may not be that uncommon these days either
> given that we track value ranges (and under the assumption that the
> memcpy/memset expanders were updated to take these into account).
>
Hongyu has done some analysis on this. Hongyu, can you share what
you got?
Thanks.
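
For reference, here is a rough C sketch of how the four scenarios Honza
lists above could map onto checks of the size bounds instead of on
maxsize == minsize. It is not GCC code: the function name, the enum and
all three thresholds are hypothetical placeholders picked for
illustration, not measured Ice Lake values, and expected_size == 0 is
assumed to mean "unknown".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds, for illustration only; not measured values.  */
#define SMALL_MAX      16    /* at or below: assume a move sequence wins */
#define REP_MOVSB_MIN  64    /* below: assume a loop beats rep movsb */
#define REP_MOVSB_MAX  8192  /* above: assume rep movsb loses */

enum copy_strategy
{
  STRAT_MOVES,      /* short sequence of moves (move by pieces) */
  STRAT_LOOP,       /* inline loop */
  STRAT_REP_MOVSB,  /* rep movsb */
  STRAT_LIBCALL     /* call the library memcpy/memset */
};

/* Pick a strategy from the known size bounds.  max_size == SIZE_MAX
   means no upper bound is known; expected_size == 0 means no estimate.  */
enum copy_strategy
pick_copy_strategy (size_t min_size, size_t max_size, size_t expected_size)
{
  int have_expected = expected_size != 0;

  /* Small blocks: only the upper bound matters, not min == max.  */
  if (max_size <= SMALL_MAX)
    return STRAT_MOVES;

  /* rep movsb slow for very small blocks: lower bound on minsize
     and on the expected size.  */
  if (min_size < REP_MOVSB_MIN
      || (have_expected && expected_size < REP_MOVSB_MIN))
    return STRAT_LOOP;

  /* rep movsb slow for large blocks: test maxsize and expected size.  */
  if (max_size > REP_MOVSB_MAX
      || (have_expected && expected_size > REP_MOVSB_MAX))
    return STRAT_LIBCALL;

  return STRAT_REP_MOVSB;
}
```

With these placeholder thresholds, a copy with minsize 0 and maxsize 8
still gets the short move sequence even though its size is not
precisely known.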
--
H.J.