[PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

H.J. Lu hjl.tools@gmail.com
Wed Mar 31 17:54:15 GMT 2021


On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > > Reading through the optimization manual, it seems that movsb is fast for
> > > small blocks no matter whether the size is hard-wired. In that case you
> > > probably want to check whether max_size or expected_size is known to be
> > > small rather than max_size == min_size and both being small.
> > >
> > > But it depends on what CPU really does.
> > > Honza
> >
> > For small data sizes, rep movsb is faster only under certain conditions.  We
> > can continue fine-tuning rep movsb.
>
> OK, I however wonder why you need the condition maxsize=minsize.
>  - If the CPU is looking for movl $cst, %rcx then we probably want to be
>    sure that it is not moved away from rep movsb by adding a fused pattern
>  - If rep movsb is slower than a loop for very small blocks then you want
>    to set a lower bound on minsize & expected size, but you do not need
>    to require maxsize=minsize
>  - If rep movsb is slower than a sequence of moves for small blocks then
>    one needs to tweak move by pieces
>  - If rep movsb is slower for larger blocks then you want to test
>    maxsize and expected size
> So in none of those scenarios does testing maxsize=minsize alone make too
> much sense to me... What was the original motivation for treating
> precisely known sizes differently?
>
> I am mostly curious because it is not that uncommon to have a small maxsize
> because we are able to track the object size, and using a short sequence
> for those would be nice.
>
> Having a non-trivial minsize may not be that uncommon these days either
> given that we track value ranges (and under the assumption that the
> memcpy/memset expanders were updated to take these into account).
>

Hongyu has done some analysis on this.  Hongyu, can you share what
you got?
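
To make the cases in Honza's list concrete, here is a rough sketch in C of
selecting a strategy from the size range instead of requiring
maxsize == minsize.  It is not code from the patch: the thresholds, the
enum and the function name are all made up for illustration, and the
actual cut-over points would have to come from measurements.

```c
#include <stddef.h>

/* Hypothetical thresholds, for illustration only.  These are not
   measured values and are not taken from the patch.  */
#define PIECES_MAX     16    /* at or below: short sequence of moves */
#define REP_MOVSB_MIN  128   /* below: assume a loop beats rep movsb */
#define REP_MOVSB_MAX  8192  /* above: assume a library call wins */

enum strategy { BY_PIECES, LOOP, REP_MOVSB, LIBCALL };

/* Pick a strategy from the known range [min_size, max_size] and the
   expected size.  Nothing here requires min_size == max_size.  */
static enum strategy
choose_strategy (size_t min_size, size_t max_size, size_t expected_size)
{
  /* Small upper bound: use a short sequence of moves even when the
     exact size is unknown.  */
  if (max_size <= PIECES_MAX)
    return BY_PIECES;

  /* Possibly large block: test max_size and expected_size.  */
  if (max_size > REP_MOVSB_MAX || expected_size > REP_MOVSB_MAX)
    return LIBCALL;

  /* Possibly very small block: put a lower bound on min_size and
     expected_size.  */
  if (min_size < REP_MOVSB_MIN || expected_size < REP_MOVSB_MIN)
    return LOOP;

  return REP_MOVSB;
}
```

With these made-up numbers, a copy known to be between 256 and 1024 bytes
with an expected size of 512 would select rep movsb even though its exact
size is unknown, and a copy with maxsize 12 would go to move by pieces.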

Thanks.

-- 
H.J.

