This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Use of vector instructions in memmov/memset expanding
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Andi Kleen <andi at firstfloor dot org>, Michael Zolotukhin <michael dot v dot zolotukhin at gmail dot com>, gcc-patches at gcc dot gnu dot org, Jan Hubicka <hubicka at ucw dot cz>, Richard Guenther <richard dot guenther at gmail dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, izamyatin at gmail dot com, areg dot melikadamyan at gmail dot com
- Date: Wed, 28 Sep 2011 14:36:05 +0200
- Subject: Re: Use of vector instructions in memmov/memset expanding
- References: <CANtU07-DAOMe9Nk4oYj3FJnkZqgkHvSnobsugeSfcRUzDChrrg@mail.gmail.com> <CANtU07_ZoRrLjWBGv=r6MCeBVTh-z13Cab0frjQdg2e7VAyzGg@mail.gmail.com> <20110715232425.GA24793@atrey.karlin.mff.cuni.cz> <CANtU07-HyO0gAZPz-XLknngAviRSyQjyk3DBfb-PjfPyt0KO_g@mail.gmail.com> <CANtU07-eCpAZ=VgvkdBCORq8bR0UZCgryofBXU_4FcRDJ7hWoQ@mail.gmail.com> <CANtU079YjpFwdy1kXoebsLyKe08KTwjwXDGJ4kY+gEJcZUTdMg@mail.gmail.com> <CANtU078YRvnQm8PMQ1GSUxczfTzwup1NLFJXmTDJfGPqVx_Nqg@mail.gmail.com> <m2hb3wj3lw.fsf@firstfloor.org> <20110928115546.GL2687@tyan-ft48-01.lab.bos.redhat.com>
> On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
> > Michael Zolotukhin <michael.v.zolotukhin@gmail.com> writes:
> > >
> > > Build and 'make check' was tested.
> >
> > Could you expand a bit on the performance benefits? Where does it help?
>
> Especially when glibc these days has very well optimized implementations
> tuned for various CPUs and it is very unlikely beneficial to inline
> memcpy/memset if they aren't really short or have unknown number of
> iterations.
I guess we should update the expansion tables so that we produce function calls more often.
I will look at how things behave on my setup. Do you know in which glibc versions
the optimized string functions were introduced?
Concerning inline SSE, I think it makes a lot of sense when we know the size and
alignment, so we can output just a few SSE moves instead of more integer moves.
We definitely need some numbers for the loop variants.
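
To illustrate the known-size, known-alignment case mentioned above, here is a minimal sketch using SSE2 intrinsics. It is not taken from the patch under discussion: the helper name copy32_aligned_sse, the 32-byte size, and the 16-byte alignment are assumptions chosen for the example, and it targets x86 with SSE2 available. The copy takes two 16-byte loads and two 16-byte stores, where 64-bit integer moves would need four of each.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_si128, _mm_store_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy exactly 32 bytes.
   Both pointers are assumed to be 16-byte aligned, which is what
   allows the aligned load/store forms to be used. */
void copy32_aligned_sse(void *dst, const void *src)
{
    /* Check the alignment assumption in debug builds. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

    /* Two 16-byte aligned loads... */
    __m128i lo = _mm_load_si128((const __m128i *)src);
    __m128i hi = _mm_load_si128((const __m128i *)src + 1);

    /* ...followed by two 16-byte aligned stores. */
    _mm_store_si128((__m128i *)dst, lo);
    _mm_store_si128((__m128i *)dst + 1, hi);
}
```

Whether such a sequence actually beats a call into an optimized library memcpy depends on the target CPU and the surrounding code, so it would have to be measured.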
Honza