This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH, x86] Use vector moves in memmove expanding
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Michael Zolotukhin <michael dot v dot zolotukhin at gmail dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 10 Apr 2013 19:43:02 +0200
- Subject: Re: [PATCH, x86] Use vector moves in memmove expanding
- References: <CANtU07_xUQHqFVhc=xXcXC1T0c37FhW+F9O8BgHtnoq2LNsEYw at mail dot gmail dot com>
On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
> This patch adds a new algorithm for expanding movmem on x86 and
> refactors the existing implementation a bit. This is a reincarnation of
> the patch that was sent but wasn't checked in a couple of years ago - now
> I reworked it from scratch and divided it into several more manageable parts.
Hi, I am writing a memcpy for libc. It avoids a computed jump and is
much faster on small strings (variant for Sandy Bridge attached).
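To illustrate what avoiding a computed jump can look like for small sizes, here is a minimal sketch. It is not taken from the attached variant; it shows one well-known approach, where a few size comparisons select a pair of possibly overlapping moves. The name copy_small and the 16-byte limit are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n <= 16 bytes between non-overlapping buffers.
   Each size class issues two moves that may overlap each other, so a few
   compare-and-branch steps replace a jump table indexed by n. */
static void copy_small(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 16);
    if (n >= 8) {
        uint64_t head, tail;
        memcpy(&head, src, 8);          /* first 8 bytes */
        memcpy(&tail, src + n - 8, 8);  /* last 8 bytes, may overlap head */
        memcpy(dst, &head, 8);
        memcpy(dst + n - 8, &tail, 8);
    } else if (n >= 4) {
        uint32_t head, tail;
        memcpy(&head, src, 4);
        memcpy(&tail, src + n - 4, 4);
        memcpy(dst, &head, 4);
        memcpy(dst + n - 4, &tail, 4);
    } else if (n >= 2) {
        uint16_t head, tail;
        memcpy(&head, src, 2);
        memcpy(&tail, src + n - 2, 2);
        memcpy(dst, &head, 2);
        memcpy(dst + n - 2, &tail, 2);
    } else if (n == 1) {
        dst[0] = src[0];
    }
}
```

The fixed-size memcpy calls are only a portable way to express unaligned loads and stores; a compiler normally lowers each of them to a single move instruction.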
> For now this algorithm isn't used, because cost_models are tuned to
> use existing ones. I believe the new algorithm will give better
> performance, but I'll leave cost-models tuning for a separate patch.
You must also check performance with a cold instruction cache.
Currently memcpy(x,y,128) takes 126 bytes of code, which is too much.
> Also, I changed get_mem_align_offset to make it handle MEM_REFs as
> well. Probably, there is another way of getting info about alignment -
> if so, please let me know.
Do not align for small sizes. The dependency this introduces erases any gains
that you might get. Keep in mind that in 55% of cases the data are already
aligned.
Also, in my tests the best way to handle the prologue is to first copy the
last 16 bytes and then loop.
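A minimal sketch of the "copy the last 16 bytes, then loop" idea, assuming non-overlapping buffers and n >= 16. The names copy16 and copy_tail_first are hypothetical, and this is an illustration rather than the code from the attached variant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move one 16-byte block. A fixed-size memcpy is
   normally lowered to a single unaligned vector load/store pair. */
static void copy16(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 16);
}

/* Copy n >= 16 bytes between non-overlapping buffers. */
static void copy_tail_first(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    assert(n >= 16);
    /* Tail first: this covers the final 16 bytes whatever n % 16 is,
       so no separate remainder handling is needed after the loop. */
    copy16(dst + n - 16, src + n - 16);
    /* Then whole 16-byte blocks from the front. The loop stops once the
       next block would reach the end; the remaining bytes are already
       covered by the tail copy above. */
    for (size_t i = 0; i + 16 < n; i += 16)
        copy16(dst + i, src + i);
}
```

The tail copy may overlap the last loop block, which is harmless for memcpy semantics. For memmove with overlapping buffers the loads would have to be ordered before the stores, which this sketch does not attempt.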
> Similar improvements could be done in expanding of memset, but that's
> in progress now and I'm going to proceed with it if this patch is ok.
> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
> Is it ok for trunk?
> Changelog entry:
> 2013-04-10 Michael Zolotukhin <email@example.com>
> * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
> * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
> adjust_address instead of change_address to keep info about alignment.
> (emit_strmov): Remove.
> (emit_memmov): New function.
> (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
> (expand_movmem_epilogue): Likewise and return updated rtx for
> (expand_constant_movmem_prologue): Likewise and return updated rtx for
> destination and source.
> (decide_alignment): Refactor, handle vector_loop.
> (ix86_expand_movmem): Likewise.
> (ix86_expand_setmem): Likewise.
> * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
> * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.