This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, x86] Use vector moves in memmove expanding
- From: Michael Zolotukhin <michael dot v dot zolotukhin at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 10 Apr 2013 21:53:09 +0400
- Subject: Re: [PATCH, x86] Use vector moves in memmove expanding
- References: <CANtU07_xUQHqFVhc=xXcXC1T0c37FhW+F9O8BgHtnoq2LNsEYw at mail dot gmail dot com> <20130410174302 dot GA9599 at domone dot kolej dot mff dot cuni dot cz>
> Hi, I am writing memcpy for libc. It avoids computed jumps and is
> much faster on small strings (variant for Sandy Bridge attached).
I'm not sure I get what you meant - could you please explain what
computed jumps are?
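(For illustration: one plausible reading of "computed jump" in a memcpy implementation is dispatching on the copy size through a jump table, so execution takes a single indirect branch instead of a chain of compare-and-branch tests. The sketch below is hypothetical, not taken from the attached libc variant; a dense switch like this typically compiles to such a table:)

```c
#include <stddef.h>

/* Hypothetical sketch of a "computed jump" small-copy path: a dense
   switch on the size usually compiles to a jump table, i.e. one
   indirect branch.  A cold branch predictor handles such an indirect
   branch poorly, which is the cost a branch-free path avoids. */
static void small_copy(unsigned char *d, const unsigned char *s, size_t n)
{
    switch (n) {           /* dense switch -> jump table -> computed jump */
    case 8: d[7] = s[7];   /* fall through */
    case 7: d[6] = s[6];   /* fall through */
    case 6: d[5] = s[5];   /* fall through */
    case 5: d[4] = s[4];   /* fall through */
    case 4: d[3] = s[3];   /* fall through */
    case 3: d[2] = s[2];   /* fall through */
    case 2: d[1] = s[1];   /* fall through */
    case 1: d[0] = s[0];   /* fall through */
    case 0: break;
    }
}
```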
> You must also check performance with cold instruction cache.
> Now memcpy(x,y,128) takes 126 bytes which is too much.
> Do not align for small sizes. The dependency caused by this erases any
> gains that you might get. Keep in mind that in 55% of cases data are
> already aligned.
Other algorithms are still available and we can use them for small
sizes. E.g. for sizes <128 we could emit a loop with GPR moves and not
use vector instructions in it.
But that's tuning and I haven't worked on it yet - I'm going to
measure the performance of all algorithms on all sizes and thus
determine which algorithm is preferable for which sizes.
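(A C model of that GPR-move idea, for illustration only - the names are hypothetical and the unaligned word accesses go through memcpy to keep the sketch portable; the expander would emit the equivalent 64-bit register moves directly:)

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: copy small blocks with word-sized general-purpose-register
   moves instead of vector instructions, plus a byte tail. */
static void gpr_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);   /* one 64-bit GPR load  */
        memcpy(dst + i, &w, sizeof w);   /* one 64-bit GPR store */
    }
    for (; i < n; i++)                   /* remaining bytes */
        dst[i] = src[i];
}
```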
What I did in this patch is introduce some infrastructure to allow
emitting vector moves in movmem expansion - tuning is certainly
possible and needed, but that's out of the scope of this patch.
On 10 April 2013 21:43, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
>> Hi,
>> This patch adds a new algorithm for expanding movmem on x86 and
>> slightly refactors the existing implementation. This is a
>> reincarnation of a patch that was sent but not reviewed a couple of
>> years ago - now I have reworked it from scratch and divided it into
>> several more manageable parts.
>>
> Hi, I am writing memcpy for libc. It avoids computed jumps and is
> much faster on small strings (variant for Sandy Bridge attached).
>
>> For now this algorithm isn't used, because cost_models are tuned to
>> use existing ones. I believe the new algorithm will give better
>> performance, but I'll leave cost-models tuning for a separate patch.
>>
> You must also check performance with cold instruction cache.
> Now memcpy(x,y,128) takes 126 bytes which is too much.
>
>> Also, I changed get_mem_align_offset to make it handle MEM_REFs as
>> well. Probably, there is another way of getting info about alignment -
>> if so, please let me know.
>>
> Do not align for small sizes. The dependency caused by this erases any
> gains that you might get. Keep in mind that in 55% of cases data are
> already aligned.
>
> Also in my tests best way to handle prologue is first copy last 16
> bytes and then loop.
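(An illustrative C model of that prologue scheme - assumed here, not taken from the quoted tests: the final 16 bytes are stored first, then the main loop copies 16-byte chunks from the start; the last chunk may overlap the already-written tail with identical bytes, so no separate epilogue is needed. Assumes n >= 16 and non-overlapping buffers:)

```c
#include <stddef.h>
#include <string.h>

/* Sketch: handle the tail first, then run the aligned-size main loop.
   Requires n >= 16 and dst/src not overlapping. */
static void copy_tail_first(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char tail[16];
    memcpy(tail, src + n - 16, 16);      /* load the final 16 bytes    */
    memcpy(dst + n - 16, tail, 16);      /* store them before the loop */

    /* Full 16-byte chunks from the start; any overlap with the tail
       region rewrites the same bytes, which is harmless. */
    for (size_t i = 0; i + 16 <= n; i += 16)
        memcpy(dst + i, src + i, 16);
}
```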
>
>> Similar improvements could be done in expanding of memset, but that's
>> in progress now and I'm going to proceed with it if this patch is ok.
>>
>> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
>>
>> Is it ok for trunk?
>>
>> Changelog entry:
>>
>> 2013-04-10 Michael Zolotukhin <michael.v.zolotukhin@gmail.com>
>>
>> * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
>> * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
>> adjust_address instead of change_address to keep info about alignment.
>> (emit_strmov): Remove.
>> (emit_memmov): New function.
>> (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
>> (expand_movmem_prologue): Likewise and return updated rtx for
>> destination.
>> (expand_constant_movmem_prologue): Likewise and return updated rtx for
>> destination and source.
>> (decide_alignment): Refactor, handle vector_loop.
>> (ix86_expand_movmem): Likewise.
>> (ix86_expand_setmem): Likewise.
>> * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
>> * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
>
--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.