Use of vector instructions in memmov/memset expanding

Wed Sep 28 12:36:00 GMT 2011

Attached is part 2 of the patch that enables the use of vector instructions
in memset and memcpy expansion (the back-end part).

The main changes are in the functions
ix86_expand_setmem/ix86_expand_movmem. The other changes are needed
only to support them.
The changes mostly affect the unrolled_loop strategy: vector move
modes can now be used there. That resulted in larger epilogues and
prologues, so their generation was modified as well.
This patch also contains some middle-end changes (to make the build
possible), but all of them are present in the first part of the
patch, so there is no need to review them here.
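
To illustrate the shape of the code that the unrolled_loop strategy aims
for, here is a rough C sketch. It is not taken from the patch: the name
sketch_movmem, the constants VEC and UNROLL, and the use of a 16-byte
memcpy as a stand-in for one vector move are all assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: a 16-byte "vector" move, unrolled 4 times. */
enum { VEC = 16, UNROLL = 4 };

/* Stand-in for one vector move.  A fixed-size memcpy like this is
   typically lowered to a single unaligned load/store pair on x86.  */
static void copy_vec(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, VEC);
}

/* Sketch only: prologue, unrolled main loop, two-stage epilogue.
   The regions must not overlap.  */
void *sketch_movmem(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    assert(n == 0 || (dstv != NULL && srcv != NULL));

    /* Prologue: byte moves until the destination is VEC-aligned.  */
    while (n > 0 && ((uintptr_t)dst % VEC) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Main loop: UNROLL vector moves per iteration.  */
    while (n >= (size_t)VEC * UNROLL) {
        copy_vec(dst, src);
        copy_vec(dst + VEC, src + VEC);
        copy_vec(dst + 2 * VEC, src + 2 * VEC);
        copy_vec(dst + 3 * VEC, src + 3 * VEC);
        dst += VEC * UNROLL;
        src += VEC * UNROLL;
        n -= VEC * UNROLL;
    }

    /* Epilogue, stage 1: fewer than VEC * UNROLL bytes remain, so a
       small non-unrolled loop of 8-byte moves takes most of them.  */
    while (n >= 8) {
        memcpy(dst, src, 8);
        dst += 8;
        src += 8;
        n -= 8;
    }

    /* Epilogue, stage 2: fewer than 8 bytes are left.  */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
    return dstv;
}
```

With 64 bytes per main-loop iteration, up to 63 bytes can be left over,
which is why a byte-only epilogue becomes noticeably expensive.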

The build and 'make check' were tested.

On 28 September 2011 14:56, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Attached is part 1 of the patch that enables the use of vector instructions
> in memset and memcpy expansion (the middle-end part).
> The main changes are in the functions
> move_by_pieces/set_by_pieces. In the new version, the move-mode
> selection algorithm was changed: it now checks whether the alignment is
> known at compile time and uses cost models to choose between aligned and
> unaligned, vector and non-vector move modes.
>
> The build and 'make check' were tested. 'make check' shows a failure
> that goes away once the complete patch is applied.
>
> On 27 September 2011 18:44, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> I divided the patch into three smaller ones:
>>
>> 1) Patch with target-independent changes (see attached file memfunc-mid.patch).
>> The main changes are in the functions
>> move_by_pieces/set_by_pieces. In the new version, the move-mode
>> selection algorithm was changed: it now checks whether the alignment is
>> known at compile time and uses cost models to choose between aligned and
>> unaligned, vector and non-vector move modes.
>>
>> 2) Patch with target-dependent changes (memfunc-be.patch).
>> The main changes are in the functions
>> ix86_expand_setmem/ix86_expand_movmem. The other changes are needed
>> only to support them.
>> The changes mostly affect the unrolled_loop strategy: vector move
>> modes can now be used there. That resulted in larger epilogues and
>> prologues, so their generation was modified as well.
>> This patch also contains some middle-end changes (to make the build
>> possible), but all of them are present in the first patch, so
>> there is no need to review them here.
>>
>> 3) Patch with all new tests (memfunc-tests.patch).
>> This patch contains many small tests for different memset and memcpy cases.
>>
>> Applied separately, these patches won't give a performance gain.
>> The positive effect will be noticeable only if they are applied
>> together (I also attach the complete patch - see file
>> memfunc-complete.patch).
>>
>>
>> If you have any questions regarding these changes, please don't
>> hesitate to ask them.
>>
>>
>> On 18 July 2011 15:00, Michael Zolotukhin
>> <michael.v.zolotukhin@gmail.com> wrote:
>>> Here is a summary. It probably doesn't cover every single piece of
>>> the patch, but I tried to describe the major changes. I hope this will
>>> help you a bit, and of course I'll answer any further questions
>>> you have.
>>>
>>> The changes can be logically divided into two parts (though these
>>> parts have something in common).
>>> The first part is the changes in target-independent code, in the functions
>>> move_by_pieces() and store_by_pieces(), mostly located in expr.c.
>>> The second part touches ix86_expand_movmem() and ix86_expand_setmem(),
>>> mostly located in config/i386/i386.c.
>>>
>>> Changes in i386.c (target-dependent part):
>>> 1) Strategies for known and unknown alignment are now separated
>>> from each other.
>>> When the alignment is known at compile time, we can generate optimized
>>> code without libcalls.
>>> When it is unknown, we can sometimes, but not always, generate runtime
>>> checks to reach the desired alignment.
>>> Strategies for atom, generic_32 and generic_64 were chosen based on
>>> a set of experiments; strategies in the other
>>> cost models are unchanged (their strategies for unknown alignment are
>>> copied from the existing strategies).
>>> 2) The unrolled_loop algorithm was modified: it now uses SSE move modes
>>> if they are available.
>>> 3) The amount of data moved in one iteration increased greatly, so
>>> epilogues became bigger and some changes were needed in epilogue
>>> generation. In some cases a special (not unrolled) loop is generated
>>> in the epilogue to avoid slow byte-by-byte copying (the changes in
>>> expand_set_or_movmem_via_loop() and the introduction of
>>> expand_set_or_movmem_via_loop_with_iter() were made for these cases).
>>> 4) As a bigger alignment might be needed than before, prologue
>>> generation was also modified.
>>>
>>> Changes in expr.c (target-independent part):
>>> There are now two possible strategies: aligned moves and unaligned
>>> moves. A cost model was implemented for each of them, and the choice is
>>> made according to the cost of each option. The move mode is chosen by the
>>> functions widest_mode_for_unaligned_mov() and
>>> widest_mode_for_aligned_mov().
>>> Cost estimation is implemented in the functions compute_aligned_cost() and
>>> compute_unaligned_cost().
>>> The choice between the two strategies and the generation of the moves
>>> themselves are in the function move_by_pieces().
>>>
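As a rough illustration of what choosing between the aligned and the
unaligned strategy by cost can look like, here is a toy model in C. It is
not the code of compute_aligned_cost()/compute_unaligned_cost(): the
struct, the function names, and the cost formulas are all hypothetical,
and real costs would come from the target's cost tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-move costs, in arbitrary units.  */
struct toy_costs {
    size_t aligned_vec;    /* one aligned vector-wide move */
    size_t unaligned_vec;  /* one unaligned vector-wide move */
    size_t byte;           /* one single-byte move */
};

enum toy_strategy { TOY_ALIGNED, TOY_UNALIGNED };

/* Aligned strategy: byte moves up to the next vec_size boundary,
   aligned vector moves for the body, byte moves for the tail.
   misalign is the known offset of the start from a boundary.  */
size_t toy_aligned_cost(size_t len, size_t misalign, size_t vec_size,
                        const struct toy_costs *c)
{
    assert(vec_size > 0 && misalign < vec_size);
    size_t head = misalign ? vec_size - misalign : 0;
    if (head > len)
        head = len;
    size_t body = (len - head) / vec_size;
    size_t tail = (len - head) % vec_size;
    return (head + tail) * c->byte + body * c->aligned_vec;
}

/* Unaligned strategy: vector moves from the first byte, byte tail.  */
size_t toy_unaligned_cost(size_t len, size_t vec_size,
                          const struct toy_costs *c)
{
    assert(vec_size > 0);
    return (len / vec_size) * c->unaligned_vec
           + (len % vec_size) * c->byte;
}

/* Pick the cheaper strategy; ties go to the aligned one.  */
enum toy_strategy toy_choose(size_t len, size_t misalign, size_t vec_size,
                             const struct toy_costs *c)
{
    size_t a = toy_aligned_cost(len, misalign, vec_size, c);
    size_t u = toy_unaligned_cost(len, vec_size, c);
    return a <= u ? TOY_ALIGNED : TOY_UNALIGNED;
}
```

With these made-up costs, a 64-byte block that starts on a 16-byte
boundary favours aligned moves, while the same block starting one byte
past a boundary favours unaligned moves, because the byte moves needed to
reach alignment outweigh the cheaper aligned moves.
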
>>> The function store_by_pieces() calls set_by_pieces_1() instead of
>>> store_by_pieces_1() in the memset case (I needed to introduce
>>> set_by_pieces_1() to separate the memset case from the others:
>>> store_by_pieces_1() is sometimes called for strcpy and some other
>>> functions, not only for memset).
>>>
>>> set_by_pieces_1() estimates the costs of the aligned and unaligned
>>> strategies (as in move_by_pieces()) and generates the moves for memset.
>>> A single move is generated via
>>> generate_move_with_mode(). The first time it is called, a promoted value
>>> (a register filled with the one-byte value of the memset argument) is
>>> generated; later calls reuse this value.
>>>
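The idea of a promoted value can be shown with a small C sketch. This is
only an illustration of the technique, not the RTL that the patch
generates: promote_byte and sketch_setmem are hypothetical names, and a
64-bit integer stands in for the wider register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit word.  */
uint64_t promote_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Sketch of a memset that builds the promoted value once and then
   reuses it for every wide store.  */
void *sketch_setmem(void *dstv, int value, size_t n)
{
    unsigned char *dst = dstv;
    /* Computed once, before the first store.  */
    uint64_t promoted = promote_byte((unsigned char)value);

    assert(n == 0 || dstv != NULL);

    while (n >= sizeof promoted) {
        memcpy(dst, &promoted, sizeof promoted);
        dst += sizeof promoted;
        n -= sizeof promoted;
    }

    /* Tail: every byte of promoted is the same, so storing any n of
       them is correct regardless of endianness.  */
    if (n > 0)
        memcpy(dst, &promoted, n);
    return dstv;
}
```

The multiplication by 0x0101010101010101 is a common way to broadcast a
byte in scalar code; for vector registers the same role is played by a
duplicate (broadcast) operation.
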
>>> Changes in MD-files:
>>> To generate promoted values, I made some changes in
>>> promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
>>> for vec_dupv4si and vec_dupv2di were introduced for this too (these
>>> expands differ from the corresponding define_insns: the existing
>>> define_insns work only with registers, while the new expands can also
>>> process a memory operand).
>>>
>>> Some code was added to allow generation of MOVQ (with SSE registers).
>>> Such moves are unusual because they use only half of an
>>> xmm register.
>>> There was a need to generate such moves explicitly, so I added a
>>> simple expand to sse.md.
>>>
>>>
>>> On 16 July 2011 03:24, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> > New algorithm for move-mode selection is implemented for move_by_pieces,
>>>>> > store_by_pieces.
>>>>> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>>>>> > similar way, x86 cost-models parameters are slightly changed to support
>>>>> > this. This implementation checks if array's alignment is known at compile
>>>>> > time and chooses expanding algorithm and move-mode according to it.
>>>>
>>>> Can you give some summary of changes you made?  It would make it a lot easier to
>>>> review if it was broken up into the generic changes (with rationale why they are
>>>> needed) and i386 backend changes that I could review then.
>>>>
>>>> From first pass through the patch I don't quite see the need for i.e. adding
>>>> new move patterns when we can output all kinds of SSE moves already.  Will look
>>>> more into the patch to see if I can come up with useful comments.
>>>>
>>>> Honza
>>>>
>>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
>>

-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memfunc-be.patch
Type: application/octet-stream
Size: 74731 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20110928/8ba71270/attachment.obj>