This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Use of vector instructions in memmov/memset expanding


Attached is part 3 of the patch that enables the use of vector
instructions in memset and memcpy expansion. This part contains only
the tests for the different memset/memcpy cases.
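
As a rough illustration of what one such test case might check, here is a minimal sketch. It is not taken from memfunc-tests.patch: the buffer names, sizes and the helpers check_memset_case() and check_memcpy_case() are hypothetical. It verifies that exactly the requested bytes are written, which is the property that wide prologues and epilogues put at risk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test buffers.  The aligned attribute (a GCC extension)
   makes the alignment known at compile time.  */
static unsigned char src_buf[96] __attribute__ ((aligned (16)));
static unsigned char dst_buf[96] __attribute__ ((aligned (16)));

/* Set LEN bytes at offset OFF and verify that exactly those bytes
   changed.  Returns 0 on success, 1 on a mismatch.  */
int
check_memset_case (size_t off, size_t len)
{
  size_t i;

  assert (off <= sizeof dst_buf && len <= sizeof dst_buf - off);
  memset (dst_buf, 0, sizeof dst_buf);
  memset (dst_buf + off, 0xa5, len);

  for (i = 0; i < sizeof dst_buf; i++)
    {
      unsigned char want = (i >= off && i < off + len) ? 0xa5 : 0;
      if (dst_buf[i] != want)
        return 1;
    }
  return 0;
}

/* Copy LEN bytes between the given offsets and verify the result,
   including the bytes around the destination range.  */
int
check_memcpy_case (size_t dst_off, size_t src_off, size_t len)
{
  size_t i;

  assert (dst_off <= sizeof dst_buf && len <= sizeof dst_buf - dst_off);
  assert (src_off <= sizeof src_buf && len <= sizeof src_buf - src_off);
  for (i = 0; i < sizeof src_buf; i++)
    src_buf[i] = (unsigned char) (i + 1);
  memset (dst_buf, 0, sizeof dst_buf);
  memcpy (dst_buf + dst_off, src_buf + src_off, len);

  for (i = 0; i < sizeof dst_buf; i++)
    {
      unsigned char want = 0;
      if (i >= dst_off && i < dst_off + len)
        want = src_buf[i - dst_off + src_off];
      if (dst_buf[i] != want)
        return 1;
    }
  return 0;
}
```

A call such as check_memcpy_case (1, 3, 33) exercises a misaligned copy with an odd length. In a real test the offsets and sizes would presumably be compile-time constants, so that the inline expanders rather than a library call handle the operation.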

On 28 September 2011 14:57, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Attached is part 2 of the patch that enables the use of vector
> instructions in memset and memcpy expansion (back-end part).
>
> The main changes are in the functions
> ix86_expand_setmem/ix86_expand_movmem. The other changes are only
> needed to support them.
> The changes mostly affect the unrolled_loop strategy: vector move
> modes can now be used there. That resulted in large epilogues and
> prologues, so their generation was also modified.
> This patch contains some middle-end changes (to make the build
> possible), but all of them are already present in the first part of
> the patch, so there is no need to review them here.
>
> The build and 'make check' were tested.
>
>
> On 28 September 2011 14:56, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> Attached is part 1 of the patch that enables the use of vector
>> instructions in memset and memcpy expansion (middle-end part).
>> The main changes are in the functions
>> move_by_pieces/set_by_pieces. The new version changes the move-mode
>> selection algorithm: it now checks whether the alignment is known at
>> compile time and uses cost models to choose among aligned or
>> unaligned, vector or non-vector move modes.
>>
>> The build and 'make check' were tested. 'make check' shows a failure
>> that goes away once the complete patch is applied.
>>
>> On 27 September 2011 18:44, Michael Zolotukhin
>> <michael.v.zolotukhin@gmail.com> wrote:
>>> I divided the patch into three smaller ones:
>>>
>>> 1) Patch with target-independent changes (see attached file memfunc-mid.patch).
>>> The main changes are in the functions
>>> move_by_pieces/set_by_pieces. The new version changes the move-mode
>>> selection algorithm: it now checks whether the alignment is known at
>>> compile time and uses cost models to choose among aligned or
>>> unaligned, vector or non-vector move modes.
>>>
>>> 2) Patch with target-dependent changes (memfunc-be.patch).
>>> The main changes are in the functions
>>> ix86_expand_setmem/ix86_expand_movmem. The other changes are only
>>> needed to support them.
>>> The changes mostly affect the unrolled_loop strategy: vector move
>>> modes can now be used there. That resulted in large epilogues and
>>> prologues, so their generation was also modified.
>>> This patch contains some middle-end changes (to make the build
>>> possible), but all of them are already present in the first patch, so
>>> there is no need to review them here.
>>>
>>> 3) Patch with all new tests (memfunc-tests.patch).
>>> This patch contains many small tests for different memset and memcpy cases.
>>>
>>> Applied separately, these patches won't give a performance gain.
>>> The positive effect is noticeable only when they are applied
>>> together (I also attach the complete patch; see the file
>>> memfunc-complete.patch).
>>>
>>>
>>> If you have any questions regarding these changes, please don't
>>> hesitate to ask them.
>>>
>>>
>>> On 18 July 2011 15:00, Michael Zolotukhin
>>> <michael.v.zolotukhin@gmail.com> wrote:
>>>> Here is a summary - probably, it doesn't cover every single piece in
>>>> the patch, but I tried to describe the major changes. I hope this will
>>>> help you a bit - and of course I'll answer your further questions if
>>>> they appear.
>>>>
>>>> The changes can be logically divided into two parts (though these
>>>> parts have something in common).
>>>> The first part is the target-independent changes in the functions
>>>> move_by_pieces() and store_by_pieces(), mostly located in expr.c.
>>>> The second part touches ix86_expand_movmem() and ix86_expand_setmem(),
>>>> mostly located in config/i386/i386.c.
>>>>
>>>> Changes in i386.c (target-dependent part):
>>>> 1) Strategies for known and unknown alignment are now separate.
>>>> When the alignment is known at compile time, we can generate optimized
>>>> code without libcalls.
>>>> When it is unknown, we can sometimes, but not always, emit runtime
>>>> checks to reach the desired alignment.
>>>> Strategies for atom, generic_32 and generic_64 were chosen based on
>>>> a set of experiments; strategies in the other cost models are
>>>> unchanged (their strategies for unknown alignment are copied from
>>>> the existing strategies).
>>>> 2) The unrolled_loop algorithm was modified: it now uses SSE move
>>>> modes if they are available.
>>>> 3) Because the amount of data moved in one iteration increased
>>>> greatly, epilogues became bigger, so epilogue generation had to
>>>> change. In some cases a special (not unrolled) loop is generated
>>>> in the epilogue to avoid slow byte-by-byte copying (this is the
>>>> reason for the changes in expand_set_or_movmem_via_loop() and for
>>>> the new expand_set_or_movmem_via_loop_with_iter()).
>>>> 4) Because a bigger alignment might be needed than before, prologue
>>>> generation was also modified.
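
The overall shape described in points 1 to 4 can be sketched in plain C. This is only an illustration of the prologue / unrolled loop / epilogue-loop structure, not the RTL that i386.c emits: copy_sketch(), CHUNK and UNROLL are hypothetical names, and a 16-byte memcpy stands in for one SSE move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK = 16, UNROLL = 4 };   /* assumed: 16-byte moves, 4 per iteration */

void *
copy_sketch (void *dst_, const void *src_, size_t n)
{
  unsigned char *dst = dst_;
  const unsigned char *src = src_;

  assert (n == 0 || (dst_ != NULL && src_ != NULL));

  /* Prologue: runtime check, copy bytes until DST is CHUNK-aligned.  */
  while (n > 0 && (uintptr_t) dst % CHUNK != 0)
    {
      *dst++ = *src++;
      n--;
    }

  /* Main unrolled loop: UNROLL wide moves per iteration.  */
  while (n >= (size_t) CHUNK * UNROLL)
    {
      for (int i = 0; i < UNROLL; i++)
        memcpy (dst + i * CHUNK, src + i * CHUNK, CHUNK);
      dst += CHUNK * UNROLL;
      src += CHUNK * UNROLL;
      n -= CHUNK * UNROLL;
    }

  /* Epilogue loop (not unrolled): one wide move at a time, so that at
     most CHUNK - 1 bytes are left for the byte-by-byte tail.  */
  while (n >= CHUNK)
    {
      memcpy (dst, src, CHUNK);
      dst += CHUNK;
      src += CHUNK;
      n -= CHUNK;
    }

  /* Tail: fewer than CHUNK bytes remain.  */
  while (n > 0)
    {
      *dst++ = *src++;
      n--;
    }
  return dst_;
}
```

Without the epilogue loop, up to CHUNK * UNROLL - 1 bytes (63 here) could be left for the byte-by-byte tail, which is the slow case point 3 is meant to avoid.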
>>>>
>>>> Changes in expr.c (target-independent part):
>>>> There are now two possible strategies: aligned moves and unaligned
>>>> moves. A cost model was implemented for each, and the choice is
>>>> made by comparing their costs. The move mode is chosen by the
>>>> functions widest_mode_for_unaligned_mov() and
>>>> widest_mode_for_aligned_mov().
>>>> Cost estimation is implemented in the functions compute_aligned_cost()
>>>> and compute_unaligned_cost().
>>>> The choice between the two strategies and the generation of the moves
>>>> themselves are in the function move_by_pieces().
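
To make the idea of comparing the two strategies concrete, here is a minimal sketch under simplifying assumptions. It does not reproduce compute_aligned_cost() or compute_unaligned_cost() from the patch: the names pieces_needed(), choose_strategy() and struct move_costs are hypothetical, and the cost is simply the number of moves times an assumed per-move cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-move costs; a real cost model would come from the
   target's tuning tables.  */
struct move_costs
{
  unsigned aligned_move;
  unsigned unaligned_move;
};

enum strategy { USE_ALIGNED, USE_UNALIGNED };

static int
is_power_of_two (size_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Number of moves needed to cover LEN bytes, using power-of-two piece
   sizes from MAX_PIECE down to 1.  */
unsigned
pieces_needed (size_t len, size_t max_piece)
{
  unsigned count = 0;

  assert (is_power_of_two (max_piece));
  for (size_t piece = max_piece; piece >= 1; piece /= 2)
    {
      count += (unsigned) (len / piece);
      len %= piece;
    }
  return count;
}

/* Pick the cheaper strategy.  Aligned moves can be no wider than the
   alignment known at compile time; unaligned moves may use the full
   vector width but cost more per move.  */
enum strategy
choose_strategy (size_t len, size_t known_align, size_t vector_size,
                 const struct move_costs *costs)
{
  size_t aligned_piece;
  unsigned aligned_cost, unaligned_cost;

  assert (is_power_of_two (known_align) && is_power_of_two (vector_size));
  aligned_piece = known_align < vector_size ? known_align : vector_size;
  aligned_cost = pieces_needed (len, aligned_piece) * costs->aligned_move;
  unaligned_cost = pieces_needed (len, vector_size) * costs->unaligned_move;
  return aligned_cost <= unaligned_cost ? USE_ALIGNED : USE_UNALIGNED;
}
```

With an assumed cost of 1 per aligned move and 2 per unaligned move, a 64-byte block with known 16-byte alignment favours aligned moves, while the same block with only byte alignment favours unaligned vector moves.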
>>>>
>>>> The function store_by_pieces() calls set_by_pieces_1() instead of
>>>> store_by_pieces_1() in the memset case (I needed to introduce
>>>> set_by_pieces_1() to separate the memset case from the others:
>>>> store_by_pieces_1() is sometimes called for strcpy and some other
>>>> functions, not only for memset).
>>>>
>>>> set_by_pieces_1() estimates the costs of the aligned and unaligned
>>>> strategies (as in move_by_pieces()) and generates the moves for memset.
>>>> A single move is generated via generate_move_with_mode(). The first
>>>> time it is called, it generates a promoted value (a register filled
>>>> with the one-byte value of the memset argument); later calls reuse
>>>> this value.
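
The promoted value can be illustrated with a 64-bit integer instead of a vector register. This is a sketch only, not the code of generate_move_with_mode(): promote_byte() and set_sketch() are hypothetical names, and the lazy creation of the promoted value mirrors the "generate on first call, reuse later" behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit word.  */
uint64_t
promote_byte (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}

/* Fill N bytes at DST with C, using 8-byte stores where possible.  */
void
set_sketch (unsigned char *dst, unsigned char c, size_t n)
{
  uint64_t promoted = 0;
  int have_promoted = 0;

  assert (n == 0 || dst != NULL);

  while (n >= sizeof promoted)
    {
      /* Build the promoted value on first use; later stores reuse it.  */
      if (!have_promoted)
        {
          promoted = promote_byte (c);
          have_promoted = 1;
        }
      memcpy (dst, &promoted, sizeof promoted);
      dst += sizeof promoted;
      n -= sizeof promoted;
    }

  /* Remaining bytes need no promoted value.  */
  while (n > 0)
    {
      *dst++ = c;
      n--;
    }
}
```

For fills shorter than one word the promoted value is never built, which is the point of creating it lazily.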
>>>>
>>>> Changes in MD-files:
>>>> To generate promoted values, I made some changes in
>>>> promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
>>>> for vec_dupv4si and vec_dupv2di were introduced for this too (these
>>>> expands differ from the corresponding define_insns: the existing
>>>> define_insns work only with registers, while the new expands can
>>>> also process a memory operand).
>>>>
>>>> Some code was added to allow generation of MOVQ (with SSE registers).
>>>> Such moves are unusual because they use only half of an
>>>> xmm register.
>>>> They had to be generated explicitly, so I added a
>>>> simple expand to sse.md.
>>>>
>>>>
>>>> On 16 July 2011 03:24, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> > New algorithm for move-mode selection is implemented for move_by_pieces,
>>>>>> > store_by_pieces.
>>>>>> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>>>>>> > similar way, x86 cost-models parameters are slightly changed to support
>>>>>> > this. This implementation checks if array's alignment is known at compile
>>>>>> > time and chooses expanding algorithm and move-mode according to it.
>>>>>
>>>>> Can you give some summary of the changes you made? It would make it a lot easier to
>>>>> review if it was broken up into the generic changes (with a rationale for why they are
>>>>> needed) and i386 backend changes that I could review then.
>>>>>
>>>>> From a first pass through the patch I don't quite see the need for, e.g., adding
>>>>> new move patterns when we can output all kinds of SSE moves already. Will look
>>>>> more into the patch to see if I can come up with useful comments.
>>>>>
>>>>> Honza
>>>>>
>>>>
>>>
>>> --
>>> ---
>>> Best regards,
>>> Michael V. Zolotukhin,
>>> Software Engineer
>>> Intel Corporation.
>>>



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.

Attachment: memfunc-tests.patch
Description: Binary data

