This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Attached is part 3 of the patch that enables the use of vector instructions
in memset and memcpy. This part contains only tests for different
memset/memcpy cases.

On 28 September 2011 14:57, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Attached is part 2 of the patch that enables the use of vector instructions
> in memset and memcpy (back-end part).
>
> The main part of the changes is in the functions
> ix86_expand_setmem/ix86_expand_movmem. The other changes are only
> needed to support them.
> The changes mostly touch the unrolled_loop strategy: vector move
> modes can now be used there. That resulted in large epilogues and
> prologues, so their generation was also modified.
> This patch contains some middle-end changes (needed for the build to
> succeed), but all of them are also present in the first part of the
> patch, so there is no need to review them here.
>
> Build and 'make check' were tested.
>
>
> On 28 September 2011 14:56, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> Attached is part 1 of the patch that enables the use of vector instructions
>> in memset and memcpy (middle-end part).
>> The main part of the changes is in the functions
>> move_by_pieces/set_by_pieces. In the new version, the move-mode
>> selection algorithm was changed: it now checks whether the alignment is
>> known at compile time and uses cost models to choose between aligned and
>> unaligned, vector or non-vector move modes.
>>
>> Build and 'make check' were tested. 'make check' shows a failure,
>> which is cured when the complete patch is applied.
>>
>> On 27 September 2011 18:44, Michael Zolotukhin
>> <michael.v.zolotukhin@gmail.com> wrote:
>>> I divided the patch into three smaller ones:
>>>
>>> 1) Patch with target-independent changes (see attached file memfunc-mid.patch).
>>> The main part of the changes is in the functions
>>> move_by_pieces/set_by_pieces.
>>> In the new version, the move-mode
>>> selection algorithm was changed: it now checks whether the alignment is
>>> known at compile time and uses cost models to choose between aligned and
>>> unaligned, vector or non-vector move modes.
>>>
>>> 2) Patch with target-dependent changes (memfunc-be.patch).
>>> The main part of the changes is in the functions
>>> ix86_expand_setmem/ix86_expand_movmem. The other changes are only
>>> needed to support them.
>>> The changes mostly touch the unrolled_loop strategy: vector move
>>> modes can now be used there. That resulted in large epilogues and
>>> prologues, so their generation was also modified.
>>> This patch contains some middle-end changes (needed for the build to
>>> succeed), but all of them are also present in the first patch, so
>>> there is no need to review them here.
>>>
>>> 3) Patch with all new tests (memfunc-tests.patch).
>>> This patch contains a lot of small tests for different memset and memcpy cases.
>>>
>>> Applied separately, these patches won't give a performance gain.
>>> The positive effect is noticeable only when they are applied
>>> together (I also attach the complete patch; see file
>>> memfunc-complete.patch).
>>>
>>>
>>> If you have any questions regarding these changes, please don't
>>> hesitate to ask them.
>>>
>>>
>>> On 18 July 2011 15:00, Michael Zolotukhin
>>> <michael.v.zolotukhin@gmail.com> wrote:
>>>> Here is a summary. It probably doesn't cover every single piece of
>>>> the patch, but I tried to describe the major changes. I hope this will
>>>> help you a bit, and of course I'll answer any further questions that
>>>> come up.
>>>>
>>>> The changes can be logically divided into two parts (though these
>>>> parts have something in common).
>>>> The first part is the changes in the target-independent code, in the
>>>> functions move_by_pieces() and store_by_pieces(), mostly located in expr.c.
>>>> The second part touches ix86_expand_movmem() and ix86_expand_setmem(),
>>>> mostly located in config/i386/i386.c.
>>>>
>>>> Changes in i386.c (target-dependent part):
>>>> 1) Strategies for cases with known and unknown alignment are separated
>>>> from each other.
>>>> When the alignment is known at compile time, we can generate optimized
>>>> code without libcalls.
>>>> When it's unknown, we can sometimes create runtime checks to reach the
>>>> desired alignment, but not always.
>>>> Strategies for atom, generic_32 and generic_64 were chosen according
>>>> to a set of experiments; strategies in the other cost models are
>>>> unchanged (strategies for unknown alignment are copied from the
>>>> existing strategies).
>>>> 2) The unrolled_loop algorithm was modified: it now uses SSE move modes
>>>> if they're available.
>>>> 3) As the size of the data moved in one iteration greatly increased,
>>>> epilogues became bigger, so some changes were needed in epilogue
>>>> generation. In some cases a special (not unrolled) loop is generated
>>>> in the epilogue to avoid slow byte-by-byte copying (the changes in
>>>> expand_set_or_movmem_via_loop() and the introduction of
>>>> expand_set_or_movmem_via_loop_with_iter() are for these cases).
>>>> 4) As a bigger alignment might be needed than before, prologue
>>>> generation was also modified.
>>>>
>>>> Changes in expr.c (target-independent part):
>>>> There are now two possible strategies: aligned and unaligned moves.
>>>> A cost model was implemented for each of them, and the choice is made
>>>> according to the cost of each option. The move mode is chosen by the
>>>> functions widest_mode_for_unaligned_mov() and
>>>> widest_mode_for_aligned_mov().
>>>> Cost estimation is implemented in the functions compute_aligned_cost()
>>>> and compute_unaligned_cost().
>>>> The choice between these two strategies and the generation of the moves
>>>> themselves are in the function move_by_pieces().
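The aligned-versus-unaligned choice described in the summary above can be illustrated with a deliberately simplified cost model. This is a sketch, not GCC's compute_aligned_cost()/compute_unaligned_cost(). The helper names (count_moves, aligned_cost, unaligned_cost, prefer_unaligned) are hypothetical. The greedy power-of-two split and the flat per-move penalty for unaligned accesses are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of moves needed to copy SIZE bytes when the
   widest usable move is MAX_WIDTH bytes (a power of two).  The size is
   split greedily: as many MAX_WIDTH moves as fit, then MAX_WIDTH/2, ...  */
static unsigned count_moves(size_t size, size_t max_width)
{
    unsigned moves = 0;

    assert(max_width != 0 && (max_width & (max_width - 1)) == 0);
    for (size_t w = max_width; w > 0; w /= 2) {
        moves += (unsigned)(size / w);
        size %= w;
    }
    return moves;
}

/* Assumed aligned model: each move costs 1, but the widest mode is limited
   by the alignment known at compile time (KNOWN_ALIGN, a power of two).  */
static unsigned aligned_cost(size_t size, size_t known_align, size_t widest)
{
    size_t width = known_align < widest ? known_align : widest;
    return count_moves(size, width);
}

/* Assumed unaligned model: the widest mode is always usable, but every
   move costs UNALIGNED_PENALTY.  */
static unsigned unaligned_cost(size_t size, size_t widest,
                               unsigned unaligned_penalty)
{
    return count_moves(size, widest) * unaligned_penalty;
}

/* Hypothetical decision: nonzero when the unaligned strategy is strictly
   cheaper under the model above.  */
static int prefer_unaligned(size_t size, size_t known_align, size_t widest,
                            unsigned unaligned_penalty)
{
    return unaligned_cost(size, widest, unaligned_penalty)
           < aligned_cost(size, known_align, widest);
}
```

Under this model with a penalty of 2, copying 64 bytes known only to be 4-byte aligned takes 16 aligned moves. The unaligned alternative is 4 moves of 16 bytes at cost 8, so `prefer_unaligned(64, 4, 16, 2)` returns 1. With 16-byte alignment the aligned strategy needs only 4 moves and wins.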
>>>>
>>>> The function store_by_pieces() calls set_by_pieces_1() instead of
>>>> store_by_pieces_1() in the memset case (I needed to introduce
>>>> set_by_pieces_1 to separate the memset case from the others:
>>>> store_by_pieces_1 is sometimes called for strcpy and some other
>>>> functions, not only for memset).
>>>>
>>>> set_by_pieces_1() estimates the costs of the aligned and unaligned
>>>> strategies (as in move_by_pieces()) and generates the moves for memset.
>>>> A single move is generated via generate_move_with_mode(). When it is
>>>> called for the first time, a promoted value (a register filled with the
>>>> one-byte value of the memset argument) is generated; later calls reuse
>>>> this value.
>>>>
>>>> Changes in MD files:
>>>> To generate promoted values, I made some changes in
>>>> promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
>>>> for vec_dup4si and vec_dupv2di were introduced for this too (these
>>>> expands differ from the corresponding define_insns: the existing
>>>> define_insns work only with registers, while the new expands can
>>>> process a memory operand as well).
>>>>
>>>> Some code was added to allow the generation of MOVQ (with SSE
>>>> registers); such moves aren't usual ones, because they use only half
>>>> of an xmm register.
>>>> There was a need to generate such moves explicitly, so I added a
>>>> simple expand to sse.md.
>>>>
>>>>
>>>> On 16 July 2011 03:24, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> > New algorithm for move-mode selection is implemented for move_by_pieces,
>>>>>> > store_by_pieces.
>>>>>> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>>>>>> > similar way, x86 cost-models parameters are slightly changed to support
>>>>>> > this. This implementation checks if array's alignment is known at compile
>>>>>> > time and chooses expanding algorithm and move-mode according to it.
>>>>>
>>>>> Can you give some summary of the changes you made?
>>>>> It would make it a lot easier to
>>>>> review if it was broken up into the generic changes (with the rationale
>>>>> for why they are needed) and the i386 backend changes, which I could
>>>>> then review.
>>>>>
>>>>> From a first pass through the patch, I don't quite see the need for,
>>>>> e.g., adding new move patterns when we can already output all kinds of
>>>>> SSE moves. I will look more into the patch to see if I can come up
>>>>> with useful comments.
>>>>>
>>>>> Honza
>>>>>
>>>>
>>>
>>> --
>>> ---
>>> Best regards,
>>> Michael V. Zolotukhin,
>>> Software Engineer
>>> Intel Corporation.
>>>
>>
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
>>
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
>

--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.
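The thread's summary describes two ideas: the promoted value used for memset, and the split into an alignment prologue, a wide main loop and a non-unrolled epilogue loop. Both can be sketched in portable C. This is only an illustration under stated assumptions. The names promote_byte() and sketch_memset() are hypothetical. The promotion multiplies by 0x0101010101010101, a common technique and not necessarily the RTL that promote_duplicated_reg() emits. Plain 8-byte integer stores stand in for the SSE moves used by the real expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: replicate byte C into all eight bytes of a 64-bit value.
   Because c <= 0xff, the partial products of the multiplication never
   carry into a neighbouring byte.  */
static uint64_t promote_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Hypothetical memset-like routine showing the three phases.  */
static void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    const uint64_t promoted = promote_byte((unsigned char)c);

    assert(dst != NULL || n == 0);

    /* Prologue: byte stores until P is 8-byte aligned.  */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = (unsigned char)c;
        n--;
    }

    /* Main loop, unrolled four times: 32 bytes per iteration.  A fixed-size
       memcpy avoids strict-aliasing problems and is normally compiled to a
       single store.  All bytes of PROMOTED are equal, so byte order does
       not matter.  */
    while (n >= 32) {
        memcpy(p, &promoted, sizeof promoted);
        memcpy(p + 8, &promoted, sizeof promoted);
        memcpy(p + 16, &promoted, sizeof promoted);
        memcpy(p + 24, &promoted, sizeof promoted);
        p += 32;
        n -= 32;
    }

    /* Epilogue loop (not unrolled): whole 8-byte chunks, instead of
       falling back to byte stores for up to 31 remaining bytes.  */
    while (n >= 8) {
        memcpy(p, &promoted, sizeof promoted);
        p += 8;
        n -= 8;
    }

    /* Final tail: fewer than 8 bytes left.  */
    while (n > 0) {
        *p++ = (unsigned char)c;
        n--;
    }
    return dst;
}
```

The epilogue loop illustrates the point made about expand_set_or_movmem_via_loop_with_iter(). Once the main loop moves many bytes per iteration, the leftover tail is too large to handle efficiently with single-byte stores alone.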
Attachment:
memfunc-tests.patch
Description: Binary data
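The test patch itself is attached as binary data and is not reproduced here. What follows is a rough, hypothetical illustration of what covering "different memset/memcpy cases" can mean at run time. It checks every length up to 100 bytes at every misalignment within a 16-byte block. Guard bytes surround the destination to catch overruns of the kind an over-wide prologue or epilogue could cause. The function names, sizes and fill patterns are arbitrary choices, not taken from memfunc-tests.patch, and the checks exercise whatever memset/memcpy code the compiler emits for these calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 100
#define MAX_OFFSET 16
#define GUARD 16
#define BUF_SIZE (GUARD + MAX_OFFSET + MAX_LEN + GUARD)

/* Hypothetical check: memset of LEN bytes at misalignment OFFSET must set
   exactly those bytes and leave the guard area untouched.  */
static int check_memset(size_t offset, size_t len)
{
    _Alignas(16) unsigned char buf[BUF_SIZE];
    size_t start = GUARD + offset;

    assert(offset < MAX_OFFSET && len <= MAX_LEN);
    memset(buf, 0xAA, sizeof buf);
    memset(buf + start, 0x5C, len);
    for (size_t i = 0; i < sizeof buf; i++) {
        int inside = i >= start && i < start + len;
        if (buf[i] != (inside ? 0x5C : 0xAA))
            return 0;
    }
    return 1;
}

/* Hypothetical check: memcpy with independent source and destination
   misalignments must copy exactly LEN bytes.  */
static int check_memcpy(size_t dst_off, size_t src_off, size_t len)
{
    _Alignas(16) unsigned char src[BUF_SIZE];
    _Alignas(16) unsigned char dst[BUF_SIZE];
    size_t d = GUARD + dst_off, s = GUARD + src_off;

    assert(dst_off < MAX_OFFSET && src_off < MAX_OFFSET && len <= MAX_LEN);
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);
    memset(dst, 0xAA, sizeof dst);
    memcpy(dst + d, src + s, len);
    for (size_t i = 0; i < sizeof dst; i++) {
        unsigned char want = (i >= d && i < d + len) ? src[s + (i - d)] : 0xAA;
        if (dst[i] != want)
            return 0;
    }
    return 1;
}

/* Runs every length/offset combination; returns 1 if all checks pass.  */
static int run_all_cases(void)
{
    for (size_t len = 0; len <= MAX_LEN; len++)
        for (size_t off = 0; off < MAX_OFFSET; off++) {
            if (!check_memset(off, len))
                return 0;
            if (!check_memcpy(off, (off * 5) % MAX_OFFSET, len))
                return 0;
        }
    return 1;
}
```

Mixing the source and destination offsets (here `off * 5 % 16`) matters because memcpy expansions that align only one side still perform unaligned accesses on the other.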