This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: Use of vector instructions in memmov/memset expanding

From: Michael Zolotukhin <michael dot v dot zolotukhin at gmail dot com>
To: Jan Hubicka <hubicka at ucw dot cz>
Cc: gcc-patches at gcc dot gnu dot org, Richard Guenther <richard dot guenther at gmail dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, izamyatin at gmail dot com
Date: Mon, 18 Jul 2011 15:00:51 +0400
Subject: Re: Use of vector instructions in memmov/memset expanding
References: <CANtU07-DAOMe9Nk4oYj3FJnkZqgkHvSnobsugeSfcRUzDChrrg@mail.gmail.com> <CANtU07_ZoRrLjWBGv=r6MCeBVTh-z13Cab0frjQdg2e7VAyzGg@mail.gmail.com> <20110715232425.GA24793@atrey.karlin.mff.cuni.cz>

Here is a summary. It probably doesn't cover every single piece of
the patch, but I tried to describe the major changes. I hope this
helps a bit, and of course I'll answer any further questions that
come up.

The changes can be logically divided into two parts (though the parts
have something in common).
The first part is the changes in the target-independent code, in
functions move_by_pieces() and store_by_pieces(), mostly located in
expr.c.
The second part touches ix86_expand_movmem() and ix86_expand_setmem(),
mostly located in config/i386/i386.c.

Changes in i386.c (target-dependent part):
1) Strategies for known and unknown alignment are now separated from
each other.
When the alignment is known at compile time, we can generate optimized
code without libcalls.
When it's unknown, we can sometimes emit runtime checks to reach the
desired alignment, but not always.
Strategies for atom, generic_32 and generic_64 were chosen based on a
set of experiments; strategies in the other cost models are unchanged
(their strategies for unknown alignment are copied from the existing
strategies).
2) The unrolled_loop algorithm was modified: it now uses SSE move
modes if they're available.
3) Because the amount of data moved in one iteration has greatly
increased, epilogues became bigger, so some changes were needed in
epilogue generation. In some cases a special (not unrolled) loop is
generated in the epilogue to avoid slow byte-by-byte copying (the
changes in expand_set_or_movmem_via_loop() and the introduction of
expand_set_or_movmem_via_loop_with_iter() were made for these cases).
4) Because a bigger alignment than before might be needed, prologue
generation was also modified.
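
As a rough illustration of points 3) and 4), here is a plain C sketch
of the three-phase shape described above: an alignment prologue, an
unrolled main loop of wide moves, and an epilogue that first runs a
non-unrolled loop of wide moves before falling back to bytes. It is
not the code the expander emits. The name copy_three_phase(), the
16-byte move width and the unroll factor of 4 are assumptions made
for the example, and memcpy() of a fixed size stands in for a single
SSE move.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed for illustration: 16-byte moves, 4 of them per main-loop pass. */
enum { WIDE = 16, UNROLL = 4 };

/* Hypothetical helper: copy n bytes in prologue / main loop / epilogue. */
static void *copy_three_phase(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Prologue: byte moves until the destination is WIDE-aligned. */
    while (n > 0 && ((uintptr_t)d % WIDE) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Main loop: UNROLL wide moves per iteration. */
    while (n >= (size_t)WIDE * UNROLL) {
        for (int i = 0; i < UNROLL; i++)
            memcpy(d + i * WIDE, s + i * WIDE, WIDE);
        d += WIDE * UNROLL;
        s += WIDE * UNROLL;
        n -= WIDE * UNROLL;
    }

    /* Epilogue, step 1: a plain (not unrolled) loop of wide moves, so
       that up to WIDE * UNROLL - 1 leftover bytes are not all copied
       one at a time. */
    while (n >= WIDE) {
        memcpy(d, s, WIDE);
        d += WIDE;
        s += WIDE;
        n -= WIDE;
    }

    /* Epilogue, step 2: fewer than WIDE bytes remain. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

With these assumed parameters the main loop can leave up to 63 bytes
behind, which is why a second, smaller loop in the epilogue pays off.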

Changes in expr.c (target-independent part):
There are now two possible strategies: aligned moves and unaligned
moves. A cost model was implemented for each of them, and the choice
is made according to the cost of each option. The move mode is chosen
by the functions widest_mode_for_unaligned_mov() and
widest_mode_for_aligned_mov().
Cost estimation is implemented in the functions compute_aligned_cost()
and compute_unaligned_cost().
The choice between the two strategies and the generation of the moves
themselves are in move_by_pieces().
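
To make the idea of choosing by cost concrete, here is a small
stand-alone C sketch. It is not the cost model from the patch:
count_moves(), choose_strategy() and the unaligned_penalty factor are
hypothetical, and the only assumption is that an aligned strategy is
limited to moves no wider than the known alignment, while an unaligned
strategy may use the widest mode at a higher cost per move.

```c
#include <stddef.h>

typedef enum { STRATEGY_ALIGNED, STRATEGY_UNALIGNED } strategy;

/* Hypothetical: number of moves needed to cover len bytes when the
   widest usable move is max_width bytes (a power of two), halving the
   width for the tail. */
static unsigned count_moves(size_t len, size_t max_width)
{
    unsigned moves = 0;
    for (size_t w = max_width; w >= 1 && len > 0; w /= 2) {
        moves += (unsigned)(len / w);
        len %= w;
    }
    return moves;
}

/* Hypothetical: pick the cheaper strategy.  known_align and widest are
   in bytes; unaligned_penalty is an assumed cost multiplier for one
   unaligned move relative to one aligned move. */
static strategy choose_strategy(size_t len, size_t known_align,
                                size_t widest, unsigned unaligned_penalty)
{
    size_t aligned_width = known_align < widest ? known_align : widest;
    unsigned aligned_cost = count_moves(len, aligned_width);
    unsigned unaligned_cost = count_moves(len, widest) * unaligned_penalty;

    return unaligned_cost < aligned_cost ? STRATEGY_UNALIGNED
                                         : STRATEGY_ALIGNED;
}
```

For example, with 35 bytes, 4-byte known alignment, 16-byte widest
moves and a penalty of 2, the aligned strategy needs 10 moves while
the unaligned one costs 4 * 2 = 8, so the unaligned strategy wins in
this toy model.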

store_by_pieces() calls set_by_pieces_1() instead of
store_by_pieces_1() in the memset case (I needed to introduce
set_by_pieces_1() to separate the memset case from the others:
store_by_pieces_1() is sometimes called for strcpy and some other
functions, not only for memset).

set_by_pieces_1() estimates the costs of the aligned and unaligned
strategies (as in move_by_pieces()) and generates the moves for
memset. A single move is generated via generate_move_with_mode(). The
first time it's called, a promoted value (a register filled with the
one-byte value of the memset argument) is generated; later calls reuse
this value.
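
The "promoted value" idea can be shown in plain C. This is only a
sketch under assumptions: promote_byte() and set_with_promoted() are
hypothetical names, the wide move is 8 bytes rather than a vector
mode, and the lazy creation of the promoted value on the first wide
store mimics the "generate on first call, reuse later" behaviour
described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: replicate one byte into all 8 bytes of a 64-bit value. */
static uint64_t promote_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Hypothetical memset-like helper using 8-byte stores of the promoted
   value, then byte stores for the tail. */
static void *set_with_promoted(void *dst, unsigned char c, size_t n)
{
    unsigned char *d = dst;
    uint64_t promoted = 0;
    int have_promoted = 0;

    while (n >= 8) {
        if (!have_promoted) {
            /* Created once, on the first wide store; reused afterwards. */
            promoted = promote_byte(c);
            have_promoted = 1;
        }
        memcpy(d, &promoted, 8);   /* stands in for one wide store */
        d += 8;
        n -= 8;
    }
    while (n > 0) {
        *d++ = c;
        n--;
    }
    return dst;
}
```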

Changes in MD-files:
For generation of promoted values, I made some changes in
promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
for vec_dup4si and vec_dupv2di were also introduced for this (these
expands differ from the corresponding define_insns: the existing
define_insns work only with registers, while the new expands can
handle a memory operand as well).

Some code was added to allow generation of MOVQ (with SSE registers).
Such moves aren't the usual ones, because they use only half of an
xmm register.
These moves needed to be generated explicitly, so I added a simple
expand to sse.md.
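
For readers less familiar with this kind of move, the same operation
can be written at the C level with SSE2 intrinsics. This is only an
illustration of what an 8-byte move through an xmm register looks
like, not the expand added to sse.md; move8_via_xmm() is a
hypothetical name, and the sketch assumes an x86 target with SSE2.

```c
#include <emmintrin.h>   /* SSE2 intrinsics; x86 targets only */

/* Hypothetical helper: move 8 bytes through the low half of an XMM
   register.  Both intrinsics normally map to MOVQ. */
static void move8_via_xmm(void *dst, const void *src)
{
    /* Loads 64 bits into the low half; the upper half is zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);

    /* Stores only the low 64 bits of the register. */
    _mm_storel_epi64((__m128i *)dst, v);
}
```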

On 16 July 2011 03:24, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > New algorithm for move-mode selection is implemented for move_by_pieces,
>> > store_by_pieces.
>> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
>> > similar way, x86 cost-models parameters are slightly changed to support
>> > this. This implementation checks if array's alignment is known at compile
>> > time and chooses expanding algorithm and move-mode according to it.
>
> Can you give some summary of changes you made?  It would make it a lot easier to
> review if it was broken up into the generic changes (with rationale why they are
> needed) and i386 backend changes that I could review then.
>
> From first pass through the patch I don't quite see the need for i.e. adding
> new move patterns when we can output all kinds of SSE moves already.  Will look
> more into the patch to see if I can come up with useful comments.
>
> Honza
>
