This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Use of vector instructions in memmov/memset expanding


Thanks for the answers!

I tried to take into account all the remarks and updated the patch in
accordance with them. Its full version is attached to this letter,
separate middle-end and back-end parts will be in consequent letters.

What about the rest part of the patch? Jan, could you please review it too?


Below there are responses to remarks you made.
> +/* Helper function for expand_set_or_movmem_via_loop.
> +   This function can reuse iter rtx from another loop and don't generate
> +   code for updating the addresses.  */
> +static rtx
> +expand_set_or_movmem_via_loop_with_iter (rtx destmem, rtx srcmem,
> +                                        rtx destptr, rtx srcptr, rtx value,
> +                                        rtx count, rtx iter,
> +                                        enum machine_mode mode, int unroll,
> +                                        int expected_size, bool change_ptrs)
>
> I wrote the original function, but it is not really clear for me what the function
> does now. I.e. what is code for updating addresses and what means reusing iter.
> I guess reusing iter means that we won't start the loop from 0.  Could you
> expand comments a bit more?
I added some comments - please see updated version of the patch.


> -/* Output code to copy at most count & (max_size - 1) bytes from SRC to DEST.  */
> +/* Emit strset instuction.  If RHS is constant, and vector mode will be used,
> +   then move this consatnt to a vector register before emitting strset.  */
> +static void
> +emit_strset (rtx destmem, rtx value,
> +            rtx destptr, enum machine_mode mode, int offset)
>
> This seems to more naturally belong into gen_strset expander?
Corrected.


>        {
> -         if (TARGET_64BIT)
> -           {
> -             dest = adjust_automodify_address_nv (destmem, DImode, destptr, offset);
> -             emit_insn (gen_strset (destptr, dest, value));
> -           }
> -         else
> -           {
> -             dest = adjust_automodify_address_nv (destmem, SImode, destptr, offset);
> -             emit_insn (gen_strset (destptr, dest, value));
> -             dest = adjust_automodify_address_nv (destmem, SImode, destptr, offset + 4);
> -             emit_insn (gen_strset (destptr, dest, value));
> -           }
> -         offset += 8;
> +         if (GET_MODE (destmem) != move_mode)
> +           destmem = change_address (destmem, move_mode, destptr);
> AFAIK change_address is not equilvalent into adjust_automodify_address_nv in a way
> it copies memory aliasing attributes and it is needed to zap them here since stringops
> behaves funily WRT aliaseing.
Fixed.


>   if (max_size > 16)
>     {
>       rtx label = ix86_expand_aligntest (count, 16, true);
>       if (TARGET_64BIT)
>        {
> -         dest = change_address (destmem, DImode, destptr);
> -         emit_insn (gen_strset (destptr, dest, value));
> -         emit_insn (gen_strset (destptr, dest, value));
> +         destmem = change_address (destmem, DImode, destptr);
> +         emit_insn (gen_strset (destptr, destmem, gen_lowpart (DImode,
> +                                                               value)));
> +         emit_insn (gen_strset (destptr, destmem, gen_lowpart (DImode,
> +                                                               value)));
>
> No use for 128bit moves here?
>        }
>       else
>        {
> -         dest = change_address (destmem, SImode, destptr);
> -         emit_insn (gen_strset (destptr, dest, value));
> -         emit_insn (gen_strset (destptr, dest, value));
> -         emit_insn (gen_strset (destptr, dest, value));
> -         emit_insn (gen_strset (destptr, dest, value));
> +         destmem = change_address (destmem, SImode, destptr);
> +         emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +                                                               value)));
> +         emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +                                                               value)));
> +         emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +                                                               value)));
> +         emit_insn (gen_strset (destptr, destmem, gen_lowpart (SImode,
> +                                                               value)));
>
> And here?
Fixed.


> @@ -21204,8 +21426,8 @@ expand_movmem_prologue (rtx destmem, rtx srcmem,
>   if (align <= 1 && desired_alignment > 1)
>     {
>       rtx label = ix86_expand_aligntest (destptr, 1, false);
> -      srcmem = change_address (srcmem, QImode, srcptr);
> -      destmem = change_address (destmem, QImode, destptr);
> +      srcmem = adjust_automodify_address_1 (srcmem, QImode, srcptr, 0, 1);
> +      destmem = adjust_automodify_address_1 (destmem, QImode, destptr, 0, 1);
>
> You want to always use adjust_automodify_address or adjust_automodify_address_nv,
> adjust_automodify_address_1 is not intended for general use.
Fixed.


> +/* Target hook.  Returns rtx of mode MODE with promoted value VAL, that is
> +   supposed to represent one byte.  MODE could be a vector mode.
> +   Example:
> +   1) VAL = const_int (0xAB), mode = SImode,
> +   the result is const_int (0xABABABAB).
>
> This can be handled in machine independent way, right?

> Certainly it can be done machine-independently.
> See expand_vector_broadcast in optabs.c for a start.
Thanks, there is no need in new hook, indeed. Fixed.

Responses to other questions/remarks were in previous letter.

On 28 October 2011 19:59, Richard Henderson <rth@redhat.com> wrote:
> On 10/28/2011 05:41 AM, Michael Zolotukhin wrote:
>>> > +/* Target hook. ?Returns rtx of mode MODE with promoted value VAL, that is
>>> > + ? supposed to represent one byte. ?MODE could be a vector mode.
>>> > + ? Example:
>>> > + ? 1) VAL = const_int (0xAB), mode = SImode,
>>> > + ? the result is const_int (0xABABABAB).
>>> >
>>> > This can be handled in machine independent way, right?
>>> >
>>> > + ? 2) if VAL isn't const, then the result will be the result of MUL-instruction
>>> > + ? of VAL and const_int (0x01010101) (for SImode). ?*/
>>> >
>>> > This would probably go better as named expansion pattern, like we do for other
>>> > machine description interfaces.
>> I don't think it could be done in machine-independent way - e.g. if
>> AVX is available, we could use broadcast-instructions, if not - we
>> need to use multiply-instructions, on other architectures there
>> probably some other more efficient ways to duplicate byte value across
>> the entire vector register. So IMO it's a good place to have a hook.
>>
>>
>
> Certainly it can be done machine-independently.
> See expand_vector_broadcast in optabs.c for a start.
>
>
> r~
>



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.

Attachment: memfunc-complete-4.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]