[PATCH v5 2/2] x86: Add vec_duplicate<mode> expander

Richard Sandiford richard.sandiford@arm.com
Mon Jun 28 12:36:48 GMT 2021


"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow
>> >> > broadcasting an integer constant to a vector when broadcast instruction
>> >> > is available.  This can be used by memset expander to avoid vec_duplicate
>> >> > when loading from constant pool is more efficient.
>> >>
>> >> I don't see any changes in target-independent code though, other than
>> >> the doc update.  It's still the case that (existing) uses of
>> >> vec_duplicate_optab do not allow it to fail.
>> >
>> > I have a followup patch set on
>> >
>> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
>> >
>> > to use it to expand memset with vector broadcast:
>> >
>> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
>> >
>> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast
>> > expander returns FAIL and load from constant pool will be used.
>>
>> Hmm, but as Jeff and I mentioned in the earlier replies,
>> vec_duplicate_optab shouldn't be used for constants.  Constants
>> should go via the move expanders instead.
>>
>> In a previous message I suggested:
>>
>>   … would it work to change:
>>
>>         /* Try using vec_duplicate_optab for uniform vectors.  */
>>         if (!TREE_SIDE_EFFECTS (exp)
>>             && VECTOR_MODE_P (mode)
>>             && eltmode == GET_MODE_INNER (mode)
>>             && ((icode = optab_handler (vec_duplicate_optab, mode))
>>                 != CODE_FOR_nothing)
>>             && (elt = uniform_vector_p (exp)))
>>
>>   to something like:
>>
>>         /* Try using vec_duplicate_optab for uniform vectors.  */
>>         if (!TREE_SIDE_EFFECTS (exp)
>>             && VECTOR_MODE_P (mode)
>>             && eltmode == GET_MODE_INNER (mode)
>>             && (elt = uniform_vector_p (exp)))
>>           {
>>             if (TREE_CODE (elt) == INTEGER_CST
>>                 || TREE_CODE (elt) == POLY_INT_CST
>>                 || TREE_CODE (elt) == REAL_CST
>>                 || TREE_CODE (elt) == FIXED_CST)
>>               {
>>                 rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>>                 emit_move_insn (target, src);
>>                 break;
>>               }
>>>>           }
>>
>> if that code was the source of the constant operand.  If we're adding a
>> new use of vec_duplicate_optab then that should be similarly protected
>> against constant operands.
>>
>
> Your comments apply to my initial vec_duplicate patch that caused the
> gcc.dg/pr100239.c failure.  It has been fixed by
>
> commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
> Author: Richard Biener <rguenther@suse.de>
> Date:   Mon Jun 7 20:08:13 2021 +0200
>
>     middle-end/100951 - make sure to generate VECTOR_CST in lowering
>
>     When vector lowering creates piecewise ops make sure to create
>     VECTOR_CSTs instead of CONSTRUCTORs when possible.
>
> The problem I am running into now is in my memset vector broadcast
> patch.  In order to optimize vector broadcast for memset, I need to
> generate a pseudo register for
>
>  __builtin_memset (ops, 3, 38);
>
> only when vector broadcast is available:
>
>   rtx target = nullptr;
>
>   unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
>   machine_mode vector_mode;
>   if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
>     gcc_unreachable ();
>
>   enum insn_code icode = optab_handler (vec_duplicate_optab,
>                                         vector_mode);
>   if (icode != CODE_FOR_nothing)
>     {
>       rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
>       class expand_operand ops[2];
>       create_output_operand (&ops[0], reg, vector_mode);
>       create_input_operand (&ops[1], data, QImode);
>       if (maybe_expand_insn (icode, 2, ops))
>         {
>           if (!rtx_equal_p (reg, ops[0].value))
>             emit_move_insn (reg, ops[0].value);
>           target = lowpart_subreg (mode, reg, vector_mode);
>         }
>     }
>
>   return target;  <<< Return nullptr to load from constant pool.

I don't think this is a correct use of vec_duplicate_optab.  If the
scalar operand is a constant then the move should always go through
the move expanders instead, as a move from a CONST_VECTOR.

Thanks,
Richard


More information about the Gcc-patches mailing list