[PATCH v4] Add QI vector mode support to by-pieces for memset

H.J. Lu hjl.tools@gmail.com
Fri Jul 30 12:38:45 GMT 2021


On Fri, Jul 30, 2021 at 2:05 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > On Tue, Jul 27, 2021 at 8:31 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Mon, Jul 26, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >
> >> > On Mon, Jul 26, 2021 at 3:56 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> > >
> >> > > On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford
> >> > > <richard.sandiford@arm.com> wrote:
> >> > > >
> >> > > > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> > > > > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
> >> > > > > <richard.sandiford@arm.com> wrote:
> >> > > > >>
> >> > > > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> > > > >> > +to avoid stack realignment when expanding memset.  The default is
> >> > > > >> > +@code{gen_reg_rtx}.
> >> > > > >> > +@end deftypefn
> >> > > > >> > +
> >> > > > >> >  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
> >> > > > >> >  This target hook returns a new value for the number of times @var{loop}
> >> > > > >> >  should be unrolled. The parameter @var{nunroll} is the number of times
> >> > > > >> > […]
> >> > > > >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
> >> > > > >> >        max_size = STORE_MAX_PIECES + 1;
> >> > > > >> >        while (max_size > 1 && l > 0)
> >> > > > >> >       {
> >> > > > >> > -       scalar_int_mode mode = widest_int_mode_for_size (max_size);
> >> > > > >> > +       /* Since this can be called before virtual registers are ready
> >> > > > >> > +          to use, avoid QI vector mode here.  */
> >> > > > >> > +       fixed_size_mode mode
> >> > > > >> > +         = widest_fixed_size_mode_for_size (max_size, false);
> >> > > > >>
> >> > > > >> I think I might have asked this before, sorry, but: when is that true
> >> > > > >> and why does it matter?
> >> > > > >
> >> > > > > can_store_by_pieces may be called:
> >> > > > >
> >> > > > > value-prof.c:      if (!can_store_by_pieces (val, builtin_memset_read_str,
> >> > > > > value-prof.c:      if (!can_store_by_pieces (val, builtin_memset_read_str,
> >> > > > >
> >> > > > > before virtual registers can be used.   When true is passed to
> >> > > > > widest_fixed_size_mode_for_size,  virtual registers may be used
> >> > > > > to expand memset to broadcast, which leads to ICE.   Since for the
> >> > > > > purpose of can_store_by_pieces, we don't need to expand memset
> >> > > > > to broadcast and pass false here can avoid ICE.
> >> > > >
> >> > > > Ah, I see, thanks.
> >> > > >
> >> > > > That sounds like a problem in the way that the memset const function is
> >> > > > written though.  can_store_by_pieces is just a query function, so I don't
> >> > > > think it should be trying to create new registers for can_store_by_pieces,
> >> > > > even if it could.  At the same time, can_store_by_pieces should make the
> >> > > > same choices as the real expander would.
> >> > > >
> >> > > > I think this means that:
> >> > > >
> >> > > > - gen_memset_broadcast should be inlined into its callers, with the
> >> > > >   builtin_memset_read_str getting the CONST_INT_P case and
> >> > > >   builtin_memset_gen_str getting the variable case.
> >> > > >
> >> > > > - builtin_memset_read_str should then stop at and return the
> >> > > >   gen_const_vec_duplicate when the prev argument is null.
> >> >
> >> > This doesn't work since can_store_by_pieces has
> >> >
> >> >                  cst = (*constfun) (constfundata, nullptr, offset, mode);
> >> >                   if (!targetm.legitimate_constant_p (mode, cst))
> >>
> >> We can add a target hook, targetm.legitimate_memset_constant_p,
> >> which defaults to targetm.legitimate_constant_p.  Will it be acceptable?
> >
> > In the v5 patch,  I changed it to
> >
> >                  cst = (*constfun) (constfundata, nullptr, offset, mode);
> >                   /* All CONST_VECTORs are legitimate if vec_duplicate
> >                      is supported.  */
>
> Maybe “can be loaded” rather than “are legitimate”, since they're

Fixed.

> not necessarily legitimate in the sense of legitimate_constant_p
> (hence the patch).  Also, since we assume elsewhere that
> vec_duplicate is a precondition for picking a vector mode,
> I think we should do the same here (and note that in the comment).

Fixed.

> So…
>
> >                   if (!((memsetp
> >                          && VECTOR_MODE_P (mode)
> >                          && GET_MODE_INNER (mode) == QImode
> >                          && (optab_handler (vec_duplicate_optab, mode)
> >                              != CODE_FOR_nothing))
>
> I think we need only the (memsetp && VECTOR_MODE_P (mode)) check.
>
> This feels a bit of a hack TBH.  I think the same principles apply
> to vectors and integers here: forcing the constant to memory is
> still likely to be an optimisation, but is an extra overhead that
> we should probably account for.

Fixed.

> However, I agree this is probably the most practical way forward
> at the moment.

I sent out the v6 patch.

Thanks.

> Thanks,
> Richard
>
> >                         || targetm.legitimate_constant_p (mode, cst)))
> >                     return 0;
> >
> >> > ix86_legitimate_constant_p only allows 0 or -1 for CONST_VECTOR.
> >> > can_store_by_pieces doesn't work well for vector modes.
> >> >
> >> > > >   Only when prev is nonnull should it go on to call the hook
> >> > > >   and copy the constant to the register that the hook returns.
> >> > >
> >> > > How about keeping gen_memset_broadcast and passing PREV to it:
> >> > >
> >> > >   rtx target;
> >> > >   if (CONST_INT_P (data))
> >> > >     {
> >> > >       rtx const_vec = gen_const_vec_duplicate (mode, data);
> >> > >       if (prev == NULL)
> >> > >         /* Return CONST_VECTOR when called by a query function.  */
> >> > >         target = const_vec;
> >> > >       else
> >> > >         {
> >> > >           /* Use the move expander with CONST_VECTOR.  */
> >> > >           target = targetm.gen_memset_scratch_rtx (mode);
> >> > >           emit_move_insn (target, const_vec);
> >> > >         }
> >> > >     }
> >> > >   else
> >> > >     {
> >> > >       target = targetm.gen_memset_scratch_rtx (mode);
> >> > >       class expand_operand ops[2];
> >> > >       create_output_operand (&ops[0], target, mode);
> >> > >       create_input_operand (&ops[1], data, QImode);
> >> > >       expand_insn (icode, 2, ops);
> >> > >       if (!rtx_equal_p (target, ops[0].value))
> >> > >         emit_move_insn (target, ops[0].value);
> >> > >     }
> >> > >
> >> > > > I admit that's uglier than the current version, but it looks like that's
> >> > > > what the current interface expects.
> >> > > >
> >> > > > Thanks,
> >> > > > Richard
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > H.J.
> >> >
> >> >
> >> >
> >> > --
> >> > H.J.
> >>
> >>
> >>
> >> --
> >> H.J.



-- 
H.J.


More information about the Gcc-patches mailing list