[PATCH v4] Add QI vector mode support to by-pieces for memset

Richard Sandiford richard.sandiford@arm.com
Mon Jul 26 21:53:34 GMT 2021


"H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> > +to avoid stack realignment when expanding memset.  The default is
>> > +@code{gen_reg_rtx}.
>> > +@end deftypefn
>> > +
>> >  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
>> >  This target hook returns a new value for the number of times @var{loop}
>> >  should be unrolled. The parameter @var{nunroll} is the number of times
>> > […]
>> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
>> >        max_size = STORE_MAX_PIECES + 1;
>> >        while (max_size > 1 && l > 0)
>> >       {
>> > -       scalar_int_mode mode = widest_int_mode_for_size (max_size);
>> > +       /* Since this can be called before virtual registers are ready
>> > +          to use, avoid QI vector mode here.  */
>> > +       fixed_size_mode mode
>> > +         = widest_fixed_size_mode_for_size (max_size, false);
>>
>> I think I might have asked this before, sorry, but: when is that true
>> and why does it matter?
>
> can_store_by_pieces may be called:
>
> value-prof.c:      if (!can_store_by_pieces (val, builtin_memset_read_str,
> value-prof.c:      if (!can_store_by_pieces (val, builtin_memset_read_str,
>
> before virtual registers can be used.   When true is passed to
> widest_fixed_size_mode_for_size,  virtual registers may be used
> to expand memset to broadcast, which leads to ICE.   Since for the
> purpose of can_store_by_pieces, we don't need to expand memset
> to broadcast and pass false here can avoid ICE.

Ah, I see, thanks.

That sounds like a problem in the way that the memset const function is
written though.  can_store_by_pieces is just a query function, so I don't
think it should be trying to create new registers for can_store_by_pieces,
even if it could.  At the same time, can_store_by_pieces should make the
same choices as the real expander would.

I think this means that:

- gen_memset_broadcast should be inlined into its callers, with the
  builtin_memset_read_str getting the CONST_INT_P case and
  builtin_memset_gen_str getting the variable case.

- builtin_memset_read_str should then stop at and return the
  gen_const_vec_duplicate when the prev argument is null.
  Only when prev is nonnull should it go on to call the hook
  and copy the constant to the register that the hook returns.

I admit that's uglier than the current version, but it looks like that's
what the current interface expects.

Thanks,
Richard


More information about the Gcc-patches mailing list