[PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

Jeff Law law@redhat.com
Wed Aug 26 21:23:35 GMT 2020


On Tue, 2020-08-04 at 14:05 +0800, Hongtao Liu via Gcc-patches wrote:
> Update patch.
> 
> There are a lot of avx512 define_insns which lack corresponding memory
> broadcast version, i only add *avx512f_mul<mode>3<mask_name>_bcst and
> *avx512dq_mul<mode>3<mask_name>_bcst in this patch.
> 
> On Fri, Jul 24, 2020 at 10:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > On Thu, Jul 23, 2020 at 9:53 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > On Thu, Jul 23, 2020 at 4:39 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> > > > Hello,
> > > > sorry for taking so long to get to this.
> > > > > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > > > > index 535fc7e981d..8f81d101382 100644
> > > > > --- a/gcc/config/i386/i386-features.c
> > > > > +++ b/gcc/config/i386/i386-features.c
> > > > > @@ -2379,6 +2379,152 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt)
> > > > >    return new pass_remove_partial_avx_dependency (ctxt);
> > > > >  }
> > > > > 
> > > > > +/* Replace all one-value const vector that are referenced by SYMBOL_REFs in x
> > > > > +   with embedded broadcast. i.e.transform
> > > > > +
> > > > > +     vpaddq .LC0(%rip), %zmm0, %zmm0
> > > > > +     ret
> > > > > +  .LC0:
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +    .quad 3
> > > > > +
> > > > > +    to
> > > > > +
> > > > > +     vpaddq .LC0(%rip){1to8}, %zmm0, %zmm0
> > > > 
> > > > It seems to me that having a special purpose pass for this is bit
> > > > overzelaous.  It seems to me that you can do same pattern matching via
> > > > splitter and fit it into the usual insn splitting pass?
> > > > 
> > > 
> > > From an implementation perspective, there could be lots of work, since
> > > memory embedding broadcast is available for nearly every instruction
> > > in AVX512. And for new added AVX512 instructions, we also need to add
> > > a define_split for them.
> > > 
> > 
> > 


> > +/* For const vector having one duplicated value, there's no need to put
> > > > > +   whole vector in the constant pool when target supports embedded broadcast. */
> > > > > +static unsigned int
> > > > > +constant_pool_broadcast (void)
> > > > > +{
> > > > > +  timevar_push (TV_MACH_DEP);
> > > > > +  rtx_insn *insn;
> > > > > +
> > > > > +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> > > > > +    {
> > > > > +      if (!INSN_P (insn))
> > > > > +     continue;
> > > > > +
> > > > > +      /* Insns may appear inside a SEQUENCE.  Only check the patterns of
> > > > > +      insns, not any notes that may be attached.  We don't want to mark
> > > > > +      a constant just because it happens to appear in a REG_EQUIV note.  */
Under what circumstances are we seeing a SEQUENCE in the x86 backend?  I'm
surprised we need to handle that case.

So your pass modifies the insn in place, which is fine.  But do we actually
remove the original constant pool entry if it's no longer used?  If not, does
this patch actually save anything (memory bandwidth perhaps?)

Is there an existing pass over the RTL chain where this would work so that it's
more compile-time efficient?

jeff



More information about the Gcc-patches mailing list