[PATCH] move x86 to use gather/scatter internal functions

Richard Biener rguenther@suse.de
Wed Aug 18 11:31:11 GMT 2021


On Wed, 18 Aug 2021, Richard Biener wrote:

> On Wed, 18 Aug 2021, Richard Biener wrote:
> 
> > 
> > So in the end I seem to be able to combine AVX & AVX512 arriving
> > at the following which passes basic testing.  I will now see to
> > teach the vectorizer the required "promotion" to handle
> > mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di.
> > 
> > Meanwhile, do you see any hole in the below?  If not I'll
> > do mask_scatter_store accordingly (that will be simpler since
> > AVX doesn't have scatter).
> 
> There seems to be one more complication ... we have
> 
> (define_expand "avx2_gatherdi<mode>"
>   [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
>                    (unspec:VEC_GATHER_MODE
>                      [(match_operand:<VEC_GATHER_SRCDI> 1 
> "register_operand")
>                       (mem:<ssescalarmode>
>                         (match_par_dup 6
>                           [(match_operand 2 "vsib_address_operand")
>                            (match_operand:<VEC_GATHER_IDXDI>
>                               3 "register_operand")
> 
> but VEC_GATHER_IDXDI is
> 
> (define_mode_attr VEC_GATHER_IDXDI
>                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>                        (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
>                        (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
>                        (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> 
> I'd have expected (V4SF "V4DI") for example, or (V8SF "V8DI").
> From the ISA docs I conclude that vgatherqps is not supported for
> a %zmm destination but V8SF and V8DI is a supported combination?
> 
> (define_mode_attr VEC_GATHER_IDXSI
>                       [(V2DI "V4SI") (V4DI "V4SI") (V8DI "V8SI")
>                        (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
>                        (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
>                        (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
> 
> looks like what I expect.  So shouldn't we use a special
> VEC_GATHERDI_MODE iterator instead that drops V16{SI,SF} and adjust
> VEC_GATHER_IDXDI accordingly?  Even
> 
> (define_expand "<avx512>_gatherdi<mode>"
>   [(parallel [(set (match_operand:VI48F 0 "register_operand")
>                    (unspec:VI48F
> 
> "supports" V16SI and V16SF gathers, leading to broken patterns?
> 
> I can test if anything goes wrong fixing VEC_GATHER_IDXDI to
> 
> (define_mode_attr VEC_GATHER_IDXDI
>                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>                        (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
>                        (V4SI "V4DI") (V8SI "V8DI") (V16SI "V8DI")
>                        (V4SF "V4DI") (V8SF "V8DI") (V16SF "V8DI")]

Hum, it seems to be more complicated (or simpler).  The gatherq
instructions will always fill only half of the SFmode destination
while the gatherd will only use half of the index vector for a DFmode
destination.

So we support mask_gather_loadv2sfv2di for example but the insn
patterns want v4sf/v2di operands in the end (and the upper half
of the register are zeroed).

I think I'm going to use a completely separate set of mode
iterators/attributes for the mask_gather_load patterns.

I have to study the builtin path in the vectorizer a bit in how
it handles this all but I suspect it's closely tied to these
x86 specialities.

Richard.


> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Richard.
> > 
> > From 706d3ac96b384eaad249cc83ec542ec643e21a4c Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguenther@suse.de>
> > Date: Tue, 17 Aug 2021 15:14:38 +0200
> > Subject: [PATCH] move x86 to use gather/scatter internal functions
> > To: gcc-patches@gcc.gnu.org
> > 
> > This moves the x86 backend to use standard pattern names for
> > mask_gather_load and mask_scatter_store rather than using the
> > builtin_{gather,scatter} target hooks.
> > 
> > The patch starts with mask_gather_load convering AVX and AVX512
> > and omits gather_load which doesn't match an actual instruction.
> > The vectorizer will appropriately set up a all-ones mask when
> > necessary.
> > 
> > The vectorizer still lacks support for unpacking, thus
> > mixed size gathers like mask_gather_loadv4dfv4si.
> > 
> > 2021-08-17  Richard Biener  <rguenther@suse.de>
> > 
> > 	* config/i386/sse.md
> > 	(vec_gather_idxsi, vec_gather_idxdi): New lower-case
> > 	mode attributes for VEC_GATHER_IDX{SI,DI}.
> > 	(VEC_GATHER_MODEX): New mode iterator.
> > 	(mask_gather_load<mode><vec_gather_idxsi>): New expander.
> > 	(mask_gather_load<mode><vec_gather_idxdi>): Likewise.
> > ---
> >  gcc/config/i386/sse.md | 53 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 53 insertions(+)
> > 
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 13889687793..6ae826c268a 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -23640,12 +23640,22 @@
> >  		       (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
> >  		       (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
> >  		       (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
> > +(define_mode_attr vec_gather_idxsi
> > +		      [(V2DI "v4si") (V4DI "v4si") (V8DI "v8si")
> > +		       (V2DF "v4si") (V4DF "v4si") (V8DF "v8si")
> > +		       (V4SI "v4si") (V8SI "v8si") (V16SI "v16si")
> > +		       (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")])
> >  
> >  (define_mode_attr VEC_GATHER_IDXDI
> >  		      [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> >  		       (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
> >  		       (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
> >  		       (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> > +(define_mode_attr vec_gather_idxdi
> > +		      [(V2DI "v2di") (V4DI "v4di") (V8DI "v8di")
> > +		       (V2DF "v2di") (V4DF "v4di") (V8DF "v8di")
> > +		       (V4SI "v2di") (V8SI "v4di") (V16SI "v8di")
> > +		       (V4SF "v2di") (V8SF "v4di") (V16SF "v8di")])
> >  
> >  (define_mode_attr VEC_GATHER_SRCDI
> >  		      [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> > @@ -23653,6 +23663,30 @@
> >  		       (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
> >  		       (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
> >  
> > +(define_mode_iterator VEC_GATHER_MODEX
> > +  [V2DI V2DF V4DI V4DF V4SI V4SF V8SI V8SF
> > +   (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> > +   (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
> > +
> > +(define_expand "mask_gather_load<mode><vec_gather_idxsi>"
> > +  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
> > +   (match_operand 1 "vsib_address_operand")
> > +   (match_operand:<VEC_GATHER_IDXSI> 2 "register_operand")
> > +   (match_operand:SI 3 "const0_operand")
> > +   (match_operand:SI 4 "const1248_operand")
> > +   (match_operand 5 "register_operand")]
> > +  "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > +  rtx scratch = gen_reg_rtx (<MODE>mode);
> > +  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
> > +  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
> > +    operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
> > +  emit_insn (gen_<avx2_avx512>_gathersi<mode> (operands[0], scratch,
> > +					       operands[1], operands[2],
> > +					       operands[5], operands[4]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "avx2_gathersi<mode>"
> >    [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> >  		   (unspec:VEC_GATHER_MODE
> > @@ -23714,6 +23748,25 @@
> >     (set_attr "prefix" "vex")
> >     (set_attr "mode" "<sseinsnmode>")])
> >  
> > +(define_expand "mask_gather_load<mode><vec_gather_idxdi>"
> > +  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
> > +   (match_operand 1 "vsib_address_operand")
> > +   (match_operand:<VEC_GATHER_IDXDI> 2 "register_operand")
> > +   (match_operand:SI 3 "const0_operand")
> > +   (match_operand:SI 4 "const1248_operand")
> > +   (match_operand 5 "register_operand")]
> > +  "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > +  rtx scratch = gen_reg_rtx (<MODE>mode);
> > +  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
> > +  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
> > +    operands[5] = gen_lowpart_SUBREG (<VEC_GATHER_SRCDI>mode, operands[5]);
> > +  emit_insn (gen_<avx2_avx512>_gatherdi<mode> (operands[0], scratch,
> > +					       operands[1], operands[2],
> > +					       operands[5], operands[4]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "avx2_gatherdi<mode>"
> >    [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> >  		   (unspec:VEC_GATHER_MODE
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


More information about the Gcc-patches mailing list