[PATCH] Enable GCC support for AVX512_VP2INTERSECT.

H.J. Lu hjl.tools@gmail.com
Fri Jun 7 17:29:00 GMT 2019


On Fri, Jun 7, 2019 at 8:50 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Jun 7, 2019 at 5:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2019 at 5:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Jun 6, 2019 at 2:12 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 6, 2019 at 7:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > Hi Uros and all:
> > > > >   This patch is about to enable support for AVX512_VP2INTERSECT which will
> > > > > be in Willow Cove. There are two instructions for AVX512_VP2INTERSECT:
> > > > > VP2INTERSECTD and VP2INTERSECTQ. More details please refer to
> > > > > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> > > > >
> > > > >   Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
> > > > >
> > > > > Changelog:
> > > > >
> > > > > gcc/
> > > > > +2019-06-06  Hongtao Liu  <hongtao.liu@intel.com>
> > > > > +     H.J. Lu  <hongjiu.lu@intel.com>
> > > > > +     Olga Makhotina  <olga.makhotina@intel.com>
> > > > > +
> > > > > + * common/config/i386/i386-common.c
> > > > > + (OPTION_MASK_ISA_AVX512VP2INTERSECT_SET,
> > > > > + OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET): New macros.
> > > > > + (OPTION_MASK_ISA2_AVX512F_UNSET): Add
> > > > > + OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET.
> > > > > + (ix86_handle_option): Handle -mavx512vp2intersect.
> > > > > + * config/i386/avx512vp2intersectintrin.h: New.
> > > > > + * config/i386/avx512vp2intersectvlintrin.h: New.
> > > > > + * config/i386/cpuid.h (bit_AVX512VP2INTERSECT): New.
> > > > > + * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> > > > > + AVX512VP2INTERSECT.
> > > > > + * config/i386/i386-builtin-types.def: Add new types.
> > > > > + * config/i386/i386-builtin.def: Add new builtins.
> > > > > + * config/i386/i386-builtins.c: (enum processor_features): Add
> > > > > + F_AVX512VP2INTERSECT.
> > > > > + (static const _isa_names_table isa_names_table): Ditto.
> > > > > + * config/i386/i386-c.c (ix86_target_macros_internal): Define
> > > > > + __AVX512VP2INTERSECT__.
> > > > > + * config/i386/i386-expand.c (ix86_expand_builtin): Expand
> > > > > + IX86_BUILTIN_2INTERSECTD512, IX86_BUILTIN_2INTERSECTQ512,
> > > > > + IX86_BUILTIN_2INTERSECTD256, IX86_BUILTIN_2INTERSECTQ256,
> > > > > + IX86_BUILTIN_2INTERSECTD128, IX86_BUILTIN_2INTERSECTQ128.
> > > > > + * config/i386/i386-modes.def (P2QI, P2HI): New modes.
> > > > > + * config/i386/i386-options.c (ix86_target_string): Add
> > > > > + -mavx512vp2intersect.
> > > > > + (ix86_option_override_internal): Handle AVX512VP2INTERSECT.
> > > > > + * config/i386/i386.c (ix86_hard_regno_nregs): Allocate two regs for
> > > > > + P2HImode and P2QImode.
> > > > > + (ix86_hard_regno_mode_ok): Register pair only starts at even hardreg
> > > > > + number for P2QImode and P2HImode.
> > > > > + * config/i386/i386.h (TARGET_AVX512VP2INTERSECT,
> > > > > + TARGET_AVX512VP2INTERSECT_P): New.
> > > > > + (PTA_AVX512VP2INTERSECT): Ditto.
> > > > > + * config/i386/i386.opt: Add -mavx512vp2intersect.
> > > > > + * config/i386/immintrin.h: Include avx512vp2intersectintrin.h and
> > > > > + avx512vp2intersectvlintrin.h.
> > > > > + * config/i386/sse.md (define_c_enum "unspec"): Add UNSPEC_VP2INTERSECT.
> > > > > + (define_mode_iterator VI48_AVX512VP2VL): New.
> > > > > + (avx512vp2intersect_2intersect<mode>,
> > > > > + avx512vp2intersect_2intersectv16si): New define_insn patterns.
> > > > > + (*vec_extractp2hi, *vec_extractp2qi): New define_insn_and_split
> > > > > + patterns.
> > > > > + * config.gcc: Add avx512vp2intersectvlintrin.h and
> > > > > + avx512vp2intersectintrin.h to extra_headers.
> > > > > + * doc/invoke.texi: Document -mavx512vp2intersect.
> > > > > +
> > > > >
> > > > > gcc/testsuite/
> > > > > +2019-06-06  Hongtao Liu  <hongtao.liu@intel.com>
> > > > > +     Olga Makhotina  <olga.makhotina@intel.com>
> > > > > +
> > > > > + * gcc.target/i386/avx512-check.h: Handle bit_AVX512VP2INTERSECT.
> > > > > + * gcc.target/i386/avx512vp2intersect-2intersect-1a.c: New test.
> > > > > + * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Likewise.
> > > > > + * gcc.target/i386/avx512vp2intersect-2intersectvl-1a.c: Likewise.
> > > > > + * gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c: Likewise.
> > > > > + * gcc.target/i386/sse-12.c: Add -mavx512vp2intersect.
> > > > > + * gcc.target/i386/sse-13.c: Likewsie.
> > > > > + * gcc.target/i386/sse-14.c: Likewise.
> > > > > + * gcc.target/i386/sse-22.c: Likewise.
> > > > > + * gcc.target/i386/sse-23.c: Likewise.
> > > > > + * g++.dg/other/i386-2.C: Likewise.
> > > > > + * g++.dg/other/i386-3.C: Likewise.
> > > > > +
> > > >
> > > > +    case OPT_mavx512vp2intersect:
> > > > +      if (value)
> > > > +        {
> > > > +          opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX512VP2INTERSECT_SET;
> > > > +          opts->x_ix86_isa_flags2_explicit |=
> > > > OPTION_MASK_ISA_AVX512VP2INTERSECT_SET;
> > > > +  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
> > > > +  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
> > > > +        }
> > > >
> > > > some space/tab mixup here.
> > > >
> > > > +(define_mode_iterator VI48_AVX512VP2VL
> > > > +  [V8DI
> > > > +  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")
> > > > +  (V8SI "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")])
> > > >
> > > > also here (or maybe a vertical alignment issue).
> > > >
> > > > +      op2 = copy_to_reg (op2);
> > > > +      op3 = copy_to_reg (op3);
> > > >
> > > > The predicate says that this one can be memory operand as well. I
> > > > suggest you use
> > > >
> > > > if (!insn_data[icode].operand[X].predicate (opX, modeX))
> > > >   opX = copy_to_mode_reg (modeX, opX);
> > > >
> > > > This would also handle eventual VOIDmode vector 0 operand.
> > > >
> > > > +
> > > > +      op4 = gen_reg_rtx (mode4);
> > > > +      emit_insn (GEN_FCN (icode) (op4, op2, op3));
> > > > +      mode0 = GET_MODE_INNER (GET_MODE (op4));
> > > > +      pat = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, GEN_INT (0)));
> > > > +      pat2 = gen_rtx_VEC_SELECT (mode0, op4, pat);
> > > > +      emit_move_insn (gen_rtx_MEM (mode0, op0), pat2);
> > > > +      pat = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, GEN_INT (1)));
> > > > +      pat2 = gen_rtx_VEC_SELECT (mode0, op4, pat);
> > > > +      emit_move_insn (gen_rtx_MEM (mode0, op1), pat2);
> > > > +
> > > >
> > > > You should probably emit a subreg here (using simplify_gen_subreg) and
> > > > leave to the register allocator to emit correct hard register out of a
> > > > register pair. Using this approach, *vec_extractp2hi and
> > > > *vec_extractp2hi should not be necessary anymore; RA will reduce the
> > > > subreg RTX to a movqi/movhi by itself.
> > >
> > > +/* Register pair.  */
> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */
> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */
> > >
> > > I think
> > >
> > > INT_MODE (P2QI, 16);
> > > INT_MODE (P2HI, 32);
> > >
> > > with the above subreg approach should work.
> > >
> >
> > I don't think subreg works on pseudo registers with non-zero
> > offset.  validate_subreg has
> >
> >  if (maybe_lt (osize, regsize)
> >       && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))))
> >     {
> >       /* It is invalid for the target to pick a register size for a mode
> >          that isn't ordered wrt to the size of that mode.  */
> >       poly_uint64 block_size = ordered_min (isize, regsize);
> >       unsigned int start_reg;
> >       poly_uint64 offset_within_reg;
> >       if (!can_div_trunc_p (offset, block_size, &start_reg, &offset_within_reg)
> >           || (BYTES_BIG_ENDIAN
> >               ? maybe_ne (offset_within_reg, block_size - osize)
> >               : maybe_ne (offset_within_reg, 0U)))
> >         return false;
>
> It works with SImode subregs of DImode values on 32bit targets. Please
> look for calls to gen_highpart, one concrete example is in
> atomic_compare_and_swap<mode>.
>

It works because of

#define REGMODE_NATURAL_SIZE(MODE) UNITS_PER_WORD

and only works for the high part of SImode of DImode.

P2QI and P2HI are 2 special modes of mask register pair for
2 instructions.   Do we want to make them more generic?

-- 
H.J.



More information about the Gcc-patches mailing list