[PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

Hongtao Liu crazylht@gmail.com
Thu Aug 20 07:24:42 GMT 2020


On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > > enabled since there's no corresponding instruction for general
> > > > registers.
> > > >
> > > > gcc/
> > > >         PR target/88808
> > > >         * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > >         for mask registers.
> > > >         (*movhi_internal): Ditto.
> > > >         (*movqi_internal): Ditto.
> > > >         (*anddi_1): Support mask register operations
> > > >         (*and<mode>_1): Ditto.
> > > >         (*andqi_1): Ditto.
> > > >         (*andn<mode>_1): Ditto.
> > > >         (*<code><mode>_1): Ditto.
> > > >         (*<code>qi_1): Ditto.
> > > >         (*one_cmpl<mode>2_1): Ditto.
> > > >         (*one_cmplsi2_1_zext): Ditto.
> > > >         (*one_cmplqi2_1): Ditto.
> > > >
> > > > gcc/testsuite/
> > > >         * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > >         * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > >         * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > >         * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > >         * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > >         * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > >
> > > index 74d207c3711..e8ad79d1b0a 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -2294,7 +2294,7 @@
> > >
> > >  (define_insn "*movsi_internal"
> > >    [(set (match_operand:SI 0 "nonimmediate_operand"
> > > -    "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > +    "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > >      (match_operand:SI 1 "general_operand"
> > >      "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > >    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > >
> > > I'd rather see *k everywhere, also with *movqi_internal and
> > > *movhi_internal patterns. The "*" means that the allocator won't
> > > allocate a mask register by default, but it will be used to optimize
> > > moves. With the above change, you are risking that during integer
> > > register pressure, the register allocator will allocate zero to a mask
> > > register, and later "optimize" the move with a direct maskreg-intreg
> > > move.
> > >
> > > The current strategy is that only general registers get allocated for
> > > integer modes. Let's keep it this way for now.
> > >
> >
> > Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> > move zero into mask register directly.
>
> Although it would be nice if the register allocator was smart enough,
> the current strategy is to introduce peephole2 patterns to fix these
> problems, similar to [1]. These peepholes can be introduced in a
> follow-up patch.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
>

peephole2 added.

> > > Otherwise, the patchset LGTM, but please test the suggested changes and repost.
> > >
> > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > > used to distinguish mask operations, generated from builtins from
> > > generic operations, so I'd like to keep them for a while. The drawback
> > > is, that they are not combined with other operations, but at the end
> > > of the day, this is what the programmer asked for by using builtins.
> >
> > Agree, I prefer to keep them.
>
> Thinking some more about the approach, it looks to me that the optimal
> solution is a post-reload splitter that would convert "generic"
> patterns to mask operations from sse.md. The mask operations don't set
> flags, so we can substantially improve post reload scheduling of these
> instructions by removing flags clobber.
>
> So, simply add "#" to relevant alternatives of logic patterns and add
> something like:
>
> --cut here--
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 41c6dbfa668..ad49bdc7583 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1470,6 +1470,18 @@
>            ]
>            (const_string "<MODE>")))])
>
> +(define_split
> +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> +       (any_logic:SWI1248_AVX512BW
> +         (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> +         (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F && reload_completed"
> +  [(parallel
> +     [(set (match_dup 0)
> +          (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
> +      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> +
>  (define_insn "kandn<mode>"
>    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
>         (and:SWI1248_AVX512BW
> --cut here--
>
> and similar for kandn and knot in sse.md. You will have to add
> mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md
> for example.
>
> We don't lose anything, because all important transformations,
> propagations and simplifications with these patterns happen before
> reload.

define_splits are added for those bitwise operations.

>
> Uros.

Also add bellow part which will pass gcc.target/i386/bitwise_mask_op-3.c

-     must go into Q_REGS.  */
+     must go into Q_REGS or ALL_MASK_REGS.  */
   if (GET_MODE (x) == QImode && !CONSTANT_P (x))
     {
       if (Q_CLASS_P (regclass))
        return regclass;
       else if (reg_class_subset_p (Q_REGS, regclass))
        return Q_REGS;
+      else if (MASK_CLASS_P (regclass))
+       return regclass;
       else
        return NO_REGS;


Update patch.


--
BR,
Hongtao


More information about the Gcc-patches mailing list