This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Optimize %k register comparison against zero (PR target/82855)


On Tue, Nov 7, 2017 at 9:18 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Nov 06, 2017 at 11:45:37PM +0100, Uros Bizjak wrote:
>> On Mon, Nov 6, 2017 at 10:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > Hi!
>> >
>> > Without the following patch we emit kmovb %k1, %eax; testb %al, %al
>> > when, if we only need the Zero flag, we can just as well do ktestb %k1, %k1.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >
>> > 2017-11-06  Jakub Jelinek  <jakub@redhat.com>
>> >
>> >         PR target/82855
>> >         * config/i386/i386.md (SWI1248_AVX512BWDQ2_64): New mode iterator.
>> >         (*cmp<mode>_ccz_1): New insn with k alternative.
>> >
>> >         * gcc.target/i386/avx512dq-pr82855.c: New test.
>> >
>> > --- gcc/config/i386/i386.md.jj  2017-10-28 09:00:44.000000000 +0200
>> > +++ gcc/config/i386/i386.md     2017-11-06 12:57:44.171215745 +0100
>> > @@ -1275,6 +1275,26 @@ (define_expand "cmp<mode>_1"
>> >         (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
>> >                     (match_operand:SWI48 1 "<general_operand>")))])
>> >
>> > +(define_mode_iterator SWI1248_AVX512BWDQ2_64
>> > +  [(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ")
>> > +   (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_64BIT")])
>> > +
>> > +(define_insn "*cmp<mode>_ccz_1"
>> > +  [(set (reg FLAGS_REG)
>> > +       (compare (match_operand:SWI1248_AVX512BWDQ2_64 0
>> > +                       "nonimmediate_operand" "<r>,k,?m<r>")
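[Editor's note: the equivalence the patch relies on, that ktestb %k1, %k1 sets ZF exactly when the kmovb-then-testb sequence would, can be modeled in plain C. This is an illustrative sketch, not GCC source; the function names ktestb_zf and kmov_testb_zf are made up for this example. KTESTB sets ZF when the AND of its two mask operands is zero, so with both operands equal it reduces to a compare against zero.]

```c
#include <stdint.h>

/* Model of the ZF result of ktestb %k1, %k2:
   ZF = ((k1 AND k2) == 0).  */
int ktestb_zf (uint8_t k1, uint8_t k2)
{
  return (k1 & k2) == 0;
}

/* Model of the ZF result of the longer sequence the patch removes:
   kmovb %k1, %eax   (copy the 8-bit mask into the low byte of eax)
   testb %al, %al    (ZF = ((al AND al) == 0), i.e. al == 0).  */
int kmov_testb_zf (uint8_t k)
{
  uint8_t al = k;
  return (al & al) == 0;
}
```

With equal operands, ktestb_zf (k, k) and kmov_testb_zf (k) agree for every 8-bit mask value, which is why the single ktestb can replace the two-instruction sequence when only ZF is consumed.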
>>
>> Please put the "k" alternative last. I suggest decorating it with
>> "*", so the register allocator won't use mask registers instead of
>> integer registers. This would be the same solution as in the
>> movdi_internal/movsi_internal patterns.
>
> Tried that, but the testcase then no longer uses ktest (below is the diff
> from the version without the * decoration to the version with it; pseudo
> 93 is live only in
> (insn 12 10 14 2 (set (reg:QI 93)
>         (unspec:QI [
>                 (reg:V8SI 97)
>                 (reg:V8SI 98)
>             ] UNSPEC_MASKED_EQ)) "include/avx512vlintrin.h":5394 3438 {avx512vl_eqv8si3_1}
>      (expr_list:REG_DEAD (reg:V8SI 98)
>         (expr_list:REG_DEAD (reg:V8SI 97)
>             (nil))))
> (insn 14 12 15 2 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg:QI 93)
>             (const_int 0 [0]))) "avx512dq-pr82855.c":13 1 {*cmpqi_ccz_1}
>      (expr_list:REG_DEAD (reg:QI 93)
>         (nil)))
> ):
> -    a1 (r93,l0) best MASK_EVEX_REGS, allocno MASK_EVEX_REGS
> +    a1 (r93,l0) best GENERAL_REGS, allocno GENERAL_REGS
>
> -  a1(r93,l0) costs: AREG:2000,2000 DREG:2000,2000 CREG:2000,2000 BREG:2000,2000 SIREG:2000,2000 DIREG:2000,2000 AD_REGS:2000,2000 CLOBBERED_REGS:2000,2000 Q_REGS:2000,2000 NON_Q_REGS:2000,2000 TLS_GOTBASE_REGS:2000,2000 GENERAL_REGS:2000,2000 MASK_EVEX_REGS:0,0 MASK_REGS:2000,2000 ALL_REGS:420000,420000 MEM:10000,10000
> +  a1(r93,l0) costs: AREG:2000,2000 DREG:2000,2000 CREG:2000,2000 BREG:2000,2000 SIREG:2000,2000 DIREG:2000,2000 AD_REGS:2000,2000 CLOBBERED_REGS:2000,2000 Q_REGS:2000,2000 NON_Q_REGS:2000,2000 TLS_GOTBASE_REGS:2000,2000 GENERAL_REGS:2000,2000 MASK_EVEX_REGS:2000,2000 MASK_REGS:4000,4000 ALL_REGS:420000,420000 MEM:10000,10000
>
> -    r93: preferred MASK_EVEX_REGS, alternative GENERAL_REGS, allocno ALL_REGS
> +    r93: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
>
> -  a1(r93,l0) costs: AREG:4000,4000 DREG:4000,4000 CREG:4000,4000 BREG:4000,4000 SIREG:4000,4000 DIREG:4000,4000 AD_REGS:4000,4000 CLOBBERED_REGS:4000,4000 Q_REGS:4000,4000 NON_Q_REGS:4000,4000 TLS_GOTBASE_REGS:4000,4000 GENERAL_REGS:4000,4000 MASK_EVEX_REGS:0,0 MASK_REGS:2000,2000 ALL_REGS:420000,420000 MEM:10000,10000
> -  a2(r97,l0) costs: SSE_FIRST_REG:6000,6000 NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 EVEX_SSE_REGS:6000,6000 ALL_SSE_REGS:6000,6000 MEM:25000,25000
> -  a3(r98,l0) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 SSE_REGS:0,0 EVEX_SSE_REGS:0,0 ALL_SSE_REGS:0,0 MEM:19000,19000
> +  a1(r93,l0) costs: GENERAL_REGS:4000,4000 MASK_EVEX_REGS:4000,4000 MASK_REGS:6000,6000 ALL_REGS:422000,422000 MEM:12000,12000
> +  a2(r97,l0) costs: SSE_FIRST_REG:8000,8000 NO_REX_SSE_REGS:8000,8000 SSE_REGS:8000,8000 EVEX_SSE_REGS:8000,8000 ALL_SSE_REGS:8000,8000 MEM:27000,27000
> +  a3(r98,l0) costs: SSE_FIRST_REG:2000,2000 NO_REX_SSE_REGS:2000,2000 SSE_REGS:2000,2000 EVEX_SSE_REGS:2000,2000 ALL_SSE_REGS:2000,2000 MEM:21000,21000
>
> Isn't it sufficient that moves disparage slightly the k alternatives?
> Or are you worried about the case where the pseudo would need to be spilled
> and LRA would choose to reload it into a %kN register?

The latter... Perhaps it is possible to prevent unwanted reloads with
?k (or even ??k; we already have ?m<r>, IDK). I have used *k in
zero-extend patterns to prevent unwanted reloads, which is perhaps too
strong, but we have the ree pass that eliminates unwanted moves in this
case (*k in move patterns is handled by other regmove elimination
passes).
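[Editor's note: the modifiers under discussion differ only in how strongly they steer the allocator away from an alternative. A hypothetical constraint string for the mask alternative, not the actual patch text, illustrates the spectrum:]

```
;; Constraint modifiers in order of increasing disparagement:
;;   "k"   - mask register is an ordinary alternative
;;   "?k"  - slightly disparaged: adds one "reject" unit to this
;;           alternative's cost during reload
;;   "??k" - more strongly disparaged (each ? adds another unit)
;;   "*k"  - ignored entirely when computing register class
;;           preferences, but still available to reload
```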

Uros.
