This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Optimize %k register comparison against zero (PR target/82855)
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Vladimir Makarov <vmakarov at redhat dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 7 Nov 2017 09:40:31 +0100
- Subject: Re: [PATCH] Optimize %k register comparison against zero (PR target/82855)
- Authentication-results: sourceware.org; auth=none
- References: <20171106212324.GZ14653@tucnak> <CAFULd4YacA_M8URSu7Pm4oNt+ONnXi97yaQL0t4XpG0BWgRSgg@mail.gmail.com> <20171107081827.GF14653@tucnak>
On Tue, Nov 7, 2017 at 9:18 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Nov 06, 2017 at 11:45:37PM +0100, Uros Bizjak wrote:
>> On Mon, Nov 6, 2017 at 10:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > Hi!
>> >
>> > Without the following patch we emit kmovb %k1, %eax; testb %al, %al
>> > when just testing the Zero flag; with it we can emit ktestb %k1, %k1 instead.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
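[For a QImode mask compared against zero, the change to the emitted sequence is roughly the following; a sketch only, with illustrative register and label names:]

```
	kmovb	%k1, %eax	# before the patch: move mask to GPR,
	testb	%al, %al	# then test the low byte
	je	.L2

	ktestb	%k1, %k1	# with the patch: test the mask in place
	je	.L2
```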
>> >
>> > 2017-11-06 Jakub Jelinek <jakub@redhat.com>
>> >
>> > PR target/82855
>> > * config/i386/i386.md (SWI1248_AVX512BWDQ2_64): New mode iterator.
>> > (*cmp<mode>_ccz_1): New insn with k alternative.
>> >
>> > * gcc.target/i386/avx512dq-pr82855.c: New test.
>> >
>> > --- gcc/config/i386/i386.md.jj 2017-10-28 09:00:44.000000000 +0200
>> > +++ gcc/config/i386/i386.md 2017-11-06 12:57:44.171215745 +0100
>> > @@ -1275,6 +1275,26 @@ (define_expand "cmp<mode>_1"
>> > (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
>> > (match_operand:SWI48 1 "<general_operand>")))])
>> >
>> > +(define_mode_iterator SWI1248_AVX512BWDQ2_64
>> > + [(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ")
>> > + (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_64BIT")])
>> > +
>> > +(define_insn "*cmp<mode>_ccz_1"
>> > + [(set (reg FLAGS_REG)
>> > + (compare (match_operand:SWI1248_AVX512BWDQ2_64 0
>> > + "nonimmediate_operand" "<r>,k,?m<r>")
>>
>> Please put the "k" alternative last. I suggest decorating it with
>> "*", so the register allocator won't use mask registers instead of
>> integer registers. This would be the same solution as in the
>> movdi_internal/movsi_internal patterns.
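[A rough sketch of what the suggested reordering would look like in the constraint string; hypothetical, not the final patch:]

```
(define_insn "*cmp<mode>_ccz_1"
  [(set (reg FLAGS_REG)
	(compare (match_operand:SWI1248_AVX512BWDQ2_64 0
		   "nonimmediate_operand" "<r>,?m<r>,*k")
  ...
```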
>
> Tried that, but then the testcase no longer uses ktest (diff from without
> the * decoration to with it; pseudo 93 is live only in
> (insn 12 10 14 2 (set (reg:QI 93)
> (unspec:QI [
> (reg:V8SI 97)
> (reg:V8SI 98)
> ] UNSPEC_MASKED_EQ)) "include/avx512vlintrin.h":5394 3438 {avx512vl_eqv8si3_1}
> (expr_list:REG_DEAD (reg:V8SI 98)
> (expr_list:REG_DEAD (reg:V8SI 97)
> (nil))))
> (insn 14 12 15 2 (set (reg:CCZ 17 flags)
> (compare:CCZ (reg:QI 93)
> (const_int 0 [0]))) "avx512dq-pr82855.c":13 1 {*cmpqi_ccz_1}
> (expr_list:REG_DEAD (reg:QI 93)
> (nil)))
> ):
> - a1 (r93,l0) best MASK_EVEX_REGS, allocno MASK_EVEX_REGS
> + a1 (r93,l0) best GENERAL_REGS, allocno GENERAL_REGS
>
> - a1(r93,l0) costs: AREG:2000,2000 DREG:2000,2000 CREG:2000,2000 BREG:2000,2000 SIREG:2000,2000 DIREG:2000,2000 AD_REGS:2000,2000 CLOBBERED_REGS:2000,2000 Q_REGS:2000,2000 NON_Q_REGS:2000,2000 TLS_GOTBASE_REGS:2000,2000 GENERAL_REGS:2000,2000 MASK_EVEX_REGS:0,0 MASK_REGS:2000,2000 ALL_REGS:420000,420000 MEM:10000,10000
> + a1(r93,l0) costs: AREG:2000,2000 DREG:2000,2000 CREG:2000,2000 BREG:2000,2000 SIREG:2000,2000 DIREG:2000,2000 AD_REGS:2000,2000 CLOBBERED_REGS:2000,2000 Q_REGS:2000,2000 NON_Q_REGS:2000,2000 TLS_GOTBASE_REGS:2000,2000 GENERAL_REGS:2000,2000 MASK_EVEX_REGS:2000,2000 MASK_REGS:4000,4000 ALL_REGS:420000,420000 MEM:10000,10000
>
> - r93: preferred MASK_EVEX_REGS, alternative GENERAL_REGS, allocno ALL_REGS
> + r93: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
>
> - a1(r93,l0) costs: AREG:4000,4000 DREG:4000,4000 CREG:4000,4000 BREG:4000,4000 SIREG:4000,4000 DIREG:4000,4000 AD_REGS:4000,4000 CLOBBERED_REGS:4000,4000 Q_REGS:4000,4000 NON_Q_REGS:4000,4000 TLS_GOTBASE_REGS:4000,4000 GENERAL_REGS:4000,4000 MASK_EVEX_REGS:0,0 MASK_REGS:2000,2000 ALL_REGS:420000,420000 MEM:10000,10000
> - a2(r97,l0) costs: SSE_FIRST_REG:6000,6000 NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 EVEX_SSE_REGS:6000,6000 ALL_SSE_REGS:6000,6000 MEM:25000,25000
> - a3(r98,l0) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 SSE_REGS:0,0 EVEX_SSE_REGS:0,0 ALL_SSE_REGS:0,0 MEM:19000,19000
> + a1(r93,l0) costs: GENERAL_REGS:4000,4000 MASK_EVEX_REGS:4000,4000 MASK_REGS:6000,6000 ALL_REGS:422000,422000 MEM:12000,12000
> + a2(r97,l0) costs: SSE_FIRST_REG:8000,8000 NO_REX_SSE_REGS:8000,8000 SSE_REGS:8000,8000 EVEX_SSE_REGS:8000,8000 ALL_SSE_REGS:8000,8000 MEM:27000,27000
> + a3(r98,l0) costs: SSE_FIRST_REG:2000,2000 NO_REX_SSE_REGS:2000,2000 SSE_REGS:2000,2000 EVEX_SSE_REGS:2000,2000 ALL_SSE_REGS:2000,2000 MEM:21000,21000
>
> Isn't it sufficient that moves slightly disparage the k alternatives?
> Or are you worried about the case where the pseudo would need to be spilled
> and LRA would choose to reload it into a %kN register?
The latter. Perhaps it is possible to prevent unwanted reloads with
?k (or even ??k; we already have ?m<r>, IDK). I have used *k in
zero-extend patterns to prevent unwanted reloads, which is perhaps too
strong, but the REE pass eliminates unwanted moves in that case (*k in
move patterns is handled by other regmove-elimination passes).
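[For reference, and to my understanding of GCC constraint modifiers: each '?' makes its alternative count as one unit more costly for the allocator, while '*' makes the following constraint character ignored when computing register-class preferences; the difference in the strings discussed above would be, as a sketch:]

```
"<r>,?m<r>,?k"    ; 'k' merely disparaged, still feeds class preferencing
"<r>,?m<r>,*k"    ; 'k' ignored for class preferencing entirely
```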
Uros.