[PATCH] Optimize bit test and * atomics into lock; bt[src] (PR target/49244)
Uros Bizjak
ubizjak@gmail.com
Mon May 2 21:01:00 GMT 2016
On Mon, May 2, 2016 at 6:36 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch adds pattern recognition (see attached testcase on what it e.g.
> can handle) of the i?86/x86_64 lock; bt[src] operations.
> It is too late to do this during or after RTL expansion, so it is done late
> during gimple, by recognizing these sequences in the fold builtins pass,
> turning those into an internal call which represents atomically setting,
> complementing or resetting a bit and remembering the previous value of the
> bit.
>
> The patch doesn't handle (yet) the weirdo handling of memory operands where
> the counter can be actually not just in between 0 and bitsize - 1 of the
> particular mode, but can be much larger and the CPU locates the right memory
> word first, but could be extended to handle that later.
> I'd like to find out if there are other targets that have similar
> instructions in their ISAs, or if x86_64/i686 is the only one.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (relies on the
> gimple.c patch I've just posted, otherwise the expected number of
> scan-assembler-times would need to be tweaked for the short int cases).
> Ok for trunk?
>
> 2016-05-02 Jakub Jelinek <jakub@redhat.com>
>
> PR target/49244
> * tree-ssa-ccp.c: Include stor-layout.h and optabs-query.h.
> (optimize_atomic_bit_test_and): New function.
> (pass_fold_builtins::execute): Use it.
> * optabs.def (atomic_bit_test_and_set_optab,
> atomic_bit_test_and_complement_optab,
> atomic_bit_test_and_reset_optab): New optabs.
> * internal-fn.def (ATOMIC_BIT_TEST_AND_SET,
> ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_BIT_TEST_AND_RESET): New ifns.
> * builtins.h (expand_ifn_atomic_bit_test_and): New prototype.
> * builtins.c (expand_ifn_atomic_bit_test_and): New function.
> * internal-fn.c (expand_ATOMIC_BIT_TEST_AND_SET,
> expand_ATOMIC_BIT_TEST_AND_COMPLEMENT,
> expand_ATOMIC_BIT_TEST_AND_RESET): New functions.
> * doc/md.texi (atomic_bit_test_and_set@var{mode},
> atomic_bit_test_and_complement@var{mode},
> atomic_bit_test_and_reset@var{mode}): Document.
> * config/i386/sync.md (atomic_bit_test_and_set<mode>,
> atomic_bit_test_and_complement<mode>,
> atomic_bit_test_and_reset<mode>): New expanders.
> (atomic_bit_test_and_set<mode>_1,
> atomic_bit_test_and_complement<mode>_1,
> atomic_bit_test_and_reset<mode>_1): New insns.
>
> * gcc.target/i386/pr49244-1.c: New test.
> * gcc.target/i386/pr49244-2.c: New test.
> +(define_expand "atomic_bit_test_and_set<mode>"
> + [(match_operand:SWI248 0 "register_operand")
> + (match_operand:SWI248 1 "memory_operand")
> + (match_operand:SWI248 2 "nonmemory_operand")
> + (match_operand:SI 3 "const_int_operand") ;; model
> + (match_operand:SI 4 "const_int_operand")]
> + ""
> +{
> + emit_insn (gen_atomic_bit_test_and_set<mode>_1 (operands[1], operands[2],
> + operands[3]));
> + operands[5] = gen_reg_rtx (QImode);
Please don't use operands[N] without corresponding (match_dup N) in
the RTL pattern. Tthe "operands" array is only as long as the last
operand number from the pattern. Just grep the pattern name from
generated insn-emit.c and you will see the problem.
> + ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
> + const0_rtx);
> + rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1);
> + if (operands[4] == const0_rtx)
> + result = expand_simple_binop (<MODE>mode, ASHIFT, result,
> + operands[2], operands[0], 0, OPTAB_DIRECT);
> + if (result != operands[0])
> + emit_move_insn (operands[0], result);
> + DONE;
> +})
> +
> +(define_insn "atomic_bit_test_and_set<mode>_1"
> + [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (unspec_volatile:SWI248
> + [(match_operand:SWI248 0 "memory_operand" "+m")
> + (match_operand:SI 2 "const_int_operand")] ;; model
> + UNSPECV_XCHG)
> + (const_int 0)))
> + (set (zero_extract:SWI248 (match_dup 0)
> + (const_int 1)
> + (match_operand:SWI248 1
> + "nonmemory_operand" "<r>N"))
No need for <r> attribute with SWI248, you can use plain "r" above.
<r> makes difference only for QImode.
> + (const_int 1))]
> + ""
> + "lock{%;} %K2bts{<imodesuffix>}\t{%1, %0|%0, %1}")
> +
> +(define_expand "atomic_bit_test_and_complement<mode>"
> + [(match_operand:SWI248 0 "register_operand")
> + (match_operand:SWI248 1 "memory_operand")
> + (match_operand:SWI248 2 "nonmemory_operand")
> + (match_operand:SI 3 "const_int_operand") ;; model
> + (match_operand:SI 4 "const_int_operand")]
> + ""
> +{
> + emit_insn (gen_atomic_bit_test_and_complement<mode>_1 (operands[1],
> + operands[2],
> + operands[3]));
> + operands[5] = gen_reg_rtx (QImode);
> + ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
> + const0_rtx);
> + rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1);
> + if (operands[4] == const0_rtx)
> + result = expand_simple_binop (<MODE>mode, ASHIFT, result,
> + operands[2], operands[0], 0, OPTAB_DIRECT);
> + if (result != operands[0])
> + emit_move_insn (operands[0], result);
> + DONE;
> +})
> +
> +(define_insn "atomic_bit_test_and_complement<mode>_1"
> + [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (unspec_volatile:SWI248
> + [(match_operand:SWI248 0 "memory_operand" "+m")
> + (match_operand:SI 2 "const_int_operand")] ;; model
> + UNSPECV_XCHG)
> + (const_int 0)))
> + (set (zero_extract:SWI248 (match_dup 0)
> + (const_int 1)
> + (match_operand:SWI248 1
> + "nonmemory_operand" "<r>N"))
> + (not:SWI248 (zero_extract:SWI248 (match_dup 0)
> + (const_int 1)
> + (match_dup 1))))]
> + ""
> + "lock{%;} %K2btc{<imodesuffix>}\t{%1, %0|%0, %1}")
> +
> +(define_expand "atomic_bit_test_and_reset<mode>"
> + [(match_operand:SWI248 0 "register_operand")
> + (match_operand:SWI248 1 "memory_operand")
> + (match_operand:SWI248 2 "nonmemory_operand")
> + (match_operand:SI 3 "const_int_operand") ;; model
> + (match_operand:SI 4 "const_int_operand")]
> + ""
> +{
> + emit_insn (gen_atomic_bit_test_and_reset<mode>_1 (operands[1], operands[2],
> + operands[3]));
> + operands[5] = gen_reg_rtx (QImode);
> + ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
> + const0_rtx);
> + rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1);
> + if (operands[4] == const0_rtx)
> + result = expand_simple_binop (<MODE>mode, ASHIFT, result,
> + operands[2], operands[0], 0, OPTAB_DIRECT);
> + if (result != operands[0])
> + emit_move_insn (operands[0], result);
> + DONE;
> +})
> +
> +(define_insn "atomic_bit_test_and_reset<mode>_1"
> + [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (unspec_volatile:SWI248
> + [(match_operand:SWI248 0 "memory_operand" "+m")
> + (match_operand:SI 2 "const_int_operand")] ;; model
> + UNSPECV_XCHG)
> + (const_int 0)))
> + (set (zero_extract:SWI248 (match_dup 0)
> + (const_int 1)
> + (match_operand:SWI248 1
> + "nonmemory_operand" "<r>N"))
> + (const_int 0))]
> + ""
> + "lock{%;} %K2btr{<imodesuffix>}\t{%1, %0|%0, %1}")
> --- gcc/testsuite/gcc.target/i386/pr49244-1.c.jj 2016-05-02 14:52:56.776814774 +0200
> +++ gcc/testsuite/gcc.target/i386/pr49244-1.c 2016-05-02 12:39:52.126750700 +0200
> @@ -0,0 +1,188 @@
> +/* PR target/49244 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void bar (void);
> +
> +__attribute__((noinline, noclone)) int
> +f1 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + return (__sync_fetch_and_or (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f2 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + unsigned int t1 = __atomic_fetch_or (a, mask, __ATOMIC_RELAXED);
> + unsigned int t2 = t1 & mask;
> + return t2 != 0;
> +}
> +
> +__attribute__((noinline, noclone)) long int
> +f3 (long int *a, int bit)
> +{
> + unsigned long int mask = (1ul << bit);
> + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) == 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f4 (int *a)
> +{
> + unsigned int mask = (1u << 7);
> + return (__sync_fetch_and_or (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f5 (int *a)
> +{
> + unsigned int mask = (1u << 13);
> + return (__atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f6 (int *a)
> +{
> + unsigned int mask = (1u << 0);
> + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +f7 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + if ((__sync_fetch_and_xor (a, mask) & mask) != 0)
> + bar ();
> +}
> +
> +__attribute__((noinline, noclone)) void
> +f8 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + if ((__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) == 0)
> + bar ();
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f9 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f10 (int *a)
> +{
> + unsigned int mask = (1u << 7);
> + return (__sync_fetch_and_xor (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f11 (int *a)
> +{
> + unsigned int mask = (1u << 13);
> + return (__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f12 (int *a)
> +{
> + unsigned int mask = (1u << 0);
> + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f13 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + return (__sync_fetch_and_and (a, ~mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f14 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f15 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f16 (int *a)
> +{
> + unsigned int mask = (1u << 7);
> + return (__sync_fetch_and_and (a, ~mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f17 (int *a)
> +{
> + unsigned int mask = (1u << 13);
> + return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f18 (int *a)
> +{
> + unsigned int mask = (1u << 0);
> + return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f19 (unsigned long int *a, int bit)
> +{
> + unsigned long int mask = (1ul << bit);
> + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f20 (unsigned long int *a)
> +{
> + unsigned long int mask = (1ul << 7);
> + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) == 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f21 (int *a, int bit)
> +{
> + unsigned int mask = (1u << bit);
> + return (__sync_fetch_and_or (a, mask) & mask);
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f22 (unsigned long int *a)
> +{
> + unsigned long int mask = (1ul << 7);
> + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask);
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f23 (unsigned long int *a)
> +{
> + unsigned long int mask = (1ul << 7);
> + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask);
> +}
> +
> +__attribute__((noinline, noclone)) unsigned short int
> +f24 (unsigned short int *a)
> +{
> + unsigned short int mask = (1u << 7);
> + return (__sync_fetch_and_or (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) unsigned short int
> +f25 (unsigned short int *a)
> +{
> + unsigned short int mask = (1u << 7);
> + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 9 } } */
> +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 10 } } */
> +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 6 } } */
> --- gcc/testsuite/gcc.target/i386/pr49244-2.c.jj 2016-05-02 12:51:51.501983254 +0200
> +++ gcc/testsuite/gcc.target/i386/pr49244-2.c 2016-05-02 14:47:30.240202019 +0200
> @@ -0,0 +1,109 @@
> +/* PR target/49244 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -g" } */
Probably no need for -g in the testcase?
> +/* { dg-additional-options "-march=i486" { target ia32 } } */
Hm, a "dg-do run" testcase with -march is kind of dangerous without a
runtime test for requested architecture. OTOH, do we really need
-march here?
Uros.
More information about the Gcc-patches
mailing list