[PATCH] Optimize bit test and * atomics into lock; bt[src] (PR target/49244)

Uros Bizjak ubizjak@gmail.com
Mon May 2 21:01:00 GMT 2016


On Mon, May 2, 2016 at 6:36 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch adds pattern recognition (see attached testcase on what it e.g.
> can handle) of the i?86/x86_64 lock; bt[src] operations.
> It is too late to do this during or after RTL expansion, so it is done late
> during gimple, by recognizing these sequences in the fold builtins pass,
> turning those into an internal call which represents atomically setting,
> complementing or resetting a bit and remembering the previous value of the
> bit.
>
> The patch doesn't handle (yet) the weirdo handling of memory operands where
> the counter can be actually not just in between 0 and bitsize - 1 of the
> particular mode, but can be much larger and the CPU locates the right memory
> word first, but could be extended to handle that later.
> I'd like to find out if there are other targets that have similar
> instructions in their ISAs, or if x86_64/i686 is the only one.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (relies on the
> gimple.c patch I've just posted, otherwise the expected number of
> scan-assembler-times would need to be tweaked for the short int cases).
> Ok for trunk?
>
> 2016-05-02  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/49244
>         * tree-ssa-ccp.c: Include stor-layout.h and optabs-query.h.
>         (optimize_atomic_bit_test_and): New function.
>         (pass_fold_builtins::execute): Use it.
>         * optabs.def (atomic_bit_test_and_set_optab,
>         atomic_bit_test_and_complement_optab,
>         atomic_bit_test_and_reset_optab): New optabs.
>         * internal-fn.def (ATOMIC_BIT_TEST_AND_SET,
>         ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_BIT_TEST_AND_RESET): New ifns.
>         * builtins.h (expand_ifn_atomic_bit_test_and): New prototype.
>         * builtins.c (expand_ifn_atomic_bit_test_and): New function.
>         * internal-fn.c (expand_ATOMIC_BIT_TEST_AND_SET,
>         expand_ATOMIC_BIT_TEST_AND_COMPLEMENT,
>         expand_ATOMIC_BIT_TEST_AND_RESET): New functions.
>         * doc/md.texi (atomic_bit_test_and_set@var{mode},
>         atomic_bit_test_and_complement@var{mode},
>         atomic_bit_test_and_reset@var{mode}): Document.
>         * config/i386/sync.md (atomic_bit_test_and_set<mode>,
>         atomic_bit_test_and_complement<mode>,
>         atomic_bit_test_and_reset<mode>): New expanders.
>         (atomic_bit_test_and_set<mode>_1,
>         atomic_bit_test_and_complement<mode>_1,
>         atomic_bit_test_and_reset<mode>_1): New insns.
>
>         * gcc.target/i386/pr49244-1.c: New test.
>         * gcc.target/i386/pr49244-2.c: New test.
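
For context: if I read the new tree-ssa-ccp pass and the expanders
correctly, a function like f1 from the new testcase, i.e.

  int
  f1 (int *a, int bit)
  {
    unsigned int mask = (1u << bit);
    return (__sync_fetch_and_or (a, mask) & mask) != 0;
  }

should now compile on x86_64 to something along the lines of

	lock; btsl %esi, (%rdi)
	setc	%al
	movzbl	%al, %eax
	ret

instead of a cmpxchg loop, with the carry flag carrying the previous
value of the tested bit.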

> +(define_expand "atomic_bit_test_and_set<mode>"
> +  [(match_operand:SWI248 0 "register_operand")
> +   (match_operand:SWI248 1 "memory_operand")
> +   (match_operand:SWI248 2 "nonmemory_operand")
> +   (match_operand:SI 3 "const_int_operand") ;; model
> +   (match_operand:SI 4 "const_int_operand")]
> +  ""
> +{
> +  emit_insn (gen_atomic_bit_test_and_set<mode>_1 (operands[1], operands[2],
> +                                                 operands[3]));
> +  operands[5] = gen_reg_rtx (QImode);

Please don't use operands[N] without a corresponding (match_dup N) in
the RTL pattern.  The "operands" array is only as long as the highest
operand number used in the pattern.  Just grep for the pattern name in
the generated insn-emit.c and you will see the problem.
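
Something along these lines (untested) should work and avoids touching
the operands array past the last matched operand:

  rtx tem = gen_reg_rtx (QImode);
  ix86_expand_setcc (tem, EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
		     const0_rtx);
  rtx result = convert_modes (<MODE>mode, QImode, tem, 1);

i.e. use a scratch pseudo instead of operands[5], or alternatively add
a (match_dup 5) to the expander pattern so the array is long enough.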

> +  ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
> +                    const0_rtx);
> +  rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1);
> +  if (operands[4] == const0_rtx)
> +    result = expand_simple_binop (<MODE>mode, ASHIFT, result,
> +                                 operands[2], operands[0], 0, OPTAB_DIRECT);
> +  if (result != operands[0])
> +    emit_move_insn (operands[0], result);
> +  DONE;
> +})
> +
> +(define_insn "atomic_bit_test_and_set<mode>_1"
> +  [(set (reg:CCC FLAGS_REG)
> +       (compare:CCC
> +         (unspec_volatile:SWI248
> +           [(match_operand:SWI248 0 "memory_operand" "+m")
> +            (match_operand:SI 2 "const_int_operand")]          ;; model
> +           UNSPECV_XCHG)
> +         (const_int 0)))
> +   (set (zero_extract:SWI248 (match_dup 0)
> +                            (const_int 1)
> +                            (match_operand:SWI248 1
> +                              "nonmemory_operand" "<r>N"))

No need for the <r> mode attribute with SWI248; you can use plain "r"
above.  <r> makes a difference only for QImode.
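
I.e. the operand can simply be written as

  (match_operand:SWI248 1 "nonmemory_operand" "rN")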

> +       (const_int 1))]
> +  ""
> +  "lock{%;} %K2bts{<imodesuffix>}\t{%1, %0|%0, %1}")
> +
> +(define_expand "atomic_bit_test_and_complement<mode>"
> +  [(match_operand:SWI248 0 "register_operand")
> +   (match_operand:SWI248 1 "memory_operand")
> +   (match_operand:SWI248 2 "nonmemory_operand")
> +   (match_operand:SI 3 "const_int_operand") ;; model
> +   (match_operand:SI 4 "const_int_operand")]
> +  ""
> +{
> +  emit_insn (gen_atomic_bit_test_and_complement<mode>_1 (operands[1],
> +                                                        operands[2],
> +                                                        operands[3]));
> +  operands[5] = gen_reg_rtx (QImode);
> +  ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
> +                    const0_rtx);
> +  rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1);
> +  if (operands[4] == const0_rtx)
> +    result = expand_simple_binop (<MODE>mode, ASHIFT, result,
> +                                 operands[2], operands[0], 0, OPTAB_DIRECT);
> +  if (result != operands[0])
> +    emit_move_insn (operands[0], result);
> +  DONE;
> +})
> +
> +(define_insn "atomic_bit_test_and_complement<mode>_1"
> +  [(set (reg:CCC FLAGS_REG)
> +       (compare:CCC
> +         (unspec_volatile:SWI248
> +           [(match_operand:SWI248 0 "memory_operand" "+m")
> +            (match_operand:SI 2 "const_int_operand")]          ;; model
> +           UNSPECV_XCHG)
> +         (const_int 0)))
> +   (set (zero_extract:SWI248 (match_dup 0)
> +                            (const_int 1)
> +                            (match_operand:SWI248 1
> +                              "nonmemory_operand" "<r>N"))
> +       (not:SWI248 (zero_extract:SWI248 (match_dup 0)
> +                                        (const_int 1)
> +                                        (match_dup 1))))]
> +  ""
> +  "lock{%;} %K2btc{<imodesuffix>}\t{%1, %0|%0, %1}")
> +
> +(define_expand "atomic_bit_test_and_reset<mode>"
> +  [(match_operand:SWI248 0 "register_operand")
> +   (match_operand:SWI248 1 "memory_operand")
> +   (match_operand:SWI248 2 "nonmemory_operand")
> +   (match_operand:SI 3 "const_int_operand") ;; model
> +   (match_operand:SI 4 "const_int_operand")]
> +  ""
> +{
> +  emit_insn (gen_atomic_bit_test_and_reset<mode>_1 (operands[1], operands[2],
> +                                                   operands[3]));
> +  operands[5] = gen_reg_rtx (QImode);
> +  ix86_expand_setcc (operands[5], EQ, gen_rtx_REG (CCCmode, FLAGS_REG),
> +                    const0_rtx);
> +  rtx result = convert_modes (<MODE>mode, QImode, operands[5], 1);
> +  if (operands[4] == const0_rtx)
> +    result = expand_simple_binop (<MODE>mode, ASHIFT, result,
> +                                 operands[2], operands[0], 0, OPTAB_DIRECT);
> +  if (result != operands[0])
> +    emit_move_insn (operands[0], result);
> +  DONE;
> +})
> +
> +(define_insn "atomic_bit_test_and_reset<mode>_1"
> +  [(set (reg:CCC FLAGS_REG)
> +       (compare:CCC
> +         (unspec_volatile:SWI248
> +           [(match_operand:SWI248 0 "memory_operand" "+m")
> +            (match_operand:SI 2 "const_int_operand")]          ;; model
> +           UNSPECV_XCHG)
> +         (const_int 0)))
> +   (set (zero_extract:SWI248 (match_dup 0)
> +                            (const_int 1)
> +                            (match_operand:SWI248 1
> +                              "nonmemory_operand" "<r>N"))
> +       (const_int 0))]
> +  ""
> +  "lock{%;} %K2btr{<imodesuffix>}\t{%1, %0|%0, %1}")
> --- gcc/testsuite/gcc.target/i386/pr49244-1.c.jj        2016-05-02 14:52:56.776814774 +0200
> +++ gcc/testsuite/gcc.target/i386/pr49244-1.c   2016-05-02 12:39:52.126750700 +0200
> @@ -0,0 +1,188 @@
> +/* PR target/49244 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void bar (void);
> +
> +__attribute__((noinline, noclone)) int
> +f1 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  return (__sync_fetch_and_or (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f2 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  unsigned int t1 = __atomic_fetch_or (a, mask, __ATOMIC_RELAXED);
> +  unsigned int t2 = t1 & mask;
> +  return t2 != 0;
> +}
> +
> +__attribute__((noinline, noclone)) long int
> +f3 (long int *a, int bit)
> +{
> +  unsigned long int mask = (1ul << bit);
> +  return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) == 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f4 (int *a)
> +{
> +  unsigned int mask = (1u << 7);
> +  return (__sync_fetch_and_or (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f5 (int *a)
> +{
> +  unsigned int mask = (1u << 13);
> +  return (__atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f6 (int *a)
> +{
> +  unsigned int mask = (1u << 0);
> +  return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +f7 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  if ((__sync_fetch_and_xor (a, mask) & mask) != 0)
> +    bar ();
> +}
> +
> +__attribute__((noinline, noclone)) void
> +f8 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  if ((__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) == 0)
> +    bar ();
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f9 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f10 (int *a)
> +{
> +  unsigned int mask = (1u << 7);
> +  return (__sync_fetch_and_xor (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f11 (int *a)
> +{
> +  unsigned int mask = (1u << 13);
> +  return (__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f12 (int *a)
> +{
> +  unsigned int mask = (1u << 0);
> +  return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f13 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  return (__sync_fetch_and_and (a, ~mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f14 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f15 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f16 (int *a)
> +{
> +  unsigned int mask = (1u << 7);
> +  return (__sync_fetch_and_and (a, ~mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f17 (int *a)
> +{
> +  unsigned int mask = (1u << 13);
> +  return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f18 (int *a)
> +{
> +  unsigned int mask = (1u << 0);
> +  return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f19 (unsigned long int *a, int bit)
> +{
> +  unsigned long int mask = (1ul << bit);
> +  return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f20 (unsigned long int *a)
> +{
> +  unsigned long int mask = (1ul << 7);
> +  return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) == 0;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f21 (int *a, int bit)
> +{
> +  unsigned int mask = (1u << bit);
> +  return (__sync_fetch_and_or (a, mask) & mask);
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f22 (unsigned long int *a)
> +{
> +  unsigned long int mask = (1ul << 7);
> +  return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask);
> +}
> +
> +__attribute__((noinline, noclone)) unsigned long int
> +f23 (unsigned long int *a)
> +{
> +  unsigned long int mask = (1ul << 7);
> +  return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask);
> +}
> +
> +__attribute__((noinline, noclone)) unsigned short int
> +f24 (unsigned short int *a)
> +{
> +  unsigned short int mask = (1u << 7);
> +  return (__sync_fetch_and_or (a, mask) & mask) != 0;
> +}
> +
> +__attribute__((noinline, noclone)) unsigned short int
> +f25 (unsigned short int *a)
> +{
> +  unsigned short int mask = (1u << 7);
> +  return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0;
> +}
> +
> +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 9 } } */
> +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 10 } } */
> +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 6 } } */
> --- gcc/testsuite/gcc.target/i386/pr49244-2.c.jj        2016-05-02 12:51:51.501983254 +0200
> +++ gcc/testsuite/gcc.target/i386/pr49244-2.c   2016-05-02 14:47:30.240202019 +0200
> @@ -0,0 +1,109 @@
> +/* PR target/49244 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -g" } */

Probably no need for -g in the testcase?

> +/* { dg-additional-options "-march=i486" { target ia32 } } */

Hm, a "dg-do run"  testcase with -march is kind of dangerous without a
runtime test for requested architecture. OTOH, do we really need
-march here?

Uros.


