This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
[PATCH] Speed up __sync_lock_test_and_set on PowerPC
- From: Anton Blanchard <anton at samba dot org>
- To: gcc at gcc dot gnu dot org
- Date: Wed, 3 Sep 2008 17:06:14 +1000
- Subject: [PATCH] Speed up __sync_lock_test_and_set on PowerPC
Hi,
I was debugging some performance issues with an application that uses
the gcc builtin lock functions on powerpc. A simple test case:
long lock_try(long *value)
{
  return __sync_lock_test_and_set(value, 1);
}

void unlock(long *value)
{
  __sync_lock_release(value);
}
00000010 <lock_try>:
  10:   7c 00 04 ac     sync
  14:   7c 69 1b 78     mr      r9,r3
  18:   38 00 00 01     li      r0,1
  1c:   7c 60 48 28     lwarx   r3,0,r9
  20:   7c 00 49 2d     stwcx.  r0,0,r9
  24:   40 a2 ff f8     bne-    1c <lock_try+0xc>
  28:   4c 00 01 2c     isync
  2c:   4e 80 00 20     blr

00000000 <unlock>:
   0:   7c 20 04 ac     lwsync
   4:   38 00 00 00     li      r0,0
   8:   90 03 00 00     stw     r0,0(r3)
   c:   4e 80 00 20     blr
unlock looks good, but lock_try has both a release barrier (the leading
sync) and an acquire barrier (the trailing isync). Even worse, the
release barrier is a heavyweight sync, which is very slow.
Looking at the gcc documentation, sync_lock_test_and_set only needs an
acquire barrier:
> sync_lock_test_and_set
...
> This pattern must issue any memory barrier instructions such that the
> pattern as a whole acts as an acquire barrier, that is all memory
> operations after the pattern do not occur until the lock is acquired.
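To illustrate what the acquire-only requirement quoted above means in
practice, here is a minimal sketch of a spinlock built on the two builtins.
The names demo_lock_t, demo_trylock, demo_lock and demo_unlock are made up
for this example; they are not part of GCC or of the application mentioned
earlier, and this is an illustration rather than a production lock.

```c
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    volatile long held;
} demo_lock_t;

/* Try once. The builtin atomically stores 1 and returns the previous
   value, so a previous value of 0 means this caller took the lock.
   GCC documents the builtin as an acquire barrier. */
bool demo_trylock(demo_lock_t *l)
{
    return __sync_lock_test_and_set(&l->held, 1) == 0;
}

/* Spin until the lock is taken. Between attempts, poll with plain
   loads so waiters do not keep issuing atomic stores. */
void demo_lock(demo_lock_t *l)
{
    while (!demo_trylock(l)) {
        while (l->held)
            ;
    }
}

/* Release: the builtin stores 0 and is documented as a release barrier. */
void demo_unlock(demo_lock_t *l)
{
    __sync_lock_release(&l->held);
}
```

A critical section is then demo_lock(&l); ...; demo_unlock(&l);. Accesses
inside the critical section must not move above the lock or below the
unlock. Nothing requires accesses made before demo_lock to complete first,
which is why a release barrier on the lock path looks unnecessary.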
In light of this, the patch below removes the release barrier from
rs6000_split_lock_test_and_set. With it applied, lock_try becomes:
00000010 <lock_try>:
  10:   7c 69 1b 78     mr      r9,r3
  14:   38 00 00 01     li      r0,1
  18:   7c 60 48 28     lwarx   r3,0,r9
  1c:   7c 00 49 2d     stwcx.  r0,0,r9
  20:   40 a2 ff f8     bne-    18 <lock_try+0x8>
  24:   4c 00 01 2c     isync
  28:   4e 80 00 20     blr
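For comparison, later GCC releases (4.7 and newer) added the __atomic
builtins, which take the memory order as an explicit argument. The sketch
below restates the test case with them. The function names
lock_try_acquire and unlock_release are invented here, and no claim is made
about the exact instructions a given compiler version emits for PowerPC.

```c
/* Assumes GCC 4.7 or later (or a compatible compiler) for __atomic_*. */

/* Atomically store 1 and return the previous value, requesting
   acquire ordering only. */
long lock_try_acquire(long *value)
{
    return __atomic_exchange_n(value, 1L, __ATOMIC_ACQUIRE);
}

/* Store 0 with release ordering, mirroring __sync_lock_release. */
void unlock_release(long *value)
{
    __atomic_store_n(value, 0L, __ATOMIC_RELEASE);
}
```

Spelling out the memory order at the call site makes the intended barrier
visible in the source instead of leaving it to the builtin's documentation.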
Anton
--
Index: gcc/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc.orig/gcc/config/rs6000/rs6000.c 2008-09-03 02:30:14.000000000 -0400
+++ gcc/gcc/config/rs6000/rs6000.c 2008-09-03 02:33:35.000000000 -0400
@@ -14000,8 +14000,6 @@
   enum machine_mode mode = GET_MODE (mem);
   rtx label, x, cond = gen_rtx_REG (CCmode, CR0_REGNO);
-  emit_insn (gen_memory_barrier ());
-
   label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
   emit_label (XEXP (label, 0));