s390: Avoid CAS boolean output inefficiency
Richard Henderson
rth@redhat.com
Mon Aug 6 22:40:00 GMT 2012
On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> There is one particular inefficiency I have noticed. This code:
>
> if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
> abort ();
>
> from atomic-compare-exchange-3.c gets compiled into:
>
> l %r3,0(%r2)
> larl %r1,v
> cs %r3,%r4,0(%r1)
> ipm %r1
> sra %r1,28
> st %r3,0(%r2)
> ltr %r1,%r1
> jne .L3
>
> which is extremely inefficient; it converts the condition code into
> an integer using the slow ipm, sra sequence, just so that it can
> convert the integer back into a condition code via ltr and branch
> on it ...
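For reference, the C-level shape being compiled can be sketched as a
standalone helper (a minimal sketch; atomic-compare-exchange-3.c itself
covers more cases, and the function name here is invented -- only the
builtin call mirrors the quoted testcase):

```c
/* Minimal sketch of the pattern under discussion: a strong CAS whose
   boolean result feeds directly into a branch.  The helper name is
   invented; only the builtin call mirrors the quoted testcase.  */
static int v = 0;

/* Returns nonzero if the CAS succeeded.  */
int try_update (int oldval, int newval)
{
  int expected = oldval;
  /* weak = 0 (strong CAS), success/failure memory orders relaxed (0).  */
  return __atomic_compare_exchange_n (&v, &expected, newval, 0, 0, 0);
}
```

The quoted assembly is the expansion of `if (!...) abort ();` around such
a call: ipm/sra materialize the condition code as an integer, only for
ltr to turn that integer straight back into a condition code.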
This was caused by (or at least abetted by) the representation of EQ
as NE ^ 1. With the subsequent truncation and zero-extend, I
think combine reached its insn limit of 3 before seeing everything
it needed to see.
I'm able to fix this problem by representing EQ as EQ before reload.
For extimm targets this results in identical code; for older targets
it requires avoiding the constant pool, i.e. LHI+XR instead of X.
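At the C level, the two representations amount to something like the
following (an analogy only; the actual transformation happens on RTL,
where the condition code has already been reduced to an integer with
0 meaning "equal"):

```c
/* Analogy for the two RTL representations of EQ, given a condition
   code already materialized as an integer (0 = equal, nonzero = not
   equal).  Function names are invented for illustration.  */
int eq_as_ne_xor_1 (int cc)
{
  return (cc != 0) ^ 1;   /* old: EQ represented as NE ^ 1 */
}

int eq_as_eq (int cc)
{
  return cc == 0;         /* new: EQ represented as EQ directly */
}
```

In the old form, combine must look through the xor plus the truncation
and zero-extend to prove the two forms equivalent, and that chain
exceeds its three-insn window.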
l %r2,0(%r3)
larl %r1,v
cs %r2,%r5,0(%r1)
st %r2,0(%r3)
jne .L3
That fixed, we see the second CAS in that file:
.loc 1 27 0
cs %r2,%r2,0(%r1)
ipm %r5
sll %r5,28
lhi %r0,1
xr %r5,%r0
st %r2,0(%r3)
ltr %r5,%r5
je .L20
This happens because CSE notices the cbranch vs 0, and sets r116
to zero along the path to
32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
__ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
at which point CSE decides that it would be cheaper to "re-use"
the zero already in r116 instead of loading another constant 0 here.
After that, combine is hamstrung because r116 is not dead.
I'm not quite sure of the best way to fix this, since rtx_costs already
gives all constants a cost of 0. CSE ought not believe that r116 is better
than a plain constant. CSE also shouldn't be extending the life of
pseudos this way.
A short-term possibility is to have the CAS insns accept general_operand,
so that the 0 gets merged. With reload inheritance and post-reload cse,
that might produce code that is "good enough". Certainly it's effective
for the atomic-compare-exchange-3.c testcase. I'm less than happy with
that, since whether the CAS gets optimized depends on totally unrelated
code that follows it.
This patch ought to be independent of any other patch so far.
r~
-------------- next part --------------
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 0e43e51..bed6b79 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -5325,12 +5325,15 @@
(match_operand 3 "const0_operand")]))
(clobber (reg:CC CC_REGNUM))])]
""
- "emit_insn (gen_sne (operands[0], operands[2]));
- if (GET_CODE (operands[1]) == EQ)
- emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
- DONE;")
+{
+ if (!TARGET_EXTIMM && GET_CODE (operands[1]) == EQ)
+ {
+ emit_insn (gen_seq_neimm (operands[0], operands[2]));
+ DONE;
+ }
+})
-(define_insn_and_split "sne"
+(define_insn_and_split "*sne"
[(set (match_operand:SI 0 "register_operand" "=d")
(ne:SI (match_operand:CCZ1 1 "register_operand" "0")
(const_int 0)))
@@ -5342,6 +5345,48 @@
[(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
(clobber (reg:CC CC_REGNUM))])])
+(define_insn_and_split "*seq"
+ [(set (match_operand:SI 0 "register_operand" "=d")
+ (eq:SI (match_operand:CCZ1 1 "register_operand" "0")
+ (const_int 0)))
+ (clobber (reg:CC CC_REGNUM))]
+ "TARGET_EXTIMM"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ rtx op0 = operands[0];
+ emit_insn (gen_lshrsi3 (op0, op0, GEN_INT (28)));
+ emit_insn (gen_xorsi3 (op0, op0, const1_rtx));
+ DONE;
+})
+
+;; ??? Ideally we'd USE a const1_rtx, properly reloaded, but that makes
+;; things more difficult for combine (which can only insert clobbers).
+;; But perhaps it would be better still to have simply used a branch around
+;; constant load instead of beginning with the IPM?
+;;
+;; What about LOCR for Z196? That's a more general question about cstore
+;; being decomposed into movcc...
+
+(define_insn_and_split "seq_neimm"
+ [(set (match_operand:SI 0 "register_operand" "=d")
+ (eq:SI (match_operand:CCZ1 1 "register_operand" "0")
+ (const_int 0)))
+ (clobber (match_scratch:SI 2 "=&d"))
+ (clobber (reg:CC CC_REGNUM))]
+ "!TARGET_EXTIMM"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ rtx op0 = operands[0];
+ rtx op2 = operands[2];
+ emit_insn (gen_ashlsi3 (op0, op0, GEN_INT (28)));
+ emit_move_insn (op2, const1_rtx);
+ emit_insn (gen_xorsi3 (op0, op0, op2));
+ DONE;
+})
;;
;; - Conditional move instructions (introduced with z196)