[PATCH 2/2] s390: Convert from sync to atomic optabs

Ulrich Weigand uweigand@de.ibm.com
Mon Aug 6 18:34:00 GMT 2012


Richard Henderson wrote:


Some more comments on this patch.

> +; Different from movdi_31 in that we have no splitters.
> +(define_insn "atomic_loaddi_1"
> +  [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
> +	(unspec:DI [(match_operand:DI 1 "s_operand" "Q,S,Q,m")]

The constraint line doesn't look right.  ld(y) accept base+index,
so they should get R and T constraints, respectively.

In fact, I wouldn't use "s_operand" as a predicate, since this
excludes valid addresses for those cases, and since in general
I've found it to be preferable to have reload fix up addresses.
So I'd just use "memory_operand" here, and actually in all the
other patterns as well ...  (This also simplifies the expander.)
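
Just to illustrate, an untested sketch of the operand lines with
memory_operand and the R/T constraints for ld/ldy:

  (define_insn "atomic_loaddi_1"
    [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
	  (unspec:DI [(match_operand:DI 1 "memory_operand" "Q,S,R,T")]
		     UNSPEC_MOVA))]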

> +		   UNSPEC_MOVA))]
> +  "!TARGET_ZARCH"
> +  "@
> +   lm\t%0,%M0,%S1
> +   lmy\t%0,%M0,%S1
> +   ld\t%0,%1
> +   ldy\t%0,%1"
> +  [(set_attr "op_type" "RS,RSY,RS,RSY")
> +   (set_attr "type" "lm,lm,floaddf,floaddf")])
> +
> +(define_insn "atomic_loadti_1"
> +  [(set (match_operand:TI 0 "register_operand" "=r")
> +	(unspec:TI [(match_operand:TI 1 "memory_operand" "m")]

"m" is wrong here (and basically everywhere), since it includes
SYMBOL_REFs for lrl etc. on z10.  For lpq we need "RT".
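
I.e. the operand would become roughly (sketch):

	(unspec:TI [(match_operand:TI 1 "memory_operand" "RT")]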

> +  "lpq\t%0,%1"
> +  [(set_attr "op_type" "RXY")])

LPQ and STPQ probably should get a "type" of "other".  This may not
model the pipeline exactly, but it's better than just assuming
"integer" by default.  These instructions are special (and slow).

> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand")	;; bool success output
> +   (match_operand:DGPR 1 "register_operand")	;; oldval output
> +   (match_operand:DGPR 2 "s_operand")		;; memory

memory_operand probably better again.
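
I.e. (sketch):

   (match_operand:DGPR 2 "memory_operand")		;; memory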

> +   (match_operand:DGPR 3 "register_operand")	;; expected intput
> +   (match_operand:DGPR 4 "register_operand")	;; newval intput
> +   (match_operand:SI 5 "const_int_operand")	;; is_weak
> +   (match_operand:SI 6 "const_int_operand")	;; success model
> +   (match_operand:SI 7 "const_int_operand")]	;; failure model
> +  ""
> +{
> +  rtx cc, cmp;
> +  emit_insn (gen_atomic_compare_and_swap<mode>_internal
> +	     (operands[1], operands[2], operands[3], operands[4]));
> +  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
> +  cmp = gen_rtx_NE (SImode, cc, const0_rtx);

As already pointed out, this needs to be EQ: CS sets CC 0 when the
exchange succeeds, so success corresponds to CC == 0.
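
I.e. the expander body would end up with (untested sketch, everything
else unchanged):

  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
  cmp = gen_rtx_EQ (SImode, cc, const0_rtx);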

> -  ""
> -  "cds<tg>\t%0,%3,%S1"
> -  [(set_attr "op_type" "RS<TE>")
> +  "TARGET_ZARCH"
> +  "c<td>sg\t%0,%3,%S1"
> +  [(set_attr "op_type" "RXY")

Should be "RSY"

> +  "@
> +   cds\t%0,%3,%S1
> +   cdsy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Should be "RS, RSY"

> +  "@
> +   cs\t%0,%3,%S1
> +   csy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Likewise.

> +(define_insn "atomic_fetch_<atomic><mode>_iaf"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +	(match_operand:GPR 1 "s_operand" "+S"))
> +   (set (match_dup 1)
> +	(unspec_volatile:GPR
> +	 [(ATOMIC_Z196:GPR (match_dup 1)
> +			   (match_operand:GPR 2 "general_operand" "d"))]
> +	 UNSPECV_ATOMIC_OP))
> +   (clobber (reg:CC CC_REGNUM))]
>    "TARGET_Z196"
> -  "la<noxa><g>\t%0,%2,%1")
> +  "la<noxa><g>\t%0,%2,%1"
> +  [(set_attr "op_type" "RXY")

Again RSY.



There is one particular inefficiency I have noticed.  This code:

  if (!__atomic_compare_exchange_n (&v, &expected, max, 0, 0, 0))
    abort ();

from atomic-compare-exchange-3.c gets compiled into:

        l       %r3,0(%r2)
        larl    %r1,v
        cs      %r3,%r4,0(%r1)
        ipm     %r1
        sra     %r1,28
        st      %r3,0(%r2)
        ltr     %r1,%r1
        jne     .L3

which is extremely inefficient; it converts the condition code into
an integer using the slow ipm, sra sequence, just so that it can
convert the integer back into a condition code via ltr and branch
on it ...

Now, with the old code this sequence used to be optimized away by
combine and we'd just have the cs followed by a branch.  This is
not done any more because we have the store to "expected" in between.

This is because of this code in combine:

      /* Make sure that the value that is to be substituted for the register
         does not use any registers whose values alter in between.  However,
         If the insns are adjacent, a use can't cross a set even though we
         think it might (this can happen for a sequence of insns each setting
         the same destination; last_set of that register might point to
         a NOTE).  If INSN has a REG_EQUIV note, the register is always
         equivalent to the memory so the substitution is valid even if there
         are intervening stores.  Also, don't move a volatile asm or
         UNSPEC_VOLATILE across any other insns.  */
      || (! all_adjacent
          && (((!MEM_P (src)
                || ! find_reg_note (insn, REG_EQUIV, src))
               && use_crosses_set_p (src, DF_INSN_LUID (insn)))
              || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
              || GET_CODE (src) == UNSPEC_VOLATILE))

Note that we have exactly the case mentioned, where a series of instructions
to be combined all set the same (CC) register.  If they're all adjacent,
this still gets optimized away -- but they no longer are due to the store.

Is there a way of structuring the atomic optabs/expander so that the store
gets done either before or after all the CC operations?


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


