This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



s390: Avoid CAS boolean output inefficiency


On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> There is one particular inefficiency I have noticed.  This code:
> 
>   if (!__atomic_compare_exchange_n (&v, &expected, max, 0, 0, 0))
>     abort ();
> 
> from atomic-compare-exchange-3.c gets compiled into:
> 
>         l       %r3,0(%r2)
>         larl    %r1,v
>         cs      %r3,%r4,0(%r1)
>         ipm     %r1
>         sra     %r1,28
>         st      %r3,0(%r2)
>         ltr     %r1,%r1
>         jne     .L3
> 
> which is extremely inefficient; it converts the condition code into
> an integer using the slow ipm, sra sequence, just so that it can
> convert the integer back into a condition code via ltr and branch
> on it ...

This was caused by (or perhaps abetted by) the representation of EQ
as NE ^ 1.  With the subsequent truncation and zero-extend, I
think combine reached its insn limit of 3 before seeing everything
it needed to see.
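
To make the two representations concrete, here is a small C model of how
the boolean result can be derived from the condition code.  It is only a
sketch: it assumes the usual CS convention (cc 0 when the swap happened,
cc 1 on mismatch) and that ipm leaves the condition code in bits 28-29 of
the 32-bit register.  The helper names are hypothetical and do not appear
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "ipm": the condition code lands in bits 28-29
   of the 32-bit register; the program mask bits are taken as zero.  */
uint32_t model_ipm (unsigned cc)
{
  return (uint32_t) (cc & 3u) << 28;
}

/* NE as an integer: shift the condition code down, as "sra 28" does.
   CS only produces cc 0 (swapped) or cc 1 (mismatch).  */
int ne_from_cc (unsigned cc)
{
  assert (cc <= 1);
  return (int) (model_ipm (cc) >> 28);
}

/* EQ represented as NE ^ 1: one extra operation for combine to look
   through before it can see the underlying condition code test.  */
int eq_as_ne_xor_1 (unsigned cc)
{
  return ne_from_cc (cc) ^ 1;
}

/* EQ represented directly as a test of the condition code.  */
int eq_direct (unsigned cc)
{
  return cc == 0;
}
```

Both EQ forms agree for cc 0 and cc 1; the difference is only in how many
operations sit between the CS and the final branch.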

I'm able to fix this problem by representing EQ as EQ before reload.
For extimm targets this results in identical code; for older targets
it requires avoidance of the constant pool, i.e. LHI+XR instead of X.

        l       %r2,0(%r3)
        larl    %r1,v
        cs      %r2,%r5,0(%r1)
        st      %r2,0(%r3)
        jne     .L3
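
For reference, the source shape behind this sequence is a CAS whose
boolean result is consumed only by a branch.  A minimal sketch, assuming
a global v as in the testcase; the wrapper function and its name are
hypothetical and are not taken from atomic-compare-exchange-3.c:

```c
#include <stdlib.h>

int v;

/* Strong CAS (weak = 0) with relaxed ordering on both paths, as in the
   quoted test line.  The result is used only to decide the branch, so
   no integer copy of the condition code should be needed.  */
void set_v_or_abort (int *expected, int desired)
{
  if (!__atomic_compare_exchange_n (&v, expected, desired, 0,
                                    __ATOMIC_RELAXED, __ATOMIC_RELAXED))
    abort ();
}
```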

That fixed, we see the second CAS in that file:

        .loc 1 27 0
        cs      %r2,%r2,0(%r1)
        ipm     %r5
        sll     %r5,28
        lhi     %r0,1
        xr      %r5,%r0
        st      %r2,0(%r3)
        ltr     %r5,%r5
        je      .L20

This happens because CSE notices the cbranch vs 0, and sets r116
to zero along the path to

     if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
                                       __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))

at which point CSE decides that it would be cheaper to "re-use"
the zero already in r116 instead of loading another constant 0 here.
After that, combine is hamstrung because r116 is not dead.

I'm not quite sure of the best way to fix this, since rtx_costs already
has all constants cost 0.  CSE ought not believe that r116 is better
than a plain constant.  CSE also shouldn't be extending the life of
pseudos this way.
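
A rough C sketch of the kind of source shape involved: a value is
branched on against zero, and a later CAS uses the constant 0 as its
desired value, so a register already known to hold zero is available for
re-use.  The function, its flag parameter, and the memory orders are
assumptions made for illustration; this is not the testcase code.

```c
#include <stdbool.h>

int v;

/* Hypothetical example: returns true only if flag is zero and the CAS
   then succeeds in storing 0 into v.  */
bool zero_then_cas (int flag, int *expected)
{
  /* Branch on a comparison against zero.  Past this point the
     compiler knows that flag holds 0.  */
  if (flag != 0)
    return false;

  /* The desired value is the constant 0.  A pass like CSE may
     substitute the register known to be zero for this constant,
     which keeps that register live across the CAS.  */
  return __atomic_compare_exchange_n (&v, expected, 0, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```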

A short-term possibility is to have the CAS insns accept general_operand,
so that the 0 gets merged.  With reload inheritance and post-reload cse,
that might produce code that is "good enough".  Certainly it's effective
for the atomic-compare-exchange-3.c testcase.  I'm less than happy with
that, since whether the CAS gets optimized then depends on following code
that is totally unrelated.

This patch ought to be independent of any other patch so far.


r~

Attachment: z
Description: Text document

