This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
| Other format: | [Raw text] | |
On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> There is one particular inefficiency I have noticed. This code:
>
> if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
> abort ();
>
> from atomic-compare-exchange-3.c gets compiled into:
>
> l %r3,0(%r2)
> larl %r1,v
> cs %r3,%r4,0(%r1)
> ipm %r1
> sra %r1,28
> st %r3,0(%r2)
> ltr %r1,%r1
> jne .L3
>
> which is extremely inefficient; it converts the condition code into
> an integer using the slow ipm, sra sequence, just so that it can
> convert the integer back into a condition code via ltr and branch
> on it ...
This was caused (or perhaps abetted by) the representation of EQ
as NE ^ 1. With the subsequent truncation and zero-extend, I
think combine reached its insn limit of 3 before seeing everything
it needed to see.
I'm able to fix this problem by representing EQ as EQ before reload.
For extimm targets this results in identical code; for older targets
it requires avoidance of the constant pool, i.e. LHI+XR instead of X.
l %r2,0(%r3)
larl %r1,v
cs %r2,%r5,0(%r1)
st %r2,0(%r3)
jne .L3
That fixed, we see the second CAS in that file:
.loc 1 27 0
cs %r2,%r2,0(%r1)
ipm %r5
sll %r5,28
lhi %r0,1
xr %r5,%r0
st %r2,0(%r3)
ltr %r5,%r5
je .L20
This happens because CSE notices the cbranch vs 0, and sets r116
to zero along the path to
32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
__ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
at which point CSE decides that it would be cheaper to "re-use"
the zero already in r116 instead of load another constant 0 here.
After that, combine is ham-strung because r116 is not dead.
I'm not quite sure the best way to fix this, since rtx_costs already
has all constants cost 0. CSE ought not believe that r116 is better
than a plain constant. CSE also shouldn't be extending the life of
pseudos this way.
A short-term possibility is to have the CAS insns accept general_operand,
so that the 0 gets merged. With reload inheritance and post-reload cse,
that might produce code that is "good enough". Certainly it's effective
for the atomic-compare-exchange-3.c testcase. I'm less than happy with
that, since the non-optimization of CAS depends on following code that
is totally unrelated.
This patch ought to be independent of any other patch so far.
r~
Attachment:
z
Description: Text document
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |