This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> There is one particular inefficiency I have noticed. This code:
>
>   if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
>     abort ();
>
> from atomic-compare-exchange-3.c gets compiled into:
>
>         l       %r3,0(%r2)
>         larl    %r1,v
>         cs      %r3,%r4,0(%r1)
>         ipm     %r1
>         sra     %r1,28
>         st      %r3,0(%r2)
>         ltr     %r1,%r1
>         jne     .L3
>
> which is extremely inefficient; it converts the condition code into
> an integer using the slow ipm, sra sequence, just so that it can
> convert the integer back into a condition code via ltr and branch
> on it ...

This was caused (or perhaps abetted by) the representation of EQ as
NE ^ 1. With the subsequent truncation and zero-extend, I think
combine reached its insn limit of 3 before seeing everything it
needed to see.

I'm able to fix this problem by representing EQ as EQ before reload.
For extimm targets this results in identical code; for older targets
it requires avoidance of the constant pool, i.e. LHI+XR instead of X.

        l       %r2,0(%r3)
        larl    %r1,v
        cs      %r2,%r5,0(%r1)
        st      %r2,0(%r3)
        jne     .L3

That fixed, we see the second CAS in that file:

        .loc 1 27 0
        cs      %r2,%r2,0(%r1)
        ipm     %r5
        sll     %r5,28
        lhi     %r0,1
        xr      %r5,%r0
        st      %r2,0(%r3)
        ltr     %r5,%r5
        je      .L20

This happens because CSE notices the cbranch vs 0, and sets r116
to zero along the path to

    32   if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
                                           __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))

at which point CSE decides that it would be cheaper to "re-use"
the zero already in r116 instead of load another constant 0 here.
After that, combine is ham-strung because r116 is not dead.

I'm not quite sure the best way to fix this, since rtx_costs already
has all constants cost 0. CSE ought not believe that r116 is better
than a plain constant. CSE also shouldn't be extending the life of
pseudos this way.

A short-term possibility is to have the CAS insns accept general_operand,
so that the 0 gets merged. With reload inheritance and post-reload cse,
that might produce code that is "good enough". Certainly it's effective
for the atomic-compare-exchange-3.c testcase. I'm less than happy with
that, since the non-optimization of CAS depends on following code that
is totally unrelated.

This patch ought to be independent of any other patch so far.

r~
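For anyone wanting to reproduce the pattern discussed above, here is a minimal sketch in C of a branch taken directly on the result of __atomic_compare_exchange_n. It is not the testcase itself: the global `v` is only a stand-in for the one in atomic-compare-exchange-3.c, and the helpers `try_swap` and `current_value` are hypothetical names introduced for illustration. The arguments 0, 0, 0 in the quoted code correspond to a strong CAS with __ATOMIC_RELAXED for both memory orders, which is what the sketch spells out.

```c
#include <stdbool.h>

/* Stand-in for the testcase's global `v` (assumed type: int). */
static int v;

/* Hypothetical helper: branch directly on the CAS result, as in the
   quoted code.  On failure the builtin stores the current value of
   `v` back into *expected, which is the `st` seen in the listings.  */
bool try_swap(int *expected, int desired)
{
  if (!__atomic_compare_exchange_n(&v, expected, desired,
                                   0 /* strong, not weak */,
                                   __ATOMIC_RELAXED, __ATOMIC_RELAXED))
    return false;   /* *expected now holds the value found in v */
  return true;      /* v was equal to *expected and is now desired */
}

/* Hypothetical helper: read `v` for inspection. */
int current_value(void)
{
  return __atomic_load_n(&v, __ATOMIC_RELAXED);
}
```

Compiling a caller of try_swap with optimization for an s390 target and inspecting the assembly should show whether the condition code from cs is branched on directly or first materialized through ipm; the exact output depends on the compiler version and options.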
Attachment:
z
Description: Text document