Current __exchange_and_add on ia64
Paolo Carlini
pcarlini@suse.de
Fri Nov 18 14:36:00 GMT 2005
Alexander Terekhov wrote:
>>4_0-branch
>>---
>>0000000000000000 <__exchange_and_add>:
>> 0: 19 00 00 00 22 00 [MMB] mf
>> 6: 80 00 80 60 21 00 ld4.acq r8=[r32]
>>
>>
>No need for .acq here (in addition to preceding mf).
>
>
I see...
>> c: 00 00 00 20 nop.b 0x0;;
>> 10: 09 70 20 00 08 20 [MMI] addp4 r14=r8,r0
>> 16: f0 00 20 00 42 00 mov r15=r8
>> 1c: 81 08 01 80 add r8=r8,r33;;
>> 20: 0b 00 38 40 2a 04 [MMI] mov.m ar.ccv=r14;;
>> 26: 80 40 80 22 20 00 cmpxchg4.acq r8=[r32],r8,ar.ccv
>> 2c: 00 00 04 00 nop.i 0x0;;
>> 30: 10 00 00 00 01 00 [MIB] nop.m 0x0
>> 36: 70 78 20 0c 71 03 cmp4.eq p7,p6=r15,r8
>> 3c: e0 ff ff 4a (p06) br.cond.dptk.few 10 <__exchange_and_add+0x10>
>> 40: 17 00 00 00 00 08 [BBB] nop.b 0x0
>> 46: 00 00 00 00 10 80 nop.b 0x0
>> 4c: 08 00 84 00 br.ret.sptk.many b0;;
>>
>>
>Brr. I suppose it does
>
>fence();
>old = load_acq(__mem);
>while ((result = cas_acq(__mem, old + __val, old)) != old) old = result;
>
>Right?
>
>
Yes, I think so ;) In any case, it seems to me a pretty straightforward
way to implement the required atomic operation in terms of cas.
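For readers who prefer source to disassembly, the fence / load / cas loop quoted above can be sketched with C++11 atomics. This is only an illustration under assumptions: the name exchange_and_add_sketch is hypothetical, std::atomic postdates this thread, and how the compiler maps these orderings onto mf, ld4.acq and cmpxchg4.acq is its own business, so this is not the libstdc++ implementation.

```cpp
#include <atomic>

// Hypothetical stand-in for __exchange_and_add; not the libstdc++ source.
// Returns the value held in 'mem' before 'val' was added.
inline int exchange_and_add_sketch(std::atomic<int>& mem, int val)
{
    // Full fence first, playing the role of the leading 'mf'.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // Initial read of the current value (the 'load_acq' step).
    int old = mem.load(std::memory_order_acquire);

    // Try to install old + val. On failure, compare_exchange_weak writes
    // the value it actually found back into 'old', which is the
    // 'old = result' step of the quoted loop; old + val is then
    // recomputed on the next iteration.
    while (!mem.compare_exchange_weak(old, old + val,
                                      std::memory_order_acquire,
                                      std::memory_order_acquire)) {
        // retry with the refreshed 'old'
    }
    return old;
}
```

Calling exchange_and_add_sketch(counter, 1) on a std::atomic<int> counter returns the previous count and leaves the counter incremented by one.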
>>mainline
>>---
>>0000000000000000 <__exchange_and_add>:
>> 0: 09 78 00 40 b0 10 [MMI] ld4.acq r15=[r32]
>> 6: 00 00 00 02 00 00 nop.m 0x0
>> c: 00 00 04 00 nop.i 0x0;;
>> 10: 09 00 3c 40 2a 04 [MMI] mov.m ar.ccv=r15
>> 16: e0 00 3c 00 42 e0 mov r14=r15
>> 1c: f1 08 01 80 add r15=r15,r33;;
>> 20: 09 40 00 40 22 04 [MMI] mov.m r8=ar.ccv
>> 26: f0 78 80 62 20 00 cmpxchg4.rel r15=[r32],r15,ar.ccv
>> 2c: 00 00 04 00 nop.i 0x0;;
>> 30: 13 30 38 1e 07 b8 [MBB] cmp.eq p6,p7=r14,r15
>> 36: 01 f0 ff ff 25 80 (p06) br.cond.dpnt.few 10 <__exchange_and_add+0x10>
>> 3c: 08 00 84 00 br.ret.sptk.many b0;;
>>
>>
>
>old = load_acq(__mem);
>while ((result = cas_rel(__mem, old + __val, old)) != old) old = result;
>
>I suppose.
>
>
Ok...
>>You see, mainline doesn't emit any 'mf'. Another difference is that
>>mainline uses 'cmpxchg4.rel' instead of 'cmpxchg4.acq'. Now, if I
>>remember correctly an old message from Alexander, either 'mf' is emitted
>>before 'cmpxchg4.acq' or after 'cmpxchg4.rel', but must be present...
>>
>>
>Your mainline doesn't seem to provide fully-fenced semantics (in spite
>of ld.acq preceding cas loop with subsequent cas.rel on the same ia64
>"semaphore" inside it). Subsequent (unordered) loads can be hoisted
>above cas.rel (initial acquire on load preceding cas loop doesn't help
>at all with respect to lack of store-load fencing, to begin with)...
>and that can break things. Not good.
>
>
Argh!! Thanks for the analysis. I'm going to attach this info to the
audit trail of the PR (target/24757). Now all those regressions in the
threaded tests can be easily explained...
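The alternative arrangement discussed in the quoted text, a release cas followed by a full fence, might look roughly like this with C++11 atomics. Again a sketch under assumptions: the function name is hypothetical, std::atomic is not what libstdc++ used at the time, and the C++ memory model only approximates the ia64 semantics discussed here.

```cpp
#include <atomic>

// Hypothetical illustration of "cas.rel followed by mf"; not the
// libstdc++ source. Returns the value held in 'mem' before the add.
inline int exchange_and_add_rel_fence_sketch(std::atomic<int>& mem, int val)
{
    // Initial read of the current value.
    int old = mem.load(std::memory_order_acquire);

    // Release-ordered cas loop. On failure 'old' is refreshed with the
    // value actually found, and old + val is recomputed on retry.
    while (!mem.compare_exchange_weak(old, old + val,
                                      std::memory_order_release,
                                      std::memory_order_relaxed)) {
        // retry with the refreshed 'old'
    }

    // Trailing full fence: intended to supply the store-load ordering
    // that the release cas alone does not give, so that later loads
    // are not hoisted above the update.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return old;
}
```

Dropping the trailing fence from this sketch gives the shape of the mainline loop quoted above, which is the case where subsequent loads can move ahead of the cas.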
Paolo.