This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
> So we have several atomics we use in the kernel, with the more common being
>
>  - add (and subtract) and cmpxchg of both 'int' and 'long'

This would be __atomic_fetch_add, __atomic_fetch_sub, and __atomic_compare_exchange.
> - add_return (add and return new value)

__atomic_add_fetch returns the new value. (__atomic_fetch_add returns the old value.) If it isn't as efficient as it needs to be, the RTL pattern can be fixed. What sequence do you currently use for this? The compiler currently generates the equivalent of

    lock; xadd
    add
> - special cases of the above:
>     dec_and_test (decrement and test result for zero)
>     inc_and_test (increment and test result for zero)
>     add_negative (add and check if result is negative)
>
> The special cases are because older x86 cannot do the generic "add_return" efficiently - it needs xadd - but can do atomic versions that test the end result and give zero or sign information.

Since these are older x86 only, could you use add_return() always and then have the compiler use new peephole optimizations to detect those usage patterns and change the instruction sequence for x86 when required? Would that be acceptable? Or maybe you don't trust the compiler :-) Or maybe I can innocently ask if the performance impact on older x86 matters enough any more? :-)
> - atomic_add_unless() - basically an optimized cmpxchg.
>
> - atomic bit array operations (bit set, clear, set-and-test, clear-and-test). We do them on "unsigned long" exclusively, and in fact we do them on arrays of unsigned long, ie we have the whole "bts reg,mem" semantics. I'm not sure we really care about the atomic versions for the arrays, so it's possible we only really care about a single long.

Are these functions wrappers to a tight load, mask, cmpxchg loop? Or something else? These could also require new built-ins if they can't be constructed from the existing operations...
> The only complication with the bit setting is that we have a concept of "set/clear bit with memory barrier before or after the bit" (for locking). We don't do the whole release/acquire thing, though.
> - compare_xchg_double

The existing __atomic builtins will work on 1, 2, 4, 8 or 16 byte values regardless of type, as long as the hardware supports those sizes. So x86-64 can do a 16 byte cmpxchg.
> We also do byte/word atomic increments and decrements, but that's in the x86 spinlock implementation, so it's not a generic need.
>
> We also do the add version in particular as CPU-local optimizations that do not need to be SMP-safe, but do need to be interrupt-safe. On x86, this is just an r-m-w op, on most other architectures it ends up being the usual load-locked/store-conditional.

It may be possible to add modifier extensions to the memory model component for such a thing, ie ...
> I think that's pretty much it, but maybe I'm missing something.
>
> Of course, locking itself tends to be special cases of the above with extra memory barriers, but it's usually hidden in asm for other reasons (the bit-op + barrier being a special case).