GCC Atomic operations and unaligned memory

This applies to GCC 4.9 and beyond.

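Within GCC, there are 3 ways to implement atomic operations:

   1. a lock free instruction sequence provided directly by the target
   2. a lock free compare and swap loop built on the target's compare and swap
   3. an external call into the libatomic library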

If an operation cannot be provided by GCC in a lock free way (the first 2 options), it should be left as external calls into the third option, the libatomic library. GCC is not intended to ever directly provide locks for built-in __atomic functions.
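The C sketch below is a minimal illustration of the first option. It applies GCC's __atomic built-ins to a naturally aligned int. It assumes a target with native 4-byte atomics, such as x86-64 or AArch64, where GCC expands these calls inline. The names counter, bump_counter and read_counter are hypothetical. If GCC cannot make an operation lock free, it emits a call into libatomic instead, and that library usually has to be linked explicitly with -latomic.

```c
/* Assumes a target with native 4-byte atomics (e.g. x86-64, AArch64):
   GCC then expands these built-ins inline, with no libatomic call. */
int counter = 0;

/* Hypothetical helper: atomically add n and return the previous value. */
int bump_counter(int n)
{
  return __atomic_fetch_add(&counter, n, __ATOMIC_SEQ_CST);
}

/* Hypothetical helper: atomically read the current value. */
int read_counter(void)
{
  return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```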

A libatomic port/customization for the target should provide all __atomic operations. libatomic was designed to provide an upgrade path such that new hardware can provide new lock-free functionality with only a new library, not requiring older compiler objects to be recompiled.

As of GCC 4.8, libatomic is provided as part of the gcc source tree.

See GCC's Atomic library (libatomic) page.

Size & Alignment

Lock free operations provided by GCC often have alignment requirements. GCC 4.9 will provide an atomic type attribute which can be set on objects. This attribute forces a specific alignment and size on the object, and these may differ from those of the original data type. The alignment and size are chosen to allow lock free operations, if the target provides them.

The C11 _Atomic type qualifier and C++11 atomic<type> template will use this attribute to ensure atomic objects of a specified type operate consistently.

  typedef struct { char c[3]; } B3;
  _Atomic B3 obj2;

An object will be promoted up to the next lock-free size in order to enable lock free operations, as long as it isn't already a documented lock free size. So obj2 will be promoted to a 4 byte object with whatever alignment is required in order to enable lock-free operations.

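  _Atomic short obj1;

obj1 is 2 bytes, which is already a default lock-free size, so it is not promoted.

Default lock free requirements are:

   size      alignment
   1 byte    1 byte
   2 bytes   2 bytes
   4 bytes   4 bytes
   8 bytes   8 bytes
   16 bytes  16 bytes

No object larger than 16 bytes will be upsized/aligned.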

Some targets may have more stringent requirements. For example, they may not support 1- or 2-byte operations, or they may require 4-byte objects to be 8-byte aligned. Such targets will be able to override the default requirements of the atomic attribute so that the port gets the alignment and size settings it needs.

There will also be a build option to turn off atomic attribute upsizing/realignment. Space may matter more than lock free operations, or compatibility with previous or other compilers may be required. In those cases, --disable-atomic-resize can be specified at configuration time, and the platform will then not perform this padding. obj2 would remain a 3-byte object, and if that doesn't allow lock free operations, libatomic will provide a locked implementation. Note that using this option breaks ABI compatibility with objects compiled on a platform which does allow resizing. That is why it is a build-time configuration option.
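The sizing policy above can be modelled in a few lines of C. This is a hypothetical sketch: promoted_atomic_size is not a GCC function. It assumes the default lock-free sizes are 1, 2, 4, 8 and 16 bytes. Its boolean argument stands in for whether the compiler was configured with the proposed --disable-atomic-resize option.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the resize policy.  Assumes the default
   lock-free sizes 1, 2, 4, 8 and 16 bytes; resize_enabled is false when
   the compiler was configured with --disable-atomic-resize. */
size_t promoted_atomic_size(size_t size, bool resize_enabled)
{
  /* Resizing disabled, empty, or larger than 16 bytes: leave it alone
     (libatomic decides how to handle such objects). */
  if (!resize_enabled || size == 0 || size > 16)
    return size;

  /* Otherwise round up to the next lock-free size (a power of two). */
  size_t lock_free = 1;
  while (lock_free < size)
    lock_free *= 2;
  return lock_free;
}
```

With resizing enabled, promoted_atomic_size(3, true) gives 4, matching obj2. With resizing disabled, the object stays at 3 bytes.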

More details on alignment policy, from Lawrence Crowl:

If the type's size is larger than any natively atomic size, then the implementation is locking, and it appears only in libatomic.

If the type's size is smaller than a natively atomic size, then the implementation can be one of four types: super-aligned high, super-aligned low, containing-word, and locking.

For clarity, I have written functions to implement atomic_fetch_add for 16-bit short atomics on 32-bit native long atomics using the first three techniques.

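#include <atomic>
#include <cstdint>

using std::atomic_short;
// The 32-bit native atomic word the text calls "long". std::uint32_t
// keeps the shifts and wrap-around well defined on any host.
using atomic_word = std::atomic<std::uint32_t>;

// Sketch only: viewing an atomic_short through an atomic_word assumes the
// compiler/library has laid the object out as each technique requires.

short super_aligned_high(atomic_short* hp, short val) {
  // active bits are stored in the high part
  // stores must up-shift; loads must down-shift
  atomic_word* lp = reinterpret_cast<atomic_word*>(hp);
  std::uint32_t old = lp->fetch_add(std::uint32_t(std::uint16_t(val)) << 16);
  return static_cast<short>(old >> 16);
}

short super_aligned_low(atomic_short* hp, short val) {
  // active bits are stored in the low part
  // loads and stores are direct
  atomic_word* lp = reinterpret_cast<atomic_word*>(hp);
  std::uint32_t desired;
  std::uint32_t expected = lp->load();
  do {
    desired = (expected + std::uint16_t(val)) & 0xFFFFu;
  } while ( !lp->compare_exchange_weak(expected, desired) );
  return static_cast<short>(expected);
}

short containing_word(atomic_short* hp, short val) {
  // low order part of word at low address (i.e. little endian)
  // loads must test for high and possibly shift
  // stores must do a compare-exchange loop
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(hp);
  if ( addr & 2 ) { // misaligned; atomic short in upper half
     atomic_word* lp = reinterpret_cast<atomic_word*>(addr & ~std::uintptr_t(2));
     std::uint32_t desired;
     std::uint32_t expected = lp->load();
     do {
       desired = expected + (std::uint32_t(std::uint16_t(val)) << 16);
     } while ( !lp->compare_exchange_weak(expected, desired) );
     return static_cast<short>(expected >> 16);
  }
  else { // aligned; atomic short in lower half
     atomic_word* lp = reinterpret_cast<atomic_word*>(hp);
     std::uint32_t desired;
     std::uint32_t expected = lp->load();
     do {
       std::uint32_t upper = expected & 0xFFFF0000u;
       std::uint32_t lower = (expected + std::uint16_t(val)) & 0xFFFFu;
       desired = upper | lower;
     } while ( !lp->compare_exchange_weak(expected, desired) );
     return static_cast<short>(expected & 0xFFFFu);
  }
}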
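The functions above view a short's storage through a 32-bit atomic. That only works if the compiler lays the object out that way. For comparison, here is a hypothetical C sketch of the containing-word technique. It takes the containing 32-bit word directly and updates one half with GCC's __atomic_compare_exchange_n. The function name and its 'upper' parameter are illustrative and are not part of any GCC or libatomic interface.

```c
#include <stdint.h>

/* Hypothetical containing-word fetch_add: a 16-bit value lives in one
   half of a 32-bit word; 'upper' selects the half.  A CAS loop on the
   whole word updates that half and leaves the other half untouched. */
uint16_t containing_word_fetch_add(uint32_t *word, int upper, uint16_t val)
{
  unsigned shift = upper ? 16 : 0;
  uint32_t mask = (uint32_t)0xFFFF << shift;
  uint32_t expected = __atomic_load_n(word, __ATOMIC_RELAXED);
  uint32_t desired;
  do {
    uint16_t old_half = (uint16_t)(expected >> shift);
    uint16_t new_half = (uint16_t)(old_half + val);   /* wraps mod 2^16 */
    desired = (expected & ~mask) | ((uint32_t)new_half << shift);
    /* On failure, 'expected' is refreshed with the word's current value. */
  } while (!__atomic_compare_exchange_n(word, &expected, desired, 1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return (uint16_t)(expected >> shift);               /* previous value */
}
```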

The super-aligned implementations did not have as much of an advantage as I had thought. Either way, there are some very definite tradeoffs here, and I think the platform owner needs to make the choice.

In terms of implementation, the locking version would of course be in libatomic. I think it is reasonable to put the other implementations in libatomic in the first go-round. Later, the compiler could inline some of these functions at the point of call. In either case, the compiler will need to know how to fill the super-aligned words with constants for static initialization.
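For the static initialization point, here is a sketch of the constants involved. These hypothetical macros are not a GCC interface. They compute the 32-bit word image of a 16-bit value in the super-aligned-high and super-aligned-low layouts used above. Because they are constant expressions, they can appear in static initializers.

```c
#include <stdint.h>

/* Hypothetical macros: 32-bit word image of a 16-bit value v.
   High layout: value in bits 16..31, low half zero.
   Low layout:  value in bits 0..15, high half zero. */
#define SUPER_ALIGNED_HIGH_INIT(v) ((uint32_t)(uint16_t)(v) << 16)
#define SUPER_ALIGNED_LOW_INIT(v)  ((uint32_t)(uint16_t)(v))

/* Statically initialized words holding the short value 7. */
uint32_t high_word = SUPER_ALIGNED_HIGH_INIT(7);
uint32_t low_word  = SUPER_ALIGNED_LOW_INIT(7);
```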

Behaviour of the __atomic operations

GCC currently uses can_compare_and_swap_p() to determine whether a lock free instruction sequence can be used. A new routine:

  is_lock_free_available_p (enum machine_mode mode)

will provide the existing functionality of can_compare_and_swap_p(). GCC will also eventually provide a compiler option, -fno-atomic-cas-loop, which disables the generation of compare and swap loops to implement missing atomic operations. is_lock_free_available_p() will then check for a lock free version of each individual atomic operation, rather than only for a compare_and_swap operation.
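To make "compare and swap loops to implement missing atomic operations" concrete, here is a hypothetical C sketch of a fetch-and-NAND built from __atomic_compare_exchange_n. It shows the general shape of such a loop, not GCC's actual expansion. It is lock free only if the underlying compare-and-swap is, and nothing bounds the number of retries.

```c
/* Hypothetical fetch-and-NAND synthesized from a compare-and-swap loop. */
unsigned cas_loop_fetch_nand(unsigned *ptr, unsigned val)
{
  unsigned expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  /* On failure, __atomic_compare_exchange_n writes the current value of
     *ptr into 'expected', so the new value is recomputed on each pass. */
  while (!__atomic_compare_exchange_n(ptr, &expected, ~(expected & val),
                                      1, __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    ;
  return expected;   /* previous value, as __atomic_fetch_nand returns */
}
```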

-fno-atomic-cas-loop is provided for users who want to experiment with, or implement, atomic operations that offer progress guarantees. A normal compare-and-swap loop makes no such guarantees, and in theory it may never terminate. In practice this is unlikely, but there is interest in operations which do make various sorts of progress guarantees and perform various types of back-off. Using this option would allow GCC to be used with a libatomic implementation that provides these kinds of guarantees. As long as the implementation kept the same lock-free characteristics (i.e. didn't add locks where there were none before), ABI compatibility would be maintained as well.
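A library that wants to experiment with back-off could use a loop like the one sketched below. This is hypothetical: fetch_add_backoff and its 1024-iteration cap are arbitrary choices and are not libatomic code. The loop still takes no lock, but after each failed compare-and-swap it spins for a delay that doubles each time. Back-off reduces contention, but on its own it does not guarantee that the loop terminates.

```c
/* Hypothetical fetch-and-add whose CAS loop backs off exponentially
   after each failed attempt.  No lock is taken. */
unsigned fetch_add_backoff(unsigned *ptr, unsigned val)
{
  unsigned expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  unsigned delay = 1;
  while (!__atomic_compare_exchange_n(ptr, &expected, expected + val, 1,
                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
    /* Busy-wait 'delay' iterations; volatile stops the loop being removed. */
    for (volatile unsigned spin = 0; spin < delay; spin++)
      ;
    if (delay < 1024)   /* cap the back-off */
      delay *= 2;
  }
  return expected;      /* previous value */
}
```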

__atomic_always_lock_free(size_t size, obj *ptr)

When ptr is NULL, this checks whether an object of size bytes, with the 'proper' alignment for an atomic type of that size, is always lock free.

When ptr is non-NULL, the alignment of the type pointed to is compared with that of the appropriately sized atomic type.
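The C sketch below shows both forms. It assumes a target where 1- and 4-byte objects with their natural alignment are lock free, which holds on x86-64 and AArch64, for example. GCC requires the size argument to be a compile-time constant, so each query is written out with its own sizeof rather than passed in as a parameter. The function names are hypothetical.

```c
#include <stdbool.h>

int aligned_int;   /* an ordinary, naturally aligned int */

/* NULL pointer: ask about an object of this size with the typical
   alignment for an atomic type of that size. */
bool char_always_lock_free(void)
{
  return __atomic_always_lock_free(sizeof(char), 0);
}

bool int_always_lock_free(void)
{
  return __atomic_always_lock_free(sizeof(int), 0);
}

/* Non-NULL pointer: the alignment of the pointed-to type (int here) is
   compared with that of the atomic type of the same size. */
bool aligned_int_lock_free(void)
{
  return __atomic_always_lock_free(sizeof aligned_int, &aligned_int);
}
```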

__atomic_is_lock_free(size_t size, obj *ptr)

This routine works exactly like __atomic_always_lock_free, except that instead of returning 'false', it makes a call to the libatomic library routine __atomic_is_lock_free(), which is then resolved at runtime.

This allows objects which are not properly sized or aligned to be examined by the library implementation to determine whether an atomic operation on the object would be lock free or not. Note that the library would also provide the functionality of any atomic operations on this object, so it is equipped to answer the question.
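How a library decides this is up to the implementation. The C sketch below shows one plausible rule, based on the containing-word idea from Lawrence Crowl's notes. An object is treated as lock free if it fits entirely inside one naturally aligned word no larger than the widest native atomic. The sketch calls that width max_native and assumes it is a power of two. lib_is_lock_free is hypothetical and is not libatomic's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-time check.  Assumes lock-free operations exist on
   naturally aligned words of 1, 2, 4, ... up to max_native bytes (a
   power of two).  An object is reported lock free if it lies entirely
   inside one such word, which could then be updated with a CAS on the
   containing word. */
bool lib_is_lock_free(size_t size, uintptr_t addr, size_t max_native)
{
  for (size_t w = 1; w <= max_native; w *= 2) {
    uintptr_t start = addr & ~(uintptr_t)(w - 1);   /* containing word */
    if (size <= w && addr + size <= start + w)
      return true;
  }
  return false;
}
```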

This resolves issues with the C++11 and C11 atomic data types and macros. *All* atomic data in those standards-driven environments will have the atomic type attribute set, so it will all be aligned the same way. The compiler's __atomic_{is,always}_lock_free() routines will then cause the compiler to either always generate a lock free sequence or always generate a call into libatomic.

This also gives the expert user of the __atomic built-in routines the flexibility of per-object lock-free properties as needed. These users may also apply the atomic type attribute to their data if they require consistent alignment, as the C11 and C++11 standards do.

Note that using lock-free algorithms on data which does not have the atomic attribute can be hazardous since it is possible that not all objects of the same type will be lock free.

Per-object lock-free properties are intended for legacy code where this doesn't matter, or for users who know what they are doing.

GCC 4.7, GCC 4.8

GCC 4.7 introduced atomic support, but did not address alignment or sizing issues. It also did not provide an integrated libatomic, although a source level implementation was available to compile and link in with applications to provide required functionality. There may be some inconsistencies with __atomic_is_lock_free and __atomic_always_lock_free on some ports.

C++11 support was experimental in GCC 4.7 & 4.8, and with the addition of the atomic type attribute, atomic<> objects in GCC 4.7 & 4.8 may not be binary compatible with GCC 4.9 and beyond. There may also be bugs with the *_LOCK_FREE macros and is_lock_free() methods in some instances due to alignment issues.

It is unknown at this point if any of the GCC 4.9 functionality to resolve these issues will be backported to a future 4.7 or 4.8 release. My guess is that it is unlikely since it would break compatibility with previous versions of GCC 4.[78].x.

Atomic type attribute

A target needs to determine a list of machine modes which the builtin atomic functions will use. The compiler will set up the basic atomic type nodes and give them a default:

   atomicQI_type_node    ->   QImode
   atomicHI_type_node    ->   HImode
   atomicSI_type_node    ->   SImode
   atomicDI_type_node    ->   DImode
   atomicTI_type_node    ->   TImode

A backend will be able to override this value during initialization if the default is not sufficient. All the atomic builtins and operations will then use this type by default.

For example, suppose a target requires every atomic operation to use at least 4 bytes, 8-byte aligned, and wants to apply this to everything 4 bytes and smaller. It could declare a custom mode such as 'AT_SImode' which has that property, and then override the defaults with:

   atomicQI_type_node    ->   AT_SImode
   atomicHI_type_node    ->   AT_SImode
   atomicSI_type_node    ->   AT_SImode
   atomicDI_type_node    ->   DImode
   atomicTI_type_node    ->   TImode
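The effect of this override on object layout can be modelled as a simple table. The sketch below is hypothetical, and its enum and struct names are not GCC internals. It records the size and alignment in bytes that each atomic type node ends up with. It assumes the default nodes are naturally aligned and, as in the example, that AT_SImode is 4 bytes with 8-byte alignment.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the five atomic type nodes. */
enum atomic_node { ATOMIC_QI, ATOMIC_HI, ATOMIC_SI, ATOMIC_DI, ATOMIC_TI,
                   ATOMIC_NODES };

struct atomic_layout { size_t size, align; };   /* in bytes */

/* Defaults: each node uses its own mode, assumed naturally aligned. */
static const struct atomic_layout default_layout[ATOMIC_NODES] = {
  {1, 1}, {2, 2}, {4, 4}, {8, 8}, {16, 16}
};

/* The example's AT_SImode: 4 bytes, 8-byte aligned. */
static const struct atomic_layout at_si = {4, 8};

/* Layout after the example override: QI, HI and SI all use AT_SImode;
   DI and TI keep their defaults. */
struct atomic_layout overridden_layout(enum atomic_node n)
{
  if (n == ATOMIC_QI || n == ATOMIC_HI || n == ATOMIC_SI)
    return at_si;
  return default_layout[n];
}
```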

This is accomplished by using a target hook for the override which changes the defaults, and may look something along the lines of:

  static tree 
  atomic_type_override (enum machine_mode mode)
  {
    tree t;
    
    if (mode == QImode || mode == HImode || mode == SImode)
      {
        t = build_atomic_variant (AT_SI_type_node);
        return t;
      }
    return NULL;
  }

<...>

   targetm.atomic_type_for_mode = atomic_type_override;

Exact details may change by the time 4.9 arrives, but that is the basic gist.