[Bug target/93177] PPC: Missing many useful platform intrinsics

memmerto at ca dot ibm.com gcc-bugzilla@gcc.gnu.org
Mon Jan 13 15:18:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93177

--- Comment #9 from Matt Emmerton <memmerto at ca dot ibm.com> ---
(In reply to Segher Boessenkool from comment #6)
> (In reply to Matt Emmerton from comment #4)
> > The intrinsics that we would find useful, having used them as provided by
> > the IBM XL C/C++ compiler, are the following:
> > 
> > __sync()
> > __isync()
> > __lwsync()
> 
> The sync intrinsics need to be tied to some other code.  A volatile asm with
> a "memory" clobber is not good enough, in many cases.

We use these in our internal mutex and atomic implementations, and the
resulting sequences are carefully scrutinized.
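
For illustration only, a minimal sketch of how such barriers can be approximated
in GCC today. The names xl_sync, xl_lwsync and xl_isync are hypothetical, not an
existing GCC API, and the non-PowerPC fallback to __atomic_thread_fence is an
assumption added so the sketch builds elsewhere. As comment #6 notes, a volatile
asm with a "memory" clobber is not sufficient in many cases, so the generated
code still needs to be inspected.

```c
#include <stdint.h>

/* Hypothetical wrappers; assumption: GCC-style inline asm is available. */
#if defined(__powerpc__) || defined(__powerpc64__)
#define xl_sync()   __asm__ __volatile__("sync"   ::: "memory")
#define xl_lwsync() __asm__ __volatile__("lwsync" ::: "memory")
#define xl_isync()  __asm__ __volatile__("isync"  ::: "memory")
#else
/* Fallback for other targets, so the sketch is portable. */
#define xl_sync()   __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define xl_lwsync() __atomic_thread_fence(__ATOMIC_ACQ_REL)
#define xl_isync()  __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

static uint32_t payload;   /* data being published */
static uint32_t ready;     /* flag: nonzero once payload is valid */

/* Producer: write the data, order it before the flag store, set the flag. */
static void publish(uint32_t value)
{
    payload = value;
    xl_lwsync();
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Consumer: wait for the flag, order the flag load before the data load. */
static uint32_t consume(void)
{
    while (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        ;
    xl_lwsync();
    return payload;
}
```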

> > __lwarx()
> > __ldarx()
> > __stwcx()
> > __stdcx()
> 
> The compiler can always insert memory accesses in between those two, if you
> have them as separate intrinsics (and it will, simply stack accesses for
> temporaries will do, already).  If those accesses hit the same reservation
> granule as the larx/stcx. uses, you lose.
> 
> You need to write the whole sequence in one piece of assembler code.

I would argue that the compiler should be smart enough to realize that these
are part of a decomposed atomic operation, and avoid arbitrary instruction
injection.

As per my previous update, we use these primitives to implement things that the
builtin __atomic_* functions do not implement.
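
As a sketch of the single-asm approach suggested in comment #6, here is an
unsigned "fetch max" (an operation the __atomic_* builtins do not provide)
with the whole lwarx/stwcx. loop in one asm statement. The function name is
hypothetical, the operation is relaxed (no barriers), and the
compare-exchange fallback for non-PowerPC targets is an assumption added for
portability. The PowerPC path has not been verified on hardware here.

```c
#include <stdint.h>

/* Hypothetical helper: atomically store max(*p, v), return the old value. */
static inline uint32_t atomic_fetch_max_u32(uint32_t *p, uint32_t v)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    uint32_t old;
    __asm__ __volatile__(
        "1: lwarx   %0,0,%2\n"   /* load word and reserve            */
        "   cmplw   %0,%3\n"     /* unsigned compare old with v      */
        "   bge-    2f\n"        /* old >= v: nothing to store       */
        "   stwcx.  %3,0,%2\n"   /* store v if reservation held      */
        "   bne-    1b\n"        /* CR0.EQ clear: lost it, retry     */
        "2:"
        : "=&r"(old), "+m"(*p)
        : "r"(p), "r"(v)
        : "cr0");
    return old;
#else
    /* Portable fallback: compare-exchange loop with the same semantics. */
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (old < v &&
           !__atomic_compare_exchange_n(p, &old, v, 1,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
        ;
    return old;
#endif
}
```

Note that the loop branches on CR0 directly after stwcx., with no copy of the
condition register into a GPR.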

> > __protected_stream_set()
> > __protected_stream_count()
> > __protected_stream_count_depth() // currently not implemented in gcc
> > __protected_stream_go()
> 
> Those are pretty specific to CBE I think?

No.  They are implemented on POWER5 and above (ISA 2.02), and are useful in
managing cache prefetch behaviour.
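
For comparison, the closest facility GCC offers today is the generic
__builtin_prefetch, which issues a per-address hint rather than describing a
stream. The sketch below is not equivalent to the protected stream intrinsics;
the function name and the prefetch distance are hypothetical values chosen for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements; would need tuning. */
enum { PREFETCH_AHEAD = 16 };

/* Sum an array while hinting upcoming elements into cache. */
static uint64_t sum_with_prefetch(const uint32_t *a, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            /* rw = 0 (read), locality = 0 (no temporal reuse expected) */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
        sum += a[i];
    }
    return sum;
}
```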

> > The implementations of stwcx() and stdcx() need revision on PPC.
> > As I understand it, there is no need for the mfocrf instruction nor the
> > mask-and-shift on the result.
> 
> How else would you output the CR0.EQ bit?

There is no need to copy CR0 to a GPR - branch instructions such as BNE can
operate on CR0 directly.

