This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work



On 9/25/2014 12:36 AM, Yury Gribov wrote:
On 09/24/2014 12:31 PM, Richard Biener wrote:
On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <dw@limegreensocks.com> wrote:
Hans-Peter Nilsson: I should have listened to you back when you raised
concerns about this.  My apologies for ever doubting you.

In summary:

- The "trick" in the docs for using an arbitrarily sized struct to force
register flushes for inline asm does not work.
- Placing the inline asm in a separate routine can sometimes mask the
problem with the trick not working.
- The sample that has been in the docs forever performs an unhelpful,
unexpected, and probably unwanted stack allocation + memcpy.

Details:

Here is the text from the docs:

-----------
One trick to avoid [using the "memory" clobber] is available if the size of
the memory being accessed is known at compile time. For example, if
accessing ten bytes of a string, use a memory input like:

     "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )

Well - this can't work because you essentially are using a _value_
here (looking at the GIMPLE - I'm not sure if a statement expression
evaluates to an lvalue.

It should work if you simply do this without a stmt expression:

   "m" (*(struct { char x[10]; } *)ptr)

because that's clearly an lvalue (and the GIMPLE correctly says so):

   <bb 2>:
   c.a = 1;
   c.b = 2;
   __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
&c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
   printf ("%u %u\n", 1, 2);

note that we still constant propagated 1 and 2 for the reason that
the asm didn't get any VDEF.  That's because you do not have any
memory output!  So while it keeps 'c' live it doesn't consider it
modified by the asm.  You'd still need to clobber the memory,
but "m" clobbers are not supported, only "memory".

Thus fixed asm:


       __asm__ __volatile__ ("rep; stosb"
            : "=D" (Dest), "+c" (Count)
            : "0" (&c), "a" (0),
            "m" (*( struct foo { char x[8]; } *)&c)
            : "memory"
       );

where I'm not 100% sure if the "m" input is now pointless (that is,
if a "memory" clobber also constitutes a use of all memory).

Or maybe even
  __asm__ __volatile__ ("rep; stosb"
: "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; } *)&c)
       : "0" (&c), "a" (0)
  );
to avoid the big-hammer memory clobber?

-Y


Thank you both for the responses. At this point I've started composing some replacement text for the docs (below), cuz clearly what's there is both inadequate and wrong. All comments are welcome.

While the code in Richard's response will always produce the correct results, the intent here is to use memory constraints to *avoid* the performance penalties of the memory clobber. The existing docs say this should work, and I've seen a number of places using it (linux kernel, glibc, etc). If this does work, we should update the docs to say how it's done. If this doesn't work, we should say that too.

Looking at Yury's response, that code does work (in c, not c++). At least it does sometimes. The problem is that sometimes gcc can "lose track" of what a pointer points to. And when it does, gcc can get confused about what to flush. Here's a simple example that shows this (4.9.0, x64, -O2, 'c'):

#include <stdio.h>

inline void *memset( void * Dest, int c, size_t Count) {
   void *dummy;

   __asm__ (
      "rep stosb"
: "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; } *)Dest)
      : "0" (Dest), "a" (c)
      : "cc"//, "memory"
   );

   return Dest;
}

int main() {
   struct {
     int a;
   } c;

   void *v;
   asm volatile("#" : "=r" (v) : "0" (&c)  );

   c.a = 0x30303030;

   //memset(&c, 0, sizeof(c));
   memset(v, 0, sizeof(c));
   printf("%x\n", c.a);
}

This code will work if you pass &c to memset. But it will fail if you use v. Oh, the wonders of aliasing.

And this is why I'm having a problem doc'ing this. I love the potential benefits of using this. But if you are writing general-purpose routines, how can you hope to know whether the code calling you will pass you a pointer that will work with this trick? This kind of thing can introduce *horribly* hard to track down problems. Note that the memory clobber always works, but potentially with a performance penalty. <sigh>

I could simply skip describing the whole problem with aliasing, but that just hides the problem. I hate when docs do that.

That said, here's what I've written so far. Any improvements or corrections are very welcome.

==========================
Flushing registers to memory has performance implications and may be an issue
for time-sensitive code. One trick to avoid flushing more registers than is
absolutely required is to use a memory constraint.

For example this i386 code works for both c and c++:

@example
inline void *memset(void *Dest, int c, size_t Count)
@{
   struct _reallybigstruct @{ char x[SSIZE_MAX]; @};
   void *dummy;

   asm (
      "rep stosb"
      : "=D" (dummy), "+c" (Count), "=m" (*(struct _reallybigstruct *)Dest)
      : "0" (Dest), "a" (c)
      : "cc"
   );

   return Dest;
@}
@end example

Similar code can be used if the memory is being read (by changing the memory
constraint to be an input) or updated (by changing the constraint to use
'+' instead of '=').

A few noteworthy things about clobbering using memory constraints:

@itemize @minus
@item
Note that @code{_reallybigstruct} uses a size of @code{SSIZE_MAX}. This does not mean that when casting @code{Dest} to @code{(struct _reallybigstruct *)},
@code{Dest} must be @code{SSIZE_MAX} bytes long or that @code{SSIZE_MAX}
bytes will be used. Nor does it affect registers associated with items that
may follow @code{Dest} in memory.  This only affects the symbol @code{Dest}.

@item
It may be that using memory constraints can remove the need for the asm
@code{volatile} qualifier.  This can, under certain circumstances, result in
more efficient code. Consider the memset sample above.  Without the memory
constraint, the only outputs are @code{dummy} and @code{Count}. However,
both those symbols go out of scope immediately after the asm statement. As a
result, gcc optimization could reasonably assume that the asm block isn't
actually needed for anything.  Even adding the "memory" clobber doesn't
change this. Normally this is resolved by adding the asm @code{volatile}
qualifier. However, by adding the memory constraint, gcc also considers
whether updating the @code{Dest} block will affect the subsequent code.
Only in the event that the @code{Dest} block is not needed will gcc consider
removing this asm code.

@end itemize

@strong{Warning}: Older versions of this documentation show an example for using
memory constraints like this:

@example
"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )
@end example

This code not only doesn't compile in c++, it has potentially undesirable
side effects that affect performance, as well as producing incorrect results.

==========================

dw


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]