This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work


Sorry for the (very) delayed response. I'm still looking for feedback here so I can fix the docs.

To refresh: The topic of conversation was the (extremely) wrong explanation that has been in the docs since forever about how to use memory constraints with inline asm to avoid the performance hit of a full memory clobber. Trying to understand how this really works has led to some surprising results.

Me:
>> While I really like the idea of using memory constraints to avoid all out >> memory clobbers, 16 bytes is a pretty small maximum memory block, and x86 >> only supports a max of 8. Unless there's some way to use larger sizes (say
>> SSIZE_MAX), this feature hardly seems worth documenting.

Richard:
> I wonder how you figured out that a 12 byte clobber performs a full
> memory clobber?

Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2, using -m64 -O2 -fdump-final-insns):

--------------------
#include <stdio.h>

#define MYSIZE 3

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
   struct _reallybigstruct { char x[MYSIZE]; }
      *p = (struct _reallybigstruct *)Dest;

   __asm__ __volatile__ ("rep stos{b|b}"
      : "+D" (Dest), "+c" (Count), "=m" (*p)
      : [Data] "a" (Data)
      //: "memory"
   );
}

int main()
{
   unsigned char buff[100];
   buff[5] = 'A';

   __stosb(buff, 'B', sizeof(buff));
   printf("%c\n", buff[5]);
}
--------------------

In summary:

   1) Create a 100 byte buffer, and set buff[5] to 'A'.
2) Call __stosb, which uses inline asm to overwrite all of buff with 'B'. 3) Use a memory constraint in __stosb to flush buff. The size of the memory constraint is controlled by a #define.

With this, I have a simple way to test various sizes of memory constraints to see if the buffer gets flushed. If it *is* flushing the buffer, printing buff[5] after __stosb will print 'B'. If it is *not* flushing, it will print 'A'.

Results:
- Since buff[5] is the 6th byte in the buffer, using memory constraint sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing that no flush was done. - Sizes of 8 and 16 print 'B', showing that the flush was done. This is also the expected result, since I am now flushing enough of buff to include buff[5]. - The surprise comes from using a size of 3 or 5. These also print 'B'. WTF? If 4 doesn't flush, why does 3?

I believe the answer comes from reading the RTL. The difference between sizes of 3 and 16 comes here:

  (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
(const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16 A128])
     (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [

   (set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
(const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3 A128])
         (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [

While I don't actually speak RTL, TI clearly refers to TIMode. Apparently when using a size that exactly matches a machine mode, asm memory references (on i386) can flush the exact number of bytes. But for other sizes, gcc seems to falls back to BLK mode, which doesn't.

I don't know the exact meaning of BLK on a "set" or "asm_operands." Does it cause a full clobber? Or just a complete clobber of buff? Attempting to answer that question leads us to the second bit of code:

--------------------
#include <stdio.h>

#define MYSIZE 8

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
   struct _reallybigstruct { char x[MYSIZE]; }
      *p = (struct _reallybigstruct *)Dest;

   __asm__ __volatile__ ("rep stos{b|b}"
      : "+D" (Dest), "+c" (Count), "=m" (*p)
      : [Data] "a" (Data)
      //: "memory"
   );
}
int main()
{
   unsigned char buff[100], buff2[100];
   buff[5] = 'A';
   buff2[5] = 'M';
   asm("#" : : "r" (buff2));

   __stosb(buff, 'B', sizeof(buff));
   printf("%c %c\n", buff[5], buff2[5]);
}
--------------------

Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which I also print. I still perform the memory constraint against buff, then I check to see if it is affecting buff2.

I start by compiling this with a size of 8 and look at the -S output. If this is NOT doing a full clobber, gcc should be able to just print buff2[5] by moving 77 into the appropriate register before calling printf. And indeed, that's what we see.

/APP
 # 17 "mem2.cpp" 1
        rep stosb
 # 0 "" 2
/NO_APP
        movzbl  37(%rsp), %edx
        movl    $77, %r8d
        leaq    .LC0(%rip), %rcx
        call    printf

If using a size of 3 *is* causing a full memory clobber, we would expect to see the value getting read from memory before calling printf. And indeed, that's also what we see.

/APP
 # 17 "mem2.cpp" 1
        rep stosb
 # 0 "" 2
/NO_APP
        movzbl  37(%rsp), %edx
        leaq    .LC0(%rip), %rcx
        movzbl  149(%rsp), %r8d

I don't know the internals of gcc well enough to understand exactly why this is happening. But from a user's point of view, it sure looks like a memory clobber.

As I said before, triggering a full memory clobber for anything over 16 bytes (and most sizes under 16 bytes) makes this feature all but useless. So if that's really what's happening, we need to decide what to do next:

1) Can this be "fixed?"
2) Do we want to doc the current behavior?
3) Or do we just remove this section?

I think it could be a nice performance win for inline asm if it could be made to work right, but I have no idea what might be involved in that. Failing that, I guess if it doesn't work and isn't going to work, I'd recommend removing the text for this feature.

Since all 3 suggestions require a doc change, I'll just say that I'm prepared to start work on the doc patch as soon as someone lets me know what the plan is.

Richard?  Hans-Peter?  Your thoughts?

Thanks,
dw


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]