This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: %fs and %gs segments on x86/x86-64


On Thu, Jul 2, 2015 at 5:57 PM, Armin Rigo <arigo@tunes.org> wrote:
> Hi all,
>
> I implemented support for %fs and %gs segment prefixes on the x86 and
> x86-64 platforms, in what turns out to be a small patch.
>
> For those not familiar with them: at least on x86-64, %fs and %gs are
> two special registers that a user program can ask to have added to
> the address used by any machine instruction.  This is done with a
> one-byte instruction prefix, "%fs:" or "%gs:".  The actual value
> stored in these two registers cannot be modified quickly (at least
> before the Haswell CPU), but the general idea is that they are rarely
> modified.
> Speed-wise, though, an instruction like "movq %gs:(%rdx), %rax" runs
> at the same speed as a "movq (%rdx), %rax" would.  (I failed to
> measure any difference, but I guess that the instruction is one more
> byte in length, which means that a large quantity of them would tax
> the instruction caches a bit more.)
>
> For reference, the pthread library on x86-64 uses %fs to point to
> thread-local variables.  gcc already has a number of special modes
> that produce instructions like "movq %fs:(16), %rax" to load
> thread-local variables (declared with __thread).  However, this
> support is special-case only.  The %gs register is free to use.  (On
> x86, %gs is used by pthread and %fs is free to use.)
>
>
> So what I did was add the __seg_fs and __seg_gs address spaces.  They
> are used like this, for example:
>
>     typedef __seg_gs struct myobject_s {
>         int a, b, c;
>     } myobject_t;
>
> You can then use variables of type "struct myobject_s *o1" as regular
> pointers, and "myobject_t *o2" as %gs-based pointers.  Accesses to
> "o2->a" are compiled to instructions that use the %gs prefix; accesses
> to "o1->a" are compiled as usual.  These two pointer types are
> incompatible.  The way you obtain %gs-based pointers, or control the
> value of %gs itself, is out of the scope of gcc; you do that by using
> the correct system calls and by manual arithmetic.  There is no
> automatic conversion; the C code can contain casts between the three
> address spaces (regular, %fs and %gs) which, like regular pointer
> casts, are no-ops.
>
>
> My motivation comes from the PyPy-STM project ("removing the Global
> Interpreter Lock" for this Python interpreter).  In this project, I
> want *almost all* pointer manipulations to resolve to different
> addresses depending on which thread runs the code.  The idea is to use
> mmap() tricks to ensure that the actual memory usage remains
> reasonable, by sharing most of the pages (but not all of them) between
> each thread's "segment".  So most accesses to a %gs-prefixed address
> actually access the same physical memory in all threads; but not all
> of them.  This gives me a dynamic way to have a large quantity of data
> which every thread can read, and by occasionally changing the mapping
> of a single page, I can make some changes thread-local, i.e.
> invisible to other threads.
>
> Of course, the same effect can be achieved in other ways, like
> declaring a regular "__thread intptr_t base;" and adding the "base"
> explicitly to every pointer access.  Clearly, this would have a large
> performance impact.  The %gs solution comes at almost no cost.  The
> patched gcc is able to compile the hundreds of MBs of (generated) C
> code with systematic %gs usage and seems to work well (with one
> exception, see below).
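
For comparison, a minimal sketch of the explicit-base alternative mentioned above, assuming GCC's __thread extension. The names resolve, set_thread_base, get_a and set_a are hypothetical; only the "__thread intptr_t base;" declaration comes from the message. Every access has to load the thread-local base and add it by hand, which is the extra work the %gs prefix avoids.

```c
#include <stdint.h>

struct myobject_s {
    int a, b, c;
};

/* Per-thread base address, as in the message. */
static __thread intptr_t base;

/* Hypothetical helper: set the base for the calling thread. */
void set_thread_base(intptr_t b)
{
    base = b;
}

/* Hypothetical helper: turn a segment-relative offset into a real
   pointer by adding the calling thread's base explicitly. */
static inline struct myobject_s *resolve(intptr_t rel)
{
    return (struct myobject_s *)(base + rel);
}

/* Each access pays for a thread-local load plus an addition. */
int get_a(intptr_t o)
{
    return resolve(o)->a;
}

void set_a(intptr_t o, int value)
{
    resolve(o)->a = value;
}
```

Here objects are passed around as offsets of type intptr_t rather than as typed pointers, so the type checking that the separate address spaces provide is lost as well.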
>
>
> Is there interest in that?  And if so, how to progress?

It's nice to have the ability to test address-space issues on a
commonly available target at least (not sure if adding runtime
testcases is easy though).

> * The patch included here is very minimal.  It is against the
> gcc_5_1_0_release branch but adapting it to "trunk" should be
> straightforward.
>
> * I'm unclear if target_default_pointer_address_modes_p() should
> return "true" or not in this situation: i386-c.c now defines more than
> the default address mode, but the new ones also use pointers of the
> same standard size.
>
> * One case in which this patched gcc miscompiles code is found in the
> attached bug1.c/bug1.s.  (This case almost never occurs in PyPy-STM,
> so I could work around it easily.)  I think that some early, pre-RTL
> optimization is to "blame" here, possibly getting confused because the
> nonstandard address spaces also use the same size for pointers.  Of
> course it is also possible that I messed up somewhere, or that the
> whole idea is doomed because many optimizations make a similar
> assumption.  Hopefully not: it is the only issue I encountered.

Hmm, without being able to dive into it with a debugger it's hard to tell ;)
You might want to open a bugreport in bugzilla for this at least.

> * The extra byte needed for the "%gs:" prefix is not explicitly
> accounted for.  Is it only by chance that I did not observe gcc
> underestimating the size of the code it writes, and then e.g. using
> jump instructions that would be rejected by the assembler?

Yes, I think you are just lucky here.

Richard.

> * For completeness: this is very similar to clang's
> __attribute__((address_space(256))) but a few details differ.  (Also,
> not to discredit other projects on their competitor's mailing list,
> but I had to fix three distinct bugs in llvm before I could use it.
> It contributes to me having more trust in gcc...)
>
>
> Links for more info about pypy-stm:
>
> * http://morepypy.blogspot.ch/2015/03/pypy-stm-251-released.html
> * https://bitbucket.org/pypy/stmgc/src/use-gcc/gcc-seg-gs/
> * https://bitbucket.org/pypy/stmgc/src/use-gcc/c8/stmgc.h
>
>
> Thanks for reading so far!
>
> Armin

