This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: %fs and %gs segments on x86/x86-64
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Armin Rigo <arigo at tunes dot org>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Fri, 3 Jul 2015 10:29:14 +0200
- Subject: Re: %fs and %gs segments on x86/x86-64
- References: <CAMSv6X19bZpRXf1XexrW0uFONptwHOrcT9T6RegvNDR1_Z5m3A at mail dot gmail dot com>
On Thu, Jul 2, 2015 at 5:57 PM, Armin Rigo <arigo@tunes.org> wrote:
> Hi all,
>
> I implemented support for %fs and %gs segment prefixes on the x86 and
> x86-64 platforms, in what turns out to be a small patch.
>
> For those not familiar with it, at least on x86-64, %fs and %gs are
> two special segment registers whose base address a user program can
> ask to have added to the address used by any memory-accessing machine
> instruction. This is done with a one-byte instruction prefix, "%fs:"
> or "%gs:". The actual value stored in these two registers cannot
> quickly be modified (at least before the Haswell CPU), but the general
> idea is that they are rarely modified.
> Speed-wise, though, an instruction like "movq %gs:(%rdx), %rax" runs
> at the same speed as a "movq (%rdx), %rax" would. (I failed to
> measure any difference, but I guess that the instruction is one more
> byte in length, which means that a large quantity of them would tax
> the instruction caches a bit more.)
>
> For reference, the pthread library on x86-64 uses %fs to point to
> thread-local variables. gcc already has a number of special modes
> that produce instructions like "movq %fs:(16), %rax" to load
> thread-local variables (declared with __thread). However, this
> support is special-case only. The %gs register is free to use. (On
> x86, %gs is used by pthread and %fs is free to use.)
>
>
> So what I did is to add the __seg_fs and __seg_gs address spaces. They
> are used like this, for example:
>
> typedef __seg_gs struct myobject_s {
> int a, b, c;
> } myobject_t;
>
> You can then use variables of type "struct myobject_s *o1" as regular
> pointers, and "myobject_t *o2" as %gs-based pointers. Accesses to
> "o2->a" are compiled to instructions that use the %gs prefix; accesses
> to "o1->a" are compiled as usual. These two pointer types are
> incompatible. The way you obtain %gs-based pointers, or control the
> value of %gs itself, is out of the scope of gcc; you do that by using
> the correct system calls and by manual arithmetic. There is no
> automatic conversion; the C code can contain casts between the three
> address spaces (regular, %fs and %gs) which, like regular pointer
> casts, are no-ops.
>
>
> My motivation comes from the PyPy-STM project ("removing the Global
> Interpreter Lock" for this Python interpreter). In this project, I
> want *almost all* pointer manipulations to resolve to different
> addresses depending on which thread runs the code. The idea is to use
> mmap() tricks to ensure that the actual memory usage remains
> reasonable, by sharing most of the pages (but not all of them) between
> each thread's "segment". So most accesses to a %gs-prefixed address
> actually access the same physical memory in all threads; but not all
> of them. This gives me a dynamic way to have a large quantity of data
> which every thread can read, and by occasionally changing the mapping
> of a single page, I can make some changes thread-local, i.e.
> invisible to other threads.
>
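
The page-sharing idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily how PyPy-STM implements it: two "segments" are views of the same temporary file mapped MAP_SHARED, and one page of the second view is then remapped MAP_PRIVATE so that writes to it stay invisible to the first. The function name and the four-page size are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if sharing and privatization behave as described, -1 on error. */
int private_page_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = 4 * (size_t)page;

    /* One backing file, zero-filled, shared by both segments. */
    FILE *backing = tmpfile();
    if (backing == NULL || ftruncate(fileno(backing), (off_t)len) != 0)
        return -1;
    int fd = fileno(backing);

    /* Two views of the same pages: "segment 0" and "segment 1". */
    char *seg0 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *seg1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (seg0 == MAP_FAILED || seg1 == MAP_FAILED)
        return -1;

    /* A write through one view is visible through the other. */
    seg0[0] = 'A';
    int shared_ok = (seg1[0] == 'A');

    /* Remap page 1 of segment 1 as a private, copy-on-write page. */
    if (mmap(seg1 + page, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED, fd, (off_t)page) == MAP_FAILED)
        return -1;

    /* This write now stays local to segment 1. */
    seg1[page] = 'B';
    int private_ok = (seg0[page] == 0 && seg1[page] == 'B');

    munmap(seg0, len);
    munmap(seg1, len);
    fclose(backing);
    return shared_ok && private_ok;
}
```

In the scheme described above, each thread's %gs base would point at its own view, so the same %gs-relative offset reaches shared memory for most pages and thread-local memory for the privatized ones.
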
> Of course, the same effect can be achieved in other ways, like
> declaring a regular "__thread intptr_t base;" and adding the "base"
> explicitly to every pointer access. Clearly, this would have a large
> performance impact. The %gs solution comes at almost no cost. The
> patched gcc is able to compile the hundreds of MBs of (generated) C
> code with systematic %gs usage and seems to work well (with one
> exception, see below).
>
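
For comparison, the explicit-base alternative mentioned above might look like the following sketch. The resolve() helper, the two static segments and the demo function are hypothetical names introduced for illustration; the point is only that every access goes through an explicit load of "base" and an addition.

```c
#include <assert.h>
#include <stdint.h>

struct myobject_s { int a, b, c; };

/* Per-thread base address, as in the "__thread intptr_t base;" alternative. */
static __thread intptr_t base;

/* Every access pays for an explicit load of "base" plus an addition. */
static inline struct myobject_s *resolve(intptr_t offset)
{
    return (struct myobject_s *)(base + offset);
}

/* Two hypothetical per-thread segments with the same layout. */
static struct myobject_s segment0[4], segment1[4];

/* Returns 1 if the same offset resolves to a different object per base. */
int explicit_base_demo(void)
{
    intptr_t offset = 2 * (intptr_t)sizeof(struct myobject_s);
    assert(offset < (intptr_t)sizeof segment0);

    segment0[2].a = 10;
    segment1[2].a = 20;

    base = (intptr_t)segment0;      /* pretend we are thread 0 */
    int seen0 = resolve(offset)->a;

    base = (intptr_t)segment1;      /* pretend we are thread 1 */
    int seen1 = resolve(offset)->a;

    return seen0 == 10 && seen1 == 20;
}
```

With the %gs approach, the addition is folded into the addressing of each instruction instead of being spelled out in the generated code.
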
>
> Is there interest in that? And if so, how should I proceed?

It's nice to have the ability to test address-space issues on a
commonly available target at least (not sure if adding runtime
testcases is easy though).

> * The patch included here is very minimal. It is against the
> gcc_5_1_0_release branch but adapting it to "trunk" should be
> straightforward.
>
> * I'm unclear if target_default_pointer_address_modes_p() should
> return "true" or not in this situation: i386-c.c now defines more than
> the default address mode, but the new ones also use pointers of the
> same standard size.
>
> * One case in which this patched gcc miscompiles code is found in the
> attached bug1.c/bug1.s. (This case almost never occurs in PyPy-STM,
> so I could work around it easily.) I think that some early, pre-RTL
> optimization is to "blame" here, possibly getting confused because the
> nonstandard address spaces also use the same size for pointers. Of
> course it is also possible that I messed up somewhere, or that the
> whole idea is doomed because many optimizations make a similar
> assumption. Hopefully not: it is the only issue I encountered.

Hmm, without being able to dive into it with a debugger it's hard to tell ;)
You might want to open a bug report in bugzilla for this at least.

> * The extra byte needed for the "%gs:" prefix is not explicitly
> accounted for. Is it only by chance that I did not observe gcc
> underestimating the size of the code it emits, and then e.g. using
> jump instructions that would be rejected by the assembler?

Yes, I think you are just lucky here.

Richard.

> * For completeness: this is very similar to clang's
> __attribute__((address_space(256))) but a few details differ. (Also,
> not to discredit other projects on their competitor's mailing list,
> but I had to fix three distinct bugs in llvm before I could use it.
> That contributes to my having more trust in gcc...)
>
>
> Links for more info about pypy-stm:
>
> * http://morepypy.blogspot.ch/2015/03/pypy-stm-251-released.html
> * https://bitbucket.org/pypy/stmgc/src/use-gcc/gcc-seg-gs/
> * https://bitbucket.org/pypy/stmgc/src/use-gcc/c8/stmgc.h
>
>
> Thanks for reading so far!
>
> Armin