This is the mail archive of the mailing list for the GCC project.


Re: random commentary on -fsplit-stack (and a bug report)

> > "Jay Freeman (saurik)" <>
> "Ian Lance Taylor" <>

> Thanks for the bug report and the analysis.  I think it does simply
> require an '&'.  That makes it analogous to the way
> __morestack_release_segments is used in generic-morestack-thread.c. 

The only reason I hesitated on that is that it might not make sense to update the pointer in the context. In my specific case, that will actually cause it to crash ;P, as while the current stack I'm calling __splitstack_releasecontext from is valid, the context pointer I'm passing is actually stored on the old stack, and will be deallocated by __morestack_release_segments.

I can always just change my code to copy the context to the other stack before calling __splitstack_releasecontext, however, so that isn't a problem for me. Though, I also wasn't certain what the releasecontext function actually wanted to do with that pointer, as I hadn't yet read much of the morestack code; I now see that it is just the head of a linked list, so yeah: passing the address out of the context seems fine.

> As you know, I wanted to allow for future expansion.  I agree that it
> would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
> off space for time, since it would mean that setcontext would have to
> walk up the list.  I think CURRENT_STACK is required for
> __splitstack_find_context.  And __splitstack_find_context is required
> for Go's garbage collector.  At least, it's not obvious to me how to
> avoid requiring CURRENT_STACK for that case.

The basis of that suggestion was not just that the items in the context could be removed, but that the underlying state used by split stacks might not need the values at all. In this case, I am not certain why __morestack_segments is needed: it seems to only come into play when __morestack_current_segment is NULL (and I'm not certain how that would happen) and while deallocating dynamic blocks (which is already linear).
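To make the space-for-time trade-off concrete, here is a minimal sketch, assuming the segments form a doubly linked list. The struct below is a simplified, hypothetical stand-in (it is not the layout libgcc uses), and find_head and count_segments are names made up for illustration. It only shows that the head can be recovered from the current segment at linear cost.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical segment record: only the list links and a size. */
struct segment {
    struct segment *prev;  /* older segment, NULL for the first one */
    struct segment *next;  /* newer segment, NULL for the last one */
    size_t size;
};

/* Recover the list head by walking back from the current segment.
 * This is the linear walk that storing the head separately avoids. */
struct segment *find_head(struct segment *current) {
    if (current == NULL)
        return NULL;
    while (current->prev != NULL)
        current = current->prev;
    return current;
}

/* Count the segments reachable from the head, checking link consistency. */
size_t count_segments(const struct segment *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        assert(head->next == NULL || head->next->prev == head);
        n++;
    }
    return n;
}
```

With three linked segments a, b, c (oldest first), find_head(&c) returns &a after two steps, which is the cost setcontext would pay if the head were not stored.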

I might provide a patch to better describe what I mean by this. I've started the process of getting a copyright assignment in place (sent an e-mail to per

> I agree.  Want to write a patch?  Or at least file a bug report.


> [paragraph moved below]

> > 7) Using the linker to handle the transition between split-stack and
> > non-split-stack code seems like a good way to solve the problem of "we
> > need large stacks when hitting external code", but in staring at the
> > resulting code I have in my project I'm seeing that it isn't reliable:
> > if you have a pointer to a function the linker will not know what you
> > are calling. In my case, this is coming up often due to using
> > std::function.
> Yes, good point.  I think I had some plan for handling that but I no
> longer recall what it was.

After getting more sleep, I realize that this problem is actually much more endemic than I had even previously thought. Most any vaguely object-oriented library is going to have tons of function pointers in it, and you often interact solely with those function pointers (as in, you have no actual symbol references anywhere). A simple example: in the case of C++, any call to a non-split-stack virtual function will fail.

"""Function pointers are a tricky case. In general we don't know whether a function pointer points to split-stack code. Therefore, all calls through a function pointer will be modified to call (or jump to) a special function __fnptr_morestack. This will use a target specific function calling sequence, and will be implemented as though it were itself a function call instruction. That is, all the parameters will be set up, and then the code will jump to __fnptr_morestack. The __fnptr_morestack function takes two parameters: the function pointer to call, and the number of bytes of arguments pushed on the stack. (This is not yet implemented.)"""

That paragraph is from your design document (SplitStacks on the GCC wiki). I presume that this solution would only work if __fnptr_morestack always assumed that the target did not support split-stack? Alternatively, I can see having that stub look at the function to see if its first instruction was a comparison to the TCB stack limit entry (using similar logic to that used by the linker)? [also, see below in this e-mail]
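As a rough sketch of the "look at the first instruction" idea: the check below compares the start of a function against the byte sequence for "cmp %fs:0x70,%rsp", taken from the x86-64 disassembly later in this message. The function name looks_split_stack is hypothetical, the pattern is an assumption that only holds for that one target and prologue shape, and a real stub would need to cope with other encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Encoding of "cmp %fs:0x70,%rsp" as it appears in the x86-64 listing. */
static const unsigned char split_prologue[] = {
    0x64, 0x48, 0x3b, 0x24, 0x25, 0x70, 0x00, 0x00, 0x00
};

/* Returns true when the first bytes of a function match the split-stack
 * stack-limit comparison. `available` is how many bytes may be read. */
bool looks_split_stack(const unsigned char *code, size_t available) {
    assert(code != NULL);
    if (available < sizeof split_prologue)
        return false;
    return memcmp(code, split_prologue, sizeof split_prologue) == 0;
}
```

A stub built on this would treat a false result as "assume non-split-stack" and arrange a large stack before jumping to the target.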

> > More awkwardly, split-stack functions that mention (but do not call)
> > non-split-stack functions (such as to return their address) are being
> > mis-flagged by the linker. Honestly, I question whether the linker
> > fundamentally has enough information about what is going on to be able
> > to make sufficiently accurate decisions with regards to stack
> > constraints to warrant the painful abstraction breakage that
> > split-stack uses. :(
> You're right that the linker doesn't really have enough information.
> But is a split-stack function that returns the address of a
> non-split-stack function really so frequent that it's worth worrying
> about?

I guess the question I have is: is one of the goals to make this option "safe to turn on for a random project"? Given the abstraction break that was made between the compiler and the linker, it would seem like this was a rather critically important goal (as now both the linker and the compiler are less modular and more difficult to modify), but in fact the result doesn't manage to solve seemingly simple corner cases.

The reason I'm running into these issues is not due to virtual dispatch (at least yet: this codebase was C 5 years ago, but is now being ported to C++), but instead due to higher-order functions. I'm finding myself in situations where std::function and std::bind are disconnecting the symbol references from the call sites sufficiently (even moving them to different stacks ;P) to cause the linker to make seemingly random decisions.

That said, I can demonstrate a really common idiom, from C (not C++), that is almost always going to involve non-split-stack code (as malloc and free are normally going to be in libc, compiled without -fsplit-stack), and that is morally equivalent to "returning a function pointer and using it later": data structures that keep information on a block of dynamically allocated memory and "how to free it". Here's a lame version:

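#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct String {
    const char *data;
    void (*free)(void *);
} String;

void ClearString(String *string) {
    if (string->data != NULL && string->free != NULL)
        string->free((void *) string->data); /* indirect call: no symbol reference at the call site */

    string->data = NULL;
    string->free = NULL;
}

void SetString(String *string, const char *data, bool alloc) {
    ClearString(string); /* release whatever was stored before */

    string->data = data;
    string->free = alloc ? &free : NULL;
}

void f(String *string) {
    SetString(string, strdup("hello"), true); /* heap copy, so free() is valid */
    ClearString(string); /* potential stack overflow */
}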

(Incidentally, if you use std::vector with a custom allocator that has any kind of indirection in it, this is going to come up quite a lot. The code for vector instantiated over your allocator will be compiled as part of your code with -fsplit-stack, but if the memory allocator being used is something compiled without then you are going to end up with a really complex version of the above code and a stack overflow.)

[paragraph from above]
> It would certainly be possible for the compiler to arrange to allocate a
> large stack as it called the non-split-stack function.  Unfortunately, I
> don't see how the linker could do it.  And it's the linker, not the
> compiler, that knows that it is a call to a non-split-stack function.

However, the linker doesn't actually have any notion of "calls", which is what causes the previous problems. In a language like C++ (or even C) it isn't really true that a function that calls another function will go through a symbol reference to do it. Anyone who uses code that involves dlsym, higher-order functions, or polymorphic object-oriented libraries will run into cases that the current -fsplit-stack implementation doesn't even provide good (certainly not documented) workarounds for.

Part of me (and I realize that this causes other tradeoffs, and I'm therefore not even recommending it: more just musing) feels like the notion of "supports split stack" is more of a calling convention. In the same way that gcc currently supports regparm, stdcall, thiscall, fastcall... it seems like it might simply be a new attribute (probably orthogonal to the calling convention) a function can have (and would not have by default): splitcall.

In such an implementation, like many of the existing calling-convention related attributes, splitcall would be considered part of the type signature (and thereby would not be allowed to be put on a definition and not on the related prototype), and could be opted in for a large block of code using a #pragma or a compiler-switch. (Again: this is just musing. I haven't put much thought into whether this would actually be semantically reasonable yet.)
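There is no splitcall attribute today, so the following is only an emulation in plain C of what "part of the type signature" would buy. The two wrapper types, the call helpers, and the counter are all hypothetical names invented for this sketch; the counter merely stands in for "switch to a large stack".

```c
#include <assert.h>
#include <stddef.h>

/* Two distinct wrapper types around the same function-pointer shape.
 * Mixing them up is a compile-time type error, which is the point. */
typedef struct { int (*fn)(int); } split_callee;    /* checks its own stack */
typedef struct { int (*fn)(int); } nosplit_callee;  /* needs a large stack */

unsigned large_stack_switches = 0;  /* counts simulated stack switches */

/* Split-stack targets can be called directly. */
int call_split(split_callee callee, int arg) {
    assert(callee.fn != NULL);
    return callee.fn(arg);
}

/* Non-split-stack targets get a (simulated) large stack first. */
int call_nosplit(nosplit_callee callee, int arg) {
    assert(callee.fn != NULL);
    large_stack_switches++;  /* placeholder for the real stack switch */
    return callee.fn(arg);
}

/* Sample target used to exercise both paths. */
int twice(int x) { return 2 * x; }
```

Passing a nosplit_callee to call_split does not compile, which is roughly the error one would hope to get from a prototype/definition mismatch on a real attribute.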

[see above in this e-mail] For cases where the compiler "simply doesn't know", the solution that was brought up for function pointers could be used: have a level of indirection in the calls that includes the number of arguments. That code could then read the target of the call to see whether the function at the other side looked like it supported split-stack, and if not it could allocate more stack at the time of that call.
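To illustrate "allocate more stack at the time of that call", here is a sketch that runs a callee on a freshly allocated stack using the POSIX ucontext functions. This is a stand-in technique chosen because it is expressible in portable-ish C on Linux; it is not how __morestack works. call_on_large_stack, the trampoline, and uses_big_frame are hypothetical names, and the globals make it non-reentrant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <ucontext.h>

typedef int (*int_fn)(int);

/* State handed to the trampoline; a real stub would not use globals. */
static int_fn pending_fn;
static int pending_arg;
static int pending_result;

static void trampoline(void) {
    pending_result = pending_fn(pending_arg);
}

/* Run fn(arg) on a new stack of stack_bytes bytes.
 * Returns 0 on success and stores the callee's result in *result. */
int call_on_large_stack(int_fn fn, int arg, size_t stack_bytes, int *result) {
    ucontext_t caller, callee;
    void *stack;

    assert(fn != NULL && result != NULL);
    stack = malloc(stack_bytes);
    if (stack == NULL)
        return -1;
    if (getcontext(&callee) != 0) {
        free(stack);
        return -1;
    }
    callee.uc_stack.ss_sp = stack;
    callee.uc_stack.ss_size = stack_bytes;
    callee.uc_link = &caller;  /* resume the caller when trampoline returns */
    pending_fn = fn;
    pending_arg = arg;
    makecontext(&callee, trampoline, 0);
    if (swapcontext(&caller, &callee) != 0) {
        free(stack);
        return -1;
    }
    *result = pending_result;
    free(stack);
    return 0;
}

/* Sample callee with a frame far larger than a small segment could hold. */
int uses_big_frame(int seed) {
    volatile char buffer[256 * 1024];
    buffer[0] = (char) seed;
    buffer[sizeof buffer - 1] = 1;
    return buffer[0] + buffer[sizeof buffer - 1];
}
```

An indirection stub would call something like this only when the target did not look like split-stack code, and call the target directly otherwise.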

The developer would now be put in the position of thinking about what they are calling sometimes (and making certain that their usage of the pragma and header files lined up), but honestly I already am having to think about that (due to the linker having both false positives and false negatives for all of the above reasons, whether from the inliner conflating calls or from function pointers obfuscating them), and I have no explicit mechanism to override it.

In fact, I almost want to say that the worst-case scenario in the "rely on the compiler" approach is the developer throwing up their hands in defeat and attempting to recompile "the world" (including libgcc, libsupc++, etc.) with -fsplit-stack... but that's where I already am with the current linker-based implementation: the main/only way I'm going to be able to avoid having function pointers to non-split-stack code is to recompile every library I need with -fsplit-stack.

> > A specific idea that might help, however, is to set things up so that
> > the PLT actually handles the stack increases when you are linking to
> > functions that are in a dynamic library. That way, calls to open (for
> > example) would not cause the function that called it to suddenly
> > require a large stack, but instead only as control is transferred to
> > open would the stack size increase. (This might be quite complex,
> > though.)
> Yes, again you have to know how many bytes of arguments were pushed on
> the stack.  You can pretty much know this for open, of course, but it's
> a lot more complex for printf (if printf were compiled in split-stack
> mode it would be straightforward, but of course in this example it is
> not).
> I agree that this could be a lot nicer.  It's a bit less important for
> Go because obviously the Go compiler is completely in control of all
> functions called by Go code.

In this model (still using the linker, but pushing the stack-split into or around the @plt stub function), I would have to propose that variadic functions are treated specially (possibly using a similar/identical setup to the one you were proposing for function pointers) where the argument count was also passed. This could be pushed onto the stack right before the call and popped/thrown off the stack first thing in the stub when not needed (which has the benefit of being portable between targets and not messing with the existing argument placement).
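A toy model of the "push the count, pop it first thing in the stub" idea, using an array of words as a simulated stack. The layout and the name relocate_arguments are assumptions made up for this sketch; real argument passing is target specific and partly in registers, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simulated stack at the moment of the call (lowest address first):
 *   frame[0]          argument word count, pushed last by the caller
 *   frame[1..count]   the argument words themselves
 *
 * The stub pops the count, then copies exactly that many words to the
 * new stack. Returns the number of words copied, or 0 if nothing was
 * copied (no arguments, or they do not fit in `capacity` words).
 */
size_t relocate_arguments(const long *frame, long *new_stack, size_t capacity) {
    size_t count;

    assert(frame != NULL && new_stack != NULL);
    count = (size_t) frame[0];  /* pop the count first */
    if (count > capacity)
        return 0;
    memcpy(new_stack, frame + 1, count * sizeof *new_stack);
    return count;
}
```

Because the count sits in front of the ordinary arguments, the existing argument placement is untouched, and a stub that does not need the count can simply discard it.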

> > That said, I don't have a better solution to suggest right now (I
> > really want to say that having attributes available to declare
> > split-stack functionality in the code would be better, but that has
> > other ramifications), but I do have concerns that due to attempts to
> > keep the ABI fixed decisions made now (when there seem to only be a
> > single major user, Go) will lock in how the mechanism is capable of
> > functioning in the future.
> I may misunderstand your suggestion, but I think that keeping the ABI
> fixed is a requirement.  Any ABI change would require rebuilding all
> libraries and changing the debugger.  The result would not be usable
> for most people.

What I meant by these "concerns" is that it seems like the current mechanism for -fsplit-stack is going to get locked into place (and be unable to change due to ABI breakage) in a state where even with the abstraction leak between the compiler and the linker (which I feel is quite costly) it doesn't really solve the problem for many users (possibly for any user other than Go, which might have enough constraints to make this work).

However, this might not actually be that big of a concern, thinking about it more. As the existing implementation is, as we both believed, unlikely to be incorporated into the default library build, it is really then just a matter that -fsplit-stack has to exist with this implementation and Gold needs to continue supporting it; gcc already has tons of ld-specific flags: this could just be another one. A later/different implementation of split-stack could be -fsplit-stack-ex or something, and exist independently and in parallel.

[vaguely in reply to everything above]

Actually, thinking about it more: it seems like 99% of these problems could be solved by providing a second symbol definition for the split-stack prologue and binding that as part of the type signature. So, you could either call the "original implementation" of a function using its normal symbol, or you could call the split-stack prologue version of the same function using one that had been mangled with some prefix.

extern "C" int test() {
    return 0xdeadbeef;
}

0000000000404920 <test>:
  404920:       64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  404927:       00 00 
  404929:       72 06                   jb     404931 <test+0x11>
  40492b:       b8 ef be ad de          mov    $0xdeadbeef,%eax
  404930:       c3                      retq   
  404931:       45 31 d2                xor    %r10d,%r10d
  404934:       45 31 db                xor    %r11d,%r11d
  404937:       e8 6d 6b 00 00          callq  40b4a9 <__morestack>
  40493c:       c3                      retq   
  40493d:       eb ec                   jmp    40492b <test+0xb>
  40493f:       90                      nop

In this case (and yes: this is an example of a function that shouldn't need this prologue at all, but it was short ;P), the existing implementation of -fsplit-stack has modified the function to fundamentally check its stack. No matter how you attempt to call it, we now have to know whether the function supports the split-stack protocol using an out-of-line mechanism, and we cannot enforce our beliefs in the compiler: the linker is in complete control of this decision. However, we could instead have it do this:

0000000000404920 <.split.test>:
  404920:       64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  404927:       00 00 
  404929:       72 06                   jb     404931 <test+0x6>
000000000040492b <test>:
  40492b:       b8 ef be ad de          mov    $0xdeadbeef,%eax
  404930:       c3                      retq   
  404931:       45 31 d2                xor    %r10d,%r10d
  404934:       45 31 db                xor    %r11d,%r11d
  404937:       e8 6d 6b 00 00          callq  40b4a9 <__morestack>
  40493c:       c3                      retq   
  40493d:       eb ec                   jmp    40492b <test>
  40493f:       90                      nop

Now the decision to call either test or .split.test becomes explicit. This would allow us to get a linker error if we made an incorrect decision in my earlier not-really-a-suggestion-more-of-a-musing of making this knowledge explicit in the compiler akin to a calling convention. If the compiler decided that something wasn't split-stack, then it would just handle allocating the larger stack before the call to the underlying function; or, if it decided the function was split-stack, the linker would enforce it, and the user would get a reasonable error.
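The two-entry-point layout can be mimicked in plain C as two functions, where the split entry does the check and then runs the same body. Everything here is a simulation under stated assumptions: test_split stands in for ".split.test", test_plain for "test", and stack_remaining, FRAME_BYTES, and grow_stack are hypothetical substitutes for the %fs:0x70 comparison and __morestack.

```c
#include <assert.h>
#include <stddef.h>

enum { FRAME_BYTES = 128 };      /* hypothetical frame size of the body */

size_t stack_remaining = 4096;   /* simulated bytes left in this segment */
unsigned grow_calls = 0;         /* how often the simulated __morestack ran */

/* Stand-in for __morestack: pretend to add a new segment. */
void grow_stack(size_t needed) {
    assert(needed > 0);
    stack_remaining += needed;
    grow_calls++;
}

/* Plain entry point ("test"): no stack check, for ordinary callers. */
unsigned test_plain(void) {
    return 0xdeadbeefu;
}

/* Split-stack entry point (".split.test"): check, then run the same body. */
unsigned test_split(void) {
    if (stack_remaining < (size_t) FRAME_BYTES)
        grow_stack(FRAME_BYTES);
    return test_plain();
}
```

A caller that binds to test_plain never pays for the check, while a split-stack caller binds to test_split; referencing a split entry that was never emitted would then surface as an undefined symbol at link time.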

This also has the amazing benefit that it no longer would exact a runtime performance cost for libraries such as libc, libsupc++, and libgcc to be compiled with -fsplit-stack. The existing people who were using the library would continue to call the original version of the code, and only people who were trying to opt-in to the split-stack universe would be using the new split-stack variations. You could imagine an entire distribution of Linux (or whatever) that was compiled with this feature active, and the result would only be a slight increase in code-size.

So: thoughts? I really do think that there is a way to do this -fsplit-stack feature that allows more people to use it and for it to "change everything" / "take over the world" ;P. In my eyes, doing that would require a) that there be no impediment to just compiling everything with -fsplit-stack and b) that the functionality work with a stock linker (as many platforms are not supported by Gold). I think that with some variation on some of my above implementation ideas, this feature could be done generically in the compiler (and libgcc) for all platforms.

(Obviously, though: this is something I've only been looking at for days, and only seriously thinking about the implementation concerns of for hours, so I could easily be overlooking something obvious that you thought about two years ago that makes any or all of these ideas untenable. I certainly will not be bothered to learn that this is all stupid, and in fact will highly appreciate the feedback. Again: thank you so much for even reading any of these thoughts in the first place. ;P)

Jay Freeman (saurik)
