This is the mail archive of the
mailing list for the GCC project.
Re: random commentary on -fsplit-stack (and a bug report)
- From: Ian Lance Taylor <iant at google dot com>
- To: "Jay Freeman \(saurik\)" <saurik at saurik dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 28 Feb 2012 10:14:37 -0800
- Subject: Re: random commentary on -fsplit-stack (and a bug report)
- Authentication-results: mr.google.com; spf=pass (google.com: domain of firstname.lastname@example.org designates 10.68.194.73 as permitted sender) email@example.com; dkim=pass firstname.lastname@example.org
- References: <1606746175.5471330408211672.JavaMail.root@frigate>
"Jay Freeman (saurik)" <email@example.com> writes:
> As demonstrated by these snippets, __morestack_segments is a pointer
> to a stack_segment; it is being stored in the context as a void *, but
> is being removed from the context and being passed directly to
> __morestack_release_segments, which in turn expects a pointer to a
> pointer to a stack_segment, not just a bare pointer to a stack
> segment. Probably quite simple to fix (although might be more complex
> than just "add a &").
Thanks for the bug report and the analysis. I think it does simply
require an '&'. That makes it analogous to the way
__morestack_release_segments is used in generic-morestack-thread.c. So,
fixed with the appended patch. Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu. Committed to mainline.
> 1) The current implementation (maybe this is intended to change?) uses
> mmap() to allocate stack segments, which means that every allocation
> involves a system call, a lock in the kernel on a slow data structure
> (anon_vma), and has some non-zero probability of ending up with a
> separate VMA (which is not only slow, but in my understanding uses up
> a limited resource: you can only have 64k VMAs per process).
> Is it possible to instead expose the functionality for allocating
> stack segments out of libgcc for easy replacement by coroutine
> runtimes? I would really love to be able to use my existing memory
> manager to allocate the stack segments. I realize that this allocation
> routine would need to be able to operate with almost no stack: that
> isn't a problem (I can always pivot to another stack if I need any
Makes sense to me. It could actually be used by Go as well, since libgo
has its own memory allocator. Probably the best approach would be to
add another function, __splitstack_set_allocator or something, which
would set the allocator function. The allocator would have to be
updated atomically, of course. And, as you say, it would have to
operate with limited stack space.
> 2) I had seen a discussion on the mailing list regarding argument
> copying, and I must say I'm somewhat confused as to why it is
> sufficient to memcpy the arguments from the old stack to the new one:
> if I have an argument with a non-POD type that has a non-trivial copy
> constructor, it would seem like I need a copy operation to be able to
> use the object from the new stack (maybe, for example, it has an
> internal pointer).
Non-POD values are always passed by pointer. They are not copied. See
pass_by_reference in function.c. Basically, the frontend is responsible
for calling the copy constructor where required. This is because the
middle-end may want to shuffle the value around in various ways, and we
don't want it to call the copy constructor multiple times while doing
> 3) If I have either blocked signals on my thread or have setup an
> alternate signal stack with sigaltstack, I can get away with
> super-tiny stacks. However, allocate_segment has a minimum stack size
> of MINSIGSTKSZ (I presume to allow for signals), which on some systems
> I use (such as Mac OS X) I've seen be set as high as 32kB. (Meanwhile,
> MINSIGSTKSZ on Linux is smaller than a page, so mmap() can't even
> allocate it.)
Makes sense to me. There should be a way for the program to specify the
minimum stack segment size.
> 4) 10 64-bit words for the splitstack context is a really large amount
> of space. :( I don't even have that much CPU-state (there are only 8
> registers that really need to be saved when switching between
> coroutines). Considering the stack segments form a doubly-linked-list,
> it would seem like MORESTACK_SEGMENTS and CURRENT_SEGMENT are
> redundant. I also feel like CURRENT_STACK could be worked around
> fairly well.
As you know, I wanted to allow for future expansion. I agree that it
would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
off space for time, since it would mean that setcontext would have to
walk up the list. I think CURRENT_STACK is required for
__splitstack_find_context. And __splitstack_find_context is required
for Go's garbage collector. At least, it's not obvious to me how to
avoid requiring CURRENT_STACK for that case.
Not sure what to do here. Obviously you could cheat and use 7 words
instead of 10.
> 5) As currently implemented, the stack space check is added to every
> single function. However, some functions do not actually use the stack
> (or might even be avoiding memory accesses entirely). When I look at
> the disassembly of my project, I see many references to __morestack
> and "cmp %fs:0x70,%rsp" in functions that would otherwise be just a
> few instructions long. Functions that don't use stack should avoid the
I agree. Want to write a patch? Or at least file a bug report.
> 6) I have noticed that the purpose of having split stacks seems
> largely hobbled by the way the linker enforces humungous stacks on
> outgoing calls to non-split-stack code, even if that code isn't
> called. As an incredibly painful example: __splitstack_getcontext is
> not compiled with split-stack support, which means that the function I
> have to switch coroutines (called from every coroutine) allocates
Yes, this is painful.
> To explain what I mean by "even if that code isn't called": my code
> hardly ever throws exception, but because I support them I end up with
> _Unwind_resume in most of my functions; I thereby get burned with
> giant stacks. It would seem more ideal (although I see how this would
> be much more difficult) if there were some way to only allocate the
> larger stack as the call is made to the non-split-stack function, not
> when entering the split-stack one.
It would certainly be possible for the compiler to arrange to allocate a
large stack as it called the non-split-stack function. Unfortunately, I
don't see how the linker could do it. And it's the linker, not the
compiler, that knows that it is a call to a non-split-stack function.
The linker could do it if the compiler could somehow indicate the number
of bytes of arguments that were pushed on the stack, but I don't see a
good way to do that offhand.
> A lot of these problems would be solved if libgcc (and whatever
> friends, such as libsupc++) were themselves compiled with
> -fsplit-stack. Of course, I can't imagine that anyone would want to
> pay the performance penalty for that globally ;P. So, is there some
> plan to either do that for the entire build, or to provide alternative
> versions of those libraries that can be linked to while using
> -fsplit-stack in your own code?
If you are building your own toolchain you could use multilib to get a
-fsplit-stack version of libgcc and libstdc++. Or you could simply add
-fsplit-stack to CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET. But I don't
see a clean way to handle this if you are not building your own
toolchain. As you say, -fsplit-stack is unlikely to be incorporated
into the default library build any time soon.
> That said, I don't think that that entirely does away with this
> "uncalled function drags in stack requirements" problem, as I want to
> say the core issue comes down to how this interacts with the
> inliner. In many of these cases, the call to the non-split-stack
> function is in some leaf function of a giant call graph that was
> flattened to a single massive function during the optimization pass.
> The result is that if you ever interact with non-split-stack code
> anywhere, you really need to be quite explicit about __noinline__ to
> keep it from tainting the stack requirements of other functions. Part
> of me feels like there must be a better way of handling the stack
> expansions (such as by putting it at the call-site in situations like
> this), although I realize that might be difficult with the linker in
> charge of it.
> A specific idea that might help, however, is to set things up so that
> the PLT actually handles the stack increases when you are linking to
> functions that are in a dynamic library. That way, calls to open (for
> example) would not cause the function that called it to suddenly
> require a large stack, but instead only as control is transferred to
> open would the stack size increase. (This might be quite complex,
Yes, again you have to know how many bytes of arguments were pushed on
the stack. You can pretty much know this for open, of course, but it's
a lot more complex for printf (if printf were compiled in split-stack
mode it would straightforward, but of course in this example it is not).
I agree that this could be a lot nicer. It's a bit less important for
Go because obviously the Go compiler is completely in control of all
functions called by Go code.
> 7) Using the linker to handle the transition between split-stack and
> non-split-stack code seems like a good way to solve the problem of "we
> need large stacks when hitting external code", but in staring at the
> resulting code I have in my project I'm seeing that it isn't reliable:
> if you have a pointer to a function the linker will not know what you
> are calling. In my case, this is coming up often due to using
Yes, good point. I think I had some plan for handling that but I no
longer recall what it was.
> More awkwardly, split-stack functions that mention (but do not call)
> non-split-stack functions (such as to return their address) are being
> mis-flagged by the linker. Honestly, I question whether the linker
> fundamentally has enough information about what is going on to be able
> to make sufficiently accurate decisions with regards to stack
> constraints to warrant the painful abstraction breakage that
> split-stack uses. :(
Your're right that the linker doesn't really have enough information.
But is a split-stack function that returns the address of a
non-split-stack function really so frequent that it's worth worrying
> That said, I don't have a better solution to suggest right now (I
> really want to say that having attributes available to declare
> split-stack functionality in the code would be better, but that has
> other ramifications), but I do have concerns that due to attempts to
> keep the ABI fixed decisions made now (when there seem to only be a
> single major user, Go) will lock in how the mechanism is capable of
> functioning in the future.
I may misunderstand your suggestion, but I think that keeping the ABI
fixed is a requirement. Any ABI change would require rebuilding all
libraries and changing the debugger. The result would not be usable for
> If some of these things are in the "yeah, changing that would be
> interesting, we are just don't have many people working on the
> feature", I'd be happy to throw some patches towards it. I hesitate to
> just start sending patches over the wall, however, without first doing
> some kind of verification that I have any clue what I'm doing; I
> certainly am not certain how things would be prioritized, or even
> really who is working on it. ;P
In general, patches would be great (sign the copyright assignment first,
of course). You are correct that there aren't many people working on
the feature--I'm the only one. And I'm not even really keeping up with
Go, much less the use of split stack code in other languages.
Thanks for the thoughtful comments.
2012-02-28 Ian Lance Taylor <firstname.lastname@example.org>
* generic-morestack.c (__splitstack_releasecontext): Correct call
--- libgcc/generic-morestack.c (revision 184633)
+++ libgcc/generic-morestack.c (working copy)
@@ -1104,7 +1104,9 @@ __splitstack_resetcontext (void *context
__splitstack_releasecontext (void *context)
- __morestack_release_segments (context[MORESTACK_SEGMENTS], 1);
+ __morestack_release_segments (((struct stack_segment **)
/* Like __splitstack_block_signals, but operating on CONTEXT, rather