config/rs6000/linux-unwind.h contains a function ppc_linux_aux_vector which searches for the aux vector based on __libc_stack_end; the only use of this function is to lookup the AT_HWCAP bitfield provided by the kernel. The results seem to be used only for premature optimization: optimizing out copying register sets that aren't used on the current cpu model.
Unfortunately, the method used to search for auxv is invalid at the time the call to ppc_linux_aux_vector is made. The array pointed to by extern char **environ; may be modified by the application; for example it may be truncated to clear the environment by writing *environ = 0; In this case, the old, no-longer-used part of the environ array will be incorrectly interpreted as the aux vector, causing the unwind code not to find the hwcap values and thus not to save the register sets it needs to save.
I found this bug while debugging a report that gcc couldn't be built for powerpc on musl libc due to missing __libc_stack_end symbol. I don't see anywhere that it's documented that __libc_stack_end points to the original argc slot passed from the kernel, rather than some arbitrary address between main's stack frame and argv, so I think it's very bad design to be relying on this implementation-detail anyway. The fix I would like to see is the complete removal of ppc_linux_aux_vector and updating ppc_fallback_frame_state not to care which register sets are actually in use.
There seems like there are two different issues here. The first issue if musl libc not following the same ABI as glibc. The second issues looks like maybe a real one.
The ABI issue is a dependency on an undocumented part of glibc's ABI behavior -- I don't see anywhere it's documented that __libc_stack_end points at "argc" (i.e. the original place the stack pointer points to on program entry), just some point past the end of the stack, and thus it's conceivable that even in glibc it could change to point somewhere else. In any case, it's a gratuitous dependency on glibc internals.
The logic error issue (assuming the initial environ array is still intact) is of course an outright observable bug even without any incompatible changes at the libc level.
Do you have any idea why this code was added to begin with? It seems completely unnecessary.
CCing rs6000 maintainers
__libc_stack_end is not part of the ABI. The problem is Glibc makes it difficult to access auxv, otherwise one could obtain the pointer passed to _start by the kernel. One could read /proc/self/auxv .
Created attachment 29420 [details]
At the time the original code was being developed, linux-2.4.x was in widespread use. /proc/self/auxv was introduced with linux-2.6.0 in Dec 2003. I guess it's reasonable to rely on that nowadays. BTW, it's not entirely an optimization to condition reading of altivec regs on AT_HWCAP. I believe it may be possible to segv if the unwinder tries to dereference a location past the end of struct sigcontext.
That sounds highly doubtful. The sigcontext is (necessarily) on the stack, so the only way accessing past the end of sigcontext could fault is if the access were so far beyond the end to go completely off the stack. The only way this might be plausible is under sigaltstack.
In any case, why would this code be reading beyond the end? Does the kernel use different incompatible sigcontext structures based on which vector registers exist on the cpu?
On thinking about this a little more, the idea of using /proc/self/auxv isn't that good. MD_FALLBACK_FRAME_STATE_FOR is only needed for older kernels; Kernels 2.6.15 and later provide a vdso with unwind info for signal frames.
So I don't think it makes sense to restrict libgcc's support for old kernels to 2.6.0 thru 2.6.14. If we're going to support old kernels, then we ought to continue supporting them all as best we can. And, yes, really old kernels used a different sigcontext for the simple reason that they predated altivec.
I'm inclined to close this bug as WONTFIX, or possibly make __libc_stack_end weak so that libgcc builds with musl libc.
Is there nothing internal in the sigcontext structure that distinguishes the version?
Making the reference to __libc_stack_end weak won't help. If the symbol is undefined, the code in libgcc would crash or malfunction; if it's defined but does not point exactly to the argc/argv start (which, since it's not defined in the ABI, seems to be something that could happen in the future even with glibc), the code will also badly malfunction.
If you want to keep using __libc_stack_end, I think it should be conditional at runtime on old/broken kernel and libc versions, and auxv should be ignored otherwise.
Date: Fri Feb 15 13:53:40 2013
New Revision: 196077
* config/rs6000/linux-unwind.h (ppc_linux_aux_vector): Delete.
(ppc_fallback_frame_state): Always set up save locations for fp
and altivec. Don't bother with non-callee-saved regs, r0-r13
except for r2 on ppc64, fr0-fr13, v0-v19, vscr.