This is GCC Bugzilla
This is GCC Bugzilla Version 2.20+
View Bug Activity | Format For Printing | Clone This Bug
The Linux kernel just gained a few ugly hacks because there isn't a function attribute to tell gcc that it doesn't own the stack: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0a489cb3b6a7b277030cdbc97c2c65905db94536 http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=385910f2b275a636238f70844f1b6da9fda6f2da
How is asmlinkage defined? Second this is valid optimization no matter what if you are folllowing the correct ABI which is what GCC is doing. So the real question is why is the kernel using a different ABI here?
#define asmlinkage __attribute__((regparm(0))) #define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret)) As I recall, the kernel expects the stack to remain untouched so that it can restore that portion of the stack into registers for a return from a system call. I think it's even in the right format to do an IRET instruction. The i386 kernel has a 4044-byte stack, and the kernel tries to be very fast (including cache-friendly), so copying data around to satisfy the normal ABI is not desirable.
Did you know sibcalling reduces the stack usage in fact it is kinda stupid to have a function call overhead to create a stack? Therefor the asm should instead create the stack.
Mostly, sibcalling does not happen. So in the common case, setting up the normal ABI-specified stack would be a waste.
Use -fno-sibcalling if you cannot use this optimization because you are violating the ABI really and since GCC depends on the ABI being followed, there is almost nothing GCC can do. If the kernel wants to do tricks like this, that is their own business and not really GCC business to know the kernel is not following the ABI that has been there since day one.
So you want __attribute__((incoming_stack_frame_read_only)).
It's called __attribute__((syscall_linkage)) on ia64.
The code that calls all the "asmlinkage" calls does so through a vector table. Consequently, it does not know how many arguments it calls. However, the arguments live in particular registers when called, so it just pushes all of them (it has to preserve all the registers anyway).
Actually, there is no desire to prevent sibling calls. That's a hack. Sibling calls are desirable as long as they don't muck with the incoming argument stack. Using -fno-sibcalling is definitely out of the question. It would affect other functions. Note: the kernel uses -mregparm=3 as the default. Once indexed by Google, a query for "slightly different macro that talks less about tailcalls" ought to find some more discussion of this. Google groups has it already.
A couple quotes from Linus on the linux-kernel mailing list, in response to the idea (expressed in comment #3 above) of having the assembly set up the normal stack: -- 1 -- Sure, we could just do a slower system call entry. We always knew that. Suggesting that as the solution is pretty stupid, though. That's _the_ most timing-critical part in the whole kernel on many loads. We've literally spent time trying to remove single cycles there, and it matters. I'd much rather have an officially sanctioned way to do what Linux wants to do, but in the meantime, we can (and are forced to) rely on hacks like "prevent_tail_call()". They are hacks, and we _know_ they are hacks, but as long as gcc maintainers don't want to help us, we don't have much choice (although we could perhaps make the hacks a bit nicer ;). The fact is, the "official ABI" simply isn't appropriate. In fact, we don't use the "official ABI" _anywhere_ in the kernel any more, since we actually end up using other gcc calling conventions (ie regparm=3). Btw, this is not even unusual. A lot of embedded platforms have support for magic exception/interrupt calling conventions. Gcc even supports them: things like "__attribute__((interrupt))". This is also exactly analogous to stdcall/cdecl/regparm/longcall/naked/sp_switch/trap_exit etc attributes. So gcc already has support for the fact that people sometimes need special calling conventions. We've historically worked around it by hand instead, since our calling convention is very _close_ to the standard one In fact, the calling convention we want there is _so_ close to the standard one, that I'm not even convinced the i386 ABI really says that the callee owns the parameter space - it may well be a local gcc decision rather than any "official i386 ABI" issue. -- 2 -- I'd much rather have a real gcc attribute. I don't _like_ having horrible hacks like the above to make gcc do what I want it to do. And I know the gcc developers don't like it either when I force gcc to be my biatch. It would be much better for everybody if gcc helped a bit.
A couple of points. One GCC follows the ABI and with changes like regparm it still follows a defined ABI. There are a couple more cases than just sibcalling which will fail even now. Reload (and future RAs) uses where the incomming arguments are to spill for those cases so in reality it is even harder to fix the problem. So setting up the stack is only real thing to do.
Can somebody please add a small standalone test case showing the problem here? Thanks.
(In reply to comment #12) > Can somebody please add a small standalone test case showing the problem here? One is: int g(int a, int b); int f(int a, int b) { g(a, b); return g(a, b); }
Created an attachment (id=11354) [edit] Possible patch I've attached a possible patch for this issue. It adds a new attribute "preserve_stack" which tells the compiler that the stack should be preserved. For example, __attribute__ ((preserve_stack, regparm (0))). It seems to work with pinskia's small test case. Could somebody see whether it works with the original test case? Thanks.
The i?86 Linux kernel just got bitten by this again, see https://bugzilla.redhat.com/show_bug.cgi?id=427707 In this case, it wasn't about a tail call, but I believe high register pressure that caused modification of the incoming stack slot, which is undesirable for kernel asmlinkage functions. The #c14 patch after small fixup (s/preserve-stack/preserve_stack/) fixes some cases, but certainly all, on the other side pessimizes code even where it is not strictly necessary (though it is unclear how hard would it be to improve that). On f1 already vanilla gcc doesn't modify the incoming stack area, and the patch just pessimizes it by reading all arguments into registers and then storing them into a different stack slot, eventhough d argument is actually never modified and so could be used directly from the stack slot, avoiding one move from memory and one move to memory. f2 is an example where the patch doesn't help at all, and incoming stack area is modified even with the patch. On f3 the patch properly prevents doing tail call, as that would actually modify the incoming stack area. But on f4 there is no need to avoid the tail call. I guess we could live with avoiding all tail calls, but a) need a fix for f2 b) for f1 I'd say at expand time for -O1+ we should know whether an argument is TREE_ADDRESSABLE or not, and whether it is ever modified or not, which could help us avoid unnecessary copying. Still will need to make sure reloading won't push anything into the incoming registers area.
From the kernel people it would be interesting to hear on which targets they actually need it (e.g. if it is i?86 only, it would be better implemented as i?86 specific syscall_linkage attribute).
x86_64 actually isn't a problem, as it passes integral arguments in registers (and kernel only uses at most 6 syscall arguments). Structures aren't passed by value.
Seems I forgot to provide the testcase I was talking about. Here it is: __attribute__((preserve_stack)) void f1 (int a, int b, int c, int d, int e) { int i; for (i = 0; i < 50; i++) { // Simulate high register pressure, I'm lazy asm volatile ("" : : "r" (e) : "%eax", "%ebx", "%ecx", "%edx", "%esi"); e++; asm volatile ("" : : "r" (d) : "%eax", "%ebx", "%ecx", "%edx", "%esi"); } } __attribute__((preserve_stack)) void f2 (int a, int b, int c, int d, int e) { int i; for (i = 0; i < 50; i++) { asm volatile ("" : : "m" (e) : "%eax", "%ebx", "%ecx", "%edx", "%esi", "%edi"); e++; } } extern int fn (int, int); __attribute__((preserve_stack)) int f3 (int a, int b) { return fn (b, a); } __attribute__((preserve_stack)) int f4 (int a, int b) { return fn (a, b); }
I am going to make patch that will with preserve-stack attribute in addition to inhibitting libcall also put REG_EQUIV notes on a copy instead of argument register itself that should be sufficient to prevent reload from re-using the slot. I think we need to add a temporary register with REG_EQUIV note that will be copypropagated if argument is read only (or to read only parts) so rematerialization works. Honza
Created an attachment (id=15546) [edit] New patch Hi, this patch implements idea of having temporary read-only register with REG_EQUIV note that will get propagated to place of the read-write parameter register. However I am still midly confused, when reload is actually using the operand slot in read-write way? I don't seem to be able to construct a testcase. If reload is not doing that, perhaps just dropping the function.c part of patch is easiest way around. Honza