27234 – no way to stop gcc from mucking with the incoming argument stack on ia32

Bug 27234 - no way to stop gcc from mucking with the incoming argument stack on ia32

Summary: no way to stop gcc from mucking with the incoming argument stack on ia32

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	unknown

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-04-21 02:22 UTC by Albert Cahalan
Modified:	2024-03-28 04:03 UTC (History)
CC List:	5 users (show)

See Also:	https://bugzilla.redhat.com/show_bug.cgi?id=427707
Host:
Target:	i?86-- x86_64--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2006-04-22 20:35:44

Attachments
Possible patch (722 bytes, patch) 2006-05-02 03:40 UTC, Ian Lance Taylor	Details \| Diff
New patch (1.49 KB, patch) 2008-04-29 15:40 UTC, Jan Hubicka	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Albert Cahalan 2006-04-21 02:22:31 UTC

The Linux kernel just gained a few ugly hacks because there isn't a function attribute to tell gcc that it doesn't own the stack:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0a489cb3b6a7b277030cdbc97c2c65905db94536
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=385910f2b275a636238f70844f1b6da9fda6f2da

Comment 1 Andrew Pinski 2006-04-21 02:35:01 UTC

How is asmlinkage defined?


Second this is valid optimization no matter what if you are folllowing the correct ABI which is what GCC is doing.

So the real question is why is the kernel using a different ABI here?

Comment 2 Albert Cahalan 2006-04-21 03:16:29 UTC

#define asmlinkage __attribute__((regparm(0)))
#define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret))

As I recall, the kernel expects the stack to remain untouched
so that it can restore that portion of the stack into registers
for a return from a system call. I think it's even in the right
format to do an IRET instruction.

The i386 kernel has a 4044-byte stack, and the kernel tries to
be very fast (including cache-friendly), so copying data around
to satisfy the normal ABI is not desirable.

Comment 3 Andrew Pinski 2006-04-21 03:19:46 UTC

Did you know sibcalling reduces the stack usage in fact it is kinda stupid to have a function call overhead to create a stack?

Therefor the asm should instead create the stack.

Comment 4 Albert Cahalan 2006-04-21 03:41:18 UTC

Mostly, sibcalling does not happen. So in the common case, setting up the normal ABI-specified stack would be a waste.

Comment 5 Andrew Pinski 2006-04-22 20:24:51 UTC

Use -fno-sibcalling if you cannot use this optimization because you are violating the ABI  really and since GCC depends on the ABI being followed, there is almost nothing GCC can do.

If the kernel wants to do tricks like this, that is their own business and not really GCC business to know the kernel is not following the ABI that has been there since day one.

Comment 6 Richard Biener 2006-04-22 20:35:44 UTC

So you want __attribute__((incoming_stack_frame_read_only)).

Comment 7 Andreas Schwab 2006-04-22 21:09:52 UTC

It's called __attribute__((syscall_linkage)) on ia64.

Comment 8 Joshua 2006-04-22 23:28:41 UTC

The code that calls all the "asmlinkage" calls does so through a vector table.
Consequently, it does not know how many arguments it calls. However, the
arguments live in particular registers when called, so it just pushes all of them
(it has to preserve all the registers anyway).

Comment 9 Albert Cahalan 2006-04-23 02:05:35 UTC

Actually, there is no desire to prevent sibling calls. That's a hack. Sibling calls are desirable as long as they don't muck with the incoming argument stack.

Using -fno-sibcalling is definitely out of the question. It would affect other functions.

Note: the kernel uses -mregparm=3 as the default.

Once indexed by Google, a query for "slightly different macro that talks less about tailcalls" ought to find some more discussion of this. Google groups has it already.

Comment 10 Albert Cahalan 2006-04-23 02:09:35 UTC

A couple quotes from Linus on the linux-kernel mailing list, in response to the idea (expressed in comment #3 above) of having the assembly set up the normal stack:

-- 1 --

Sure, we could just do a slower system call entry. We always knew that.

Suggesting that as the solution is pretty stupid, though. That's _the_
most timing-critical part in the whole kernel on many loads. We've
literally spent time trying to remove single cycles there, and it matters.

I'd much rather have an officially sanctioned way to do what Linux wants
to do, but in the meantime, we can (and are forced to) rely on hacks like
"prevent_tail_call()". They are hacks, and we _know_ they are hacks, but
as long as gcc maintainers don't want to help us, we don't have much
choice (although we could perhaps make the hacks a bit nicer ;).

The fact is, the "official ABI" simply isn't appropriate. In fact, we
don't use the "official ABI" _anywhere_ in the kernel any more, since we
actually end up using other gcc calling conventions (ie regparm=3).

Btw, this is not even unusual. A lot of embedded platforms have support
for magic exception/interrupt calling conventions. Gcc even supports them:
things like "__attribute__((interrupt))". This is also exactly analogous
to stdcall/cdecl/regparm/longcall/naked/sp_switch/trap_exit etc
attributes.

So gcc already has support for the fact that people sometimes need special
calling conventions. We've historically worked around it by hand instead,
since our calling convention is very _close_ to the standard one

In fact, the calling convention we want there is _so_ close to the
standard one, that I'm not even convinced the i386 ABI really says that
the callee owns the parameter space - it may well be a local gcc decision
rather than any "official i386 ABI" issue.

-- 2 --

I'd much rather have a real gcc attribute. I don't _like_
having horrible hacks like the above to make gcc do what I want it to do.
And I know the gcc developers don't like it either when I force gcc to be
my biatch. It would be much better for everybody if gcc helped a bit.

Comment 11 Andrew Pinski 2006-04-23 02:12:58 UTC

A couple of points.  One GCC follows the ABI and with changes like regparm it still follows a defined ABI.
There are a couple more cases than just sibcalling which will fail even now.  Reload (and future RAs) uses where the incomming arguments are to spill for those cases so in reality it is even harder to fix the problem.  So setting up the stack is only real thing to do.

Comment 12 Ian Lance Taylor 2006-05-01 04:51:04 UTC

Can somebody please add a small standalone test case showing the problem here?

Thanks.

Comment 13 Andrew Pinski 2006-05-01 05:58:14 UTC

(In reply to comment #12)
> Can somebody please add a small standalone test case showing the problem here?

One is:
int g(int a, int b);

int f(int a, int b)
{
  g(a, b);
  return g(a, b);
}

Comment 14 Ian Lance Taylor 2006-05-02 03:40:44 UTC

Created attachment 11354 [details]
Possible patch

I've attached a possible patch for this issue.  It adds a new attribute "preserve_stack" which tells the compiler that the stack should be preserved.  For example, __attribute__ ((preserve_stack, regparm (0))).

It seems to work with pinskia's small test case.  Could somebody see whether it works with the original test case?

Thanks.

Comment 15 Jakub Jelinek 2008-04-10 17:40:21 UTC

The i?86 Linux kernel just got bitten by this again, see
https://bugzilla.redhat.com/show_bug.cgi?id=427707
In this case, it wasn't about a tail call, but I believe high register pressure that caused modification of the incoming stack slot, which is undesirable for
kernel asmlinkage functions.

The #c14 patch after small fixup (s/preserve-stack/preserve_stack/) fixes some cases, but certainly all, on the other side pessimizes code even where it is not strictly necessary (though it is unclear how hard would it be to improve that).

On f1 already vanilla gcc doesn't modify the incoming stack area, and the patch
just pessimizes it by reading all arguments into registers and then storing them 
into a different stack slot, eventhough d argument is actually never modified
and so could be used directly from the stack slot, avoiding one move from memory and one move to memory.  f2 is an example where the patch doesn't help at all,
and incoming stack area is modified even with the patch.  On f3 the patch properly prevents doing tail call, as that would actually modify the incoming stack area.  But on f4 there is no need to avoid the tail call.

I guess we could live with avoiding all tail calls, but a) need a fix for f2
b) for f1 I'd say at expand time for -O1+ we should know whether an argument
is TREE_ADDRESSABLE or not, and whether it is ever modified or not, which could
help us avoid unnecessary copying.  Still will need to make sure reloading won't
push anything into the incoming registers area.

Comment 16 Jakub Jelinek 2008-04-10 18:09:21 UTC

From the kernel people it would be interesting to hear on which targets they
actually need it (e.g. if it is i?86 only, it would be better implemented as
i?86 specific syscall_linkage attribute).

Comment 17 Jakub Jelinek 2008-04-10 18:13:27 UTC

x86_64 actually isn't a problem, as it passes integral arguments in registers
(and kernel only uses at most 6 syscall arguments).  Structures aren't passed
by value.

Comment 18 Jakub Jelinek 2008-04-11 10:25:10 UTC

Seems I forgot to provide the testcase I was talking about.  Here it is:

__attribute__((preserve_stack)) void f1 (int a, int b, int c, int d, int e)
{
  int i;
  for (i = 0; i < 50; i++)
    {
      // Simulate high register pressure, I'm lazy
      asm volatile ("" : : "r" (e) : "%eax", "%ebx", "%ecx", "%edx", "%esi");
      e++;
      asm volatile ("" : : "r" (d) : "%eax", "%ebx", "%ecx", "%edx", "%esi");
    }
}

__attribute__((preserve_stack)) void f2 (int a, int b, int c, int d, int e)
{
  int i;
  for (i = 0; i < 50; i++)
    {
      asm volatile ("" : : "m" (e) : "%eax", "%ebx", "%ecx", "%edx", "%esi", "%edi");
      e++;
    }
}

extern int fn (int, int);

__attribute__((preserve_stack)) int f3 (int a, int b)
{
  return fn (b, a);
}

__attribute__((preserve_stack)) int f4 (int a, int b)
{
  return fn (a, b);
}

Comment 19 Jan Hubicka 2008-04-24 16:05:26 UTC

I am going to make patch that will with preserve-stack attribute in addition to inhibitting libcall also put REG_EQUIV notes on a copy instead of argument register itself that should be sufficient to prevent reload from re-using the slot.

I think we need to add a temporary register with REG_EQUIV note that will be copypropagated if argument is read only (or to read only parts) so rematerialization works.

Honza

Comment 20 Jan Hubicka 2008-04-29 15:40:41 UTC

Created attachment 15546 [details]
New patch

Hi,
this patch implements idea of having temporary read-only register with REG_EQUIV note that will get propagated to place of the read-write parameter register.
However I am still midly confused, when reload is actually using the operand slot in read-write way?  I don't seem to be able to construct a testcase.
If reload is not doing that, perhaps just dropping the function.c part of patch is easiest way around.

Honza