Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 27234
Product:  
Component:  
Status: NEW
Resolution:
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Host:
Reported against  
Priority:  
Severity:  
Target Milestone:  
 
 
Target:
Reporter: Albert Cahalan <acahalan@gmail.com>
Add CC:
CC:
Remove selected CCs
Build:
URL:
Summary:
Keywords:
Known to work:
Known to fail:

Attachment Description Type Created Size Actions
foo4.patch Possible patch patch 2006-05-02 03:40 722 bytes Edit | Diff
ddk2 New patch patch 2008-04-29 15:40 1.49 KB Edit | Diff
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 27234 depends on: Show dependency tree
Show dependency graph
Bug 27234 blocks:

Additional Comments:





Mark bug as waiting for feedback
Mark bug as suspended




View Bug Activity   |   Format For Printing   |   Clone This Bug


Description:   Last confirmed: 2006-04-22 20:35 Opened: 2006-04-21 02:22
The Linux kernel just gained a few ugly hacks because there isn't a function
attribute to tell gcc that it doesn't own the stack:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0a489cb3b6a7b277030cdbc97c2c65905db94536
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=385910f2b275a636238f70844f1b6da9fda6f2da

------- Comment #1 From Andrew Pinski 2006-04-21 02:35 -------
How is asmlinkage defined?


Second this is valid optimization no matter what if you are folllowing the
correct ABI which is what GCC is doing.

So the real question is why is the kernel using a different ABI here?

------- Comment #2 From Albert Cahalan 2006-04-21 03:16 -------
#define asmlinkage __attribute__((regparm(0)))
#define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret))

As I recall, the kernel expects the stack to remain untouched
so that it can restore that portion of the stack into registers
for a return from a system call. I think it's even in the right
format to do an IRET instruction.

The i386 kernel has a 4044-byte stack, and the kernel tries to
be very fast (including cache-friendly), so copying data around
to satisfy the normal ABI is not desirable.

------- Comment #3 From Andrew Pinski 2006-04-21 03:19 -------
Did you know sibcalling reduces the stack usage in fact it is kinda stupid to
have a function call overhead to create a stack?

Therefor the asm should instead create the stack.

------- Comment #4 From Albert Cahalan 2006-04-21 03:41 -------
Mostly, sibcalling does not happen. So in the common case, setting up the
normal ABI-specified stack would be a waste.

------- Comment #5 From Andrew Pinski 2006-04-22 20:24 -------
Use -fno-sibcalling if you cannot use this optimization because you are
violating the ABI  really and since GCC depends on the ABI being followed,
there is almost nothing GCC can do.

If the kernel wants to do tricks like this, that is their own business and not
really GCC business to know the kernel is not following the ABI that has been
there since day one.

------- Comment #6 From Richard Guenther 2006-04-22 20:35 -------
So you want __attribute__((incoming_stack_frame_read_only)).

------- Comment #7 From Andreas Schwab 2006-04-22 21:09 -------
It's called __attribute__((syscall_linkage)) on ia64.

------- Comment #8 From Joshua 2006-04-22 23:28 -------
The code that calls all the "asmlinkage" calls does so through a vector table.
Consequently, it does not know how many arguments it calls. However, the
arguments live in particular registers when called, so it just pushes all of
them
(it has to preserve all the registers anyway).

------- Comment #9 From Albert Cahalan 2006-04-23 02:05 -------
Actually, there is no desire to prevent sibling calls. That's a hack. Sibling
calls are desirable as long as they don't muck with the incoming argument
stack.

Using -fno-sibcalling is definitely out of the question. It would affect other
functions.

Note: the kernel uses -mregparm=3 as the default.

Once indexed by Google, a query for "slightly different macro that talks less
about tailcalls" ought to find some more discussion of this. Google groups has
it already.

------- Comment #10 From Albert Cahalan 2006-04-23 02:09 -------
A couple quotes from Linus on the linux-kernel mailing list, in response to the
idea (expressed in comment #3 above) of having the assembly set up the normal
stack:

-- 1 --

Sure, we could just do a slower system call entry. We always knew that.

Suggesting that as the solution is pretty stupid, though. That's _the_
most timing-critical part in the whole kernel on many loads. We've
literally spent time trying to remove single cycles there, and it matters.

I'd much rather have an officially sanctioned way to do what Linux wants
to do, but in the meantime, we can (and are forced to) rely on hacks like
"prevent_tail_call()". They are hacks, and we _know_ they are hacks, but
as long as gcc maintainers don't want to help us, we don't have much
choice (although we could perhaps make the hacks a bit nicer ;).

The fact is, the "official ABI" simply isn't appropriate. In fact, we
don't use the "official ABI" _anywhere_ in the kernel any more, since we
actually end up using other gcc calling conventions (ie regparm=3).

Btw, this is not even unusual. A lot of embedded platforms have support
for magic exception/interrupt calling conventions. Gcc even supports them:
things like "__attribute__((interrupt))". This is also exactly analogous
to stdcall/cdecl/regparm/longcall/naked/sp_switch/trap_exit etc
attributes.

So gcc already has support for the fact that people sometimes need special
calling conventions. We've historically worked around it by hand instead,
since our calling convention is very _close_ to the standard one

In fact, the calling convention we want there is _so_ close to the
standard one, that I'm not even convinced the i386 ABI really says that
the callee owns the parameter space - it may well be a local gcc decision
rather than any "official i386 ABI" issue. 

-- 2 --

I'd much rather have a real gcc attribute. I don't _like_
having horrible hacks like the above to make gcc do what I want it to do.
And I know the gcc developers don't like it either when I force gcc to be
my biatch. It would be much better for everybody if gcc helped a bit.

------- Comment #11 From Andrew Pinski 2006-04-23 02:12 -------
A couple of points.  One GCC follows the ABI and with changes like regparm it
still follows a defined ABI.
There are a couple more cases than just sibcalling which will fail even now. 
Reload (and future RAs) uses where the incomming arguments are to spill for
those cases so in reality it is even harder to fix the problem.  So setting up
the stack is only real thing to do.

------- Comment #12 From Ian Lance Taylor 2006-05-01 04:51 -------
Can somebody please add a small standalone test case showing the problem here?

Thanks.

------- Comment #13 From Andrew Pinski 2006-05-01 05:58 -------
(In reply to comment #12)
> Can somebody please add a small standalone test case showing the problem here?

One is:
int g(int a, int b);

int f(int a, int b)
{
  g(a, b);
  return g(a, b);
}

------- Comment #14 From Ian Lance Taylor 2006-05-02 03:40 -------
Created an attachment (id=11354) [edit]
Possible patch

I've attached a possible patch for this issue.  It adds a new attribute
"preserve_stack" which tells the compiler that the stack should be preserved. 
For example, __attribute__ ((preserve_stack, regparm (0))).

It seems to work with pinskia's small test case.  Could somebody see whether it
works with the original test case?

Thanks.

------- Comment #15 From Jakub Jelinek 2008-04-10 17:40 -------
The i?86 Linux kernel just got bitten by this again, see
https://bugzilla.redhat.com/show_bug.cgi?id=427707
In this case, it wasn't about a tail call, but I believe high register pressure
that caused modification of the incoming stack slot, which is undesirable for
kernel asmlinkage functions.

The #c14 patch after small fixup (s/preserve-stack/preserve_stack/) fixes some
cases, but certainly all, on the other side pessimizes code even where it is
not strictly necessary (though it is unclear how hard would it be to improve
that).

On f1 already vanilla gcc doesn't modify the incoming stack area, and the patch
just pessimizes it by reading all arguments into registers and then storing
them 
into a different stack slot, eventhough d argument is actually never modified
and so could be used directly from the stack slot, avoiding one move from
memory and one move to memory.  f2 is an example where the patch doesn't help
at all,
and incoming stack area is modified even with the patch.  On f3 the patch
properly prevents doing tail call, as that would actually modify the incoming
stack area.  But on f4 there is no need to avoid the tail call.

I guess we could live with avoiding all tail calls, but a) need a fix for f2
b) for f1 I'd say at expand time for -O1+ we should know whether an argument
is TREE_ADDRESSABLE or not, and whether it is ever modified or not, which could
help us avoid unnecessary copying.  Still will need to make sure reloading
won't
push anything into the incoming registers area.

------- Comment #16 From Jakub Jelinek 2008-04-10 18:09 -------
From the kernel people it would be interesting to hear on which targets they
actually need it (e.g. if it is i?86 only, it would be better implemented as
i?86 specific syscall_linkage attribute).

------- Comment #17 From Jakub Jelinek 2008-04-10 18:13 -------
x86_64 actually isn't a problem, as it passes integral arguments in registers
(and kernel only uses at most 6 syscall arguments).  Structures aren't passed
by value.

------- Comment #18 From Jakub Jelinek 2008-04-11 10:25 -------
Seems I forgot to provide the testcase I was talking about.  Here it is:

__attribute__((preserve_stack)) void f1 (int a, int b, int c, int d, int e)
{
  int i;
  for (i = 0; i < 50; i++)
    {
      // Simulate high register pressure, I'm lazy
      asm volatile ("" : : "r" (e) : "%eax", "%ebx", "%ecx", "%edx", "%esi");
      e++;
      asm volatile ("" : : "r" (d) : "%eax", "%ebx", "%ecx", "%edx", "%esi");
    }
}

__attribute__((preserve_stack)) void f2 (int a, int b, int c, int d, int e)
{
  int i;
  for (i = 0; i < 50; i++)
    {
      asm volatile ("" : : "m" (e) : "%eax", "%ebx", "%ecx", "%edx", "%esi",
"%edi");
      e++;
    }
}

extern int fn (int, int);

__attribute__((preserve_stack)) int f3 (int a, int b)
{
  return fn (b, a);
}

__attribute__((preserve_stack)) int f4 (int a, int b)
{
  return fn (a, b);
}

------- Comment #19 From Jan Hubicka 2008-04-24 16:05 -------
I am going to make patch that will with preserve-stack attribute in addition to
inhibitting libcall also put REG_EQUIV notes on a copy instead of argument
register itself that should be sufficient to prevent reload from re-using the
slot.

I think we need to add a temporary register with REG_EQUIV note that will be
copypropagated if argument is read only (or to read only parts) so
rematerialization works.

Honza

------- Comment #20 From Jan Hubicka 2008-04-29 15:40 -------
Created an attachment (id=15546) [edit]
New patch

Hi,
this patch implements idea of having temporary read-only register with
REG_EQUIV note that will get propagated to place of the read-write parameter
register.
However I am still midly confused, when reload is actually using the operand
slot in read-write way?  I don't seem to be able to construct a testcase.
If reload is not doing that, perhaps just dropping the function.c part of patch
is easiest way around.

Honza

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug