This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/86028] New: Dead stores created by va_start/va_arg are not fully cleaned up
- From: "zackw at panix dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 01 Jun 2018 16:19:15 +0000
- Subject: [Bug rtl-optimization/86028] New: Dead stores created by va_start/va_arg are not fully cleaned up
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86028
Bug ID: 86028
Summary: Dead stores created by va_start/va_arg are not fully cleaned up
Product: gcc
Version: 8.1.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zackw at panix dot com
Target Milestone: ---
On any ABI where arguments to a variadic function are passed in the same places
that they would be if they were arguments to a non-variadic function, it should
be possible to optimize 'foo_wrapper' in the following test program all the way
down to a tail-call to 'foo' and nothing else:
#include <stdarg.h>

extern int a;
extern int b;
extern void *c;

int __attribute__((noinline))
foo(int x, int y, void *z)
{
    a = x;
    b = y;
    c = z;
    return 0;
}

int
foo_wrapper(int x, int y, ...)
{
    va_list ap;
    void *z;

    va_start(ap, y);
    z = va_arg(ap, void *);
    va_end(ap);
    return foo(x, y, z);
}
('foo' is included in this test program so that one can easily verify that no
argument shuffling is needed.)
gcc-8.1 targeting x86-64-linux, x86-32-linux, or aarch64-linux (all of which
meet the above ABI requirement) does not manage to do this. It actually does
the best job for x86-32, where everything is on the stack:
foo_wrapper:
        pushl   12(%esp)
        pushl   12(%esp)
        pushl   12(%esp)
        call    foo
        addl    $12, %esp
        ret
This literally copies the incoming arguments of 'foo_wrapper' into a new frame
in order to call 'foo'. The instructions are unnecessary, but they are not dead
in the formal sense. Perhaps the issue here is just that variadic functions
aren't being considered for sibcall optimization?
For the targets where arguments are passed in registers, the code generation is
worse, e.g. aarch64:
foo_wrapper:
        stp     x29, x30, [sp, -64]!
        add     x3, sp, 48
        add     x4, sp, 64
        mov     x29, sp
        stp     x4, x4, [sp, 16]
        str     x3, [sp, 32]
        stp     wzr, wzr, [sp, 40]
        str     x2, [sp, 56]
        bl      foo
        ldp     x29, x30, [sp], 64
        ret
The actual arguments to foo_wrapper are in x0, x1, and x2, which is exactly
where foo wants them, and they are never modified. All of the computation done
here is dead.
(I noticed this while messing around with glibc's syscall wrappers, which
really do things like this. 'foo_wrapper' has the type signature of 'fcntl',
for instance.)