This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug rtl-optimization/86028] New: Dead stores created by va_start/va_arg are not fully cleaned up

From: "zackw at panix dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Fri, 01 Jun 2018 16:19:15 +0000
Subject: [Bug rtl-optimization/86028] New: Dead stores created by va_start/va_arg are not fully cleaned up
Auto-submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86028

            Bug ID: 86028
           Summary: Dead stores created by va_start/va_arg are not fully
                    cleaned up
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zackw at panix dot com
  Target Milestone: ---

On any ABI where arguments to a variadic function are passed in the same places
that they would be if they were arguments to a non-variadic function, it should
be possible to optimize 'foo_wrapper' in the following test program all the way
down to a tail-call to 'foo' and nothing else:

#include <stdarg.h>

extern int a;
extern int b;
extern void *c;

int __attribute__((noinline))
foo(int x, int y, void *z)
{
  a = x;
  b = y;
  c = z;
  return 0;
}

int
foo_wrapper(int x, int y, ...)
{
  va_list ap;
  void *z;
  va_start(ap, y);
  z = va_arg(ap, void *);
  va_end(ap);
  return foo(x, y, z);
}

('foo' is included in this test program so that one can easily verify that no
argument shuffling is needed.)
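
For comparison, here is a sketch of the same test program next to a non-variadic forwarder, which is the shape the optimized 'foo_wrapper' would ideally reduce to. The names 'foo_direct' and 'wrappers_agree' are hypothetical and do not appear in the report, and the globals are defined here rather than declared extern so that the sketch links on its own. It only checks that the two wrappers behave identically at run time; what code a given compiler emits for either one is not verified here.

```c
#include <stdarg.h>

/* Defined here (not extern, as in the report) so the sketch links alone. */
int a;
int b;
void *c;

int __attribute__((noinline))
foo(int x, int y, void *z)
{
  a = x;
  b = y;
  c = z;
  return 0;
}

/* The variadic wrapper from the report, unchanged. */
int
foo_wrapper(int x, int y, ...)
{
  va_list ap;
  void *z;
  va_start(ap, y);
  z = va_arg(ap, void *);
  va_end(ap);
  return foo(x, y, z);
}

/* Hypothetical non-variadic baseline: a plain forwarding call. */
int
foo_direct(int x, int y, void *z)
{
  return foo(x, y, z);
}

/* Returns 1 if both wrappers leave x, y, z in a, b, c; 0 otherwise. */
int
wrappers_agree(int x, int y, void *z)
{
  int a1, b1;
  void *c1;

  a = b = -1;
  c = 0;
  foo_wrapper(x, y, z);
  a1 = a;
  b1 = b;
  c1 = c;

  a = b = -1;
  c = 0;
  foo_direct(x, y, z);

  return a1 == a && b1 == b && c1 == c
         && a == x && b == y && c == z;
}
```

Compiling this with optimization and comparing the assembly for 'foo_wrapper' and 'foo_direct' is one way to see how far the variadic version is from the plain forwarding call.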

GCC 8.1 does not manage this when targeting x86-64-linux, x86-32-linux, or
aarch64-linux, all of which meet the above ABI requirement.  It does the best
job on x86-32, where every argument is passed on the stack:

foo_wrapper:
        pushl   12(%esp)
        pushl   12(%esp)
        pushl   12(%esp)
        call    foo
        addl    $12, %esp
        ret

This copies the incoming arguments of 'foo_wrapper' into a new frame in order
to call 'foo'.  The three pushes are unnecessary, but they are not dead in the
formal sense, because 'foo' reads the values they store.  Perhaps the issue
here is just that variadic functions aren't being considered for sibcall
optimization?

For the targets where arguments are passed in registers, the code generation is
worse, e.g. aarch64:

foo_wrapper:
        stp     x29, x30, [sp, -64]!
        add     x3, sp, 48
        add     x4, sp, 64
        mov     x29, sp
        stp     x4, x4, [sp, 16]
        str     x3, [sp, 32]
        stp     wzr, wzr, [sp, 40]
        str     x2, [sp, 56]
        bl      foo
        ldp     x29, x30, [sp], 64
        ret

The actual arguments to 'foo_wrapper' arrive in x0, x1, and x2, which is also
where 'foo' expects them, and none of those registers is modified.  All of the
computation done here before the call is dead.
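
As a rough guide to what those dead stores are, the sketch below models the five-field 'va_list' structure that the AArch64 procedure call standard defines (the real field names carry a leading double underscore). The assumption that 'ap' lives at [sp, 16] is an inference from the listing above, not something the report states, and 'ap_slot' is a hypothetical helper. On an LP64 target the field offsets then line up with the stores at [sp, 16] through [sp, 44].

```c
#include <stddef.h>

/* Model of the AAPCS64 va_list; names drop the leading underscores. */
struct va_list_model {
  void *stack;    /* next argument passed on the stack */
  void *gr_top;   /* end of the general-register save area */
  void *vr_top;   /* end of the FP/SIMD-register save area */
  int   gr_offs;  /* offset from gr_top to the next GP register argument */
  int   vr_offs;  /* offset from vr_top to the next FP/SIMD argument */
};

/* ASSUMPTION: inferred from the listing, 'ap' starts at [sp, 16]. */
enum { AP_FRAME_OFFSET = 16 };

/* Frame offset (from sp) of a va_list field, given its offsetof value. */
size_t
ap_slot(size_t field_offset)
{
  return AP_FRAME_OFFSET + field_offset;
}

/*
 * With 8-byte pointers the slots are:
 *   stack   -> [sp, 16]  \ written by  stp x4, x4, [sp, 16]
 *   gr_top  -> [sp, 24]  /
 *   vr_top  -> [sp, 32]    written by  str x3, [sp, 32]
 *   gr_offs -> [sp, 40]  \ written by  stp wzr, wzr, [sp, 40]
 *   vr_offs -> [sp, 44]  /
 * The remaining store, str x2, [sp, 56], looks like the register save
 * slot for the one anonymous argument.
 */
```

Read this way, every store initializes either 'ap' or the save area it points into, and nothing after the call reads them back.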

(I noticed this while messing around with glibc's syscall wrappers, which
really do things like this.  'foo_wrapper' has the type signature of 'fcntl',
for instance.)
