While looking at PR42722 I noticed that gcc generates awful code for a tail-call involving a trivial pass-through of a large struct parameter.

> cat bug1.c
struct s1 { int x[16]; };
extern void g1(struct s1);
void f1(struct s1 s1) { g1(s1); }
struct s2 { int x[17]; };
extern void g2(struct s2);
void f2(struct s2 s2) { g2(s2); }
> gcc -O2 -fomit-frame-pointer -S bug1.c
> cat bug1.s
        .file   "bug1.c"
        .text
        .p2align 4,,15
.globl f1
        .type   f1, @function
f1:
        subl    $12, %esp
        addl    $12, %esp
        jmp     g1
        .size   f1, .-f1
        .p2align 4,,15
.globl f2
        .type   f2, @function
f2:
        subl    $12, %esp
        movl    $17, %ecx
        movl    %edi, 8(%esp)
        leal    16(%esp), %edi
        movl    %esi, 4(%esp)
        movl    %edi, %esi
        rep movsl
        movl    4(%esp), %esi
        movl    8(%esp), %edi
        addl    $12, %esp
        jmp     g2
        .size   f2, .-f2
        .ident  "GCC: (GNU) 4.5.0 20100128 (experimental)"
        .section        .note.GNU-stack,"",@progbits

There are two problems with this code:
1. For the larger struct gcc generates a block copy with identical source and destination addresses, which amounts to a very slow NOP.
2. For the smaller struct gcc manages to eliminate the block copy, but it leaves pointless stack manipulation behind in the function (f1).

However, gcc-4.3 generates no pointless stack manipulation:

.globl f1
        .type   f1, @function
f1:
        jmp     g1
        .size   f1, .-f1
        .ident  "GCC: (GNU) 4.3.5 20100103 (prerelease)"

so there's a code size and performance regression in 4.5/4.4.
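For anyone who wants to confirm that the pass-through still delivers the right values while inspecting the generated assembly, here is a minimal sketch of a self-checking variant of the test case above. The bodies of g1 and g2, the globals seen1 and seen2, and the helper run_checks are hypothetical additions, not part of the original report. The noinline attribute is an assumption made so that the callees stay out of line and f1/f2 remain candidates for a sibling call.

```c
#include <assert.h>

struct s1 { int x[16]; };
struct s2 { int x[17]; };

/* Hypothetical observers: each callee records the sum of the struct it got. */
static int seen1, seen2;

/* noinline (assumption) keeps the calls out of line so f1/f2 can tail-call. */
__attribute__((noinline)) void g1(struct s1 s)
{
    int i, sum = 0;
    for (i = 0; i < 16; i++)
        sum += s.x[i];
    seen1 = sum;
}

__attribute__((noinline)) void g2(struct s2 s)
{
    int i, sum = 0;
    for (i = 0; i < 17; i++)
        sum += s.x[i];
    seen2 = sum;
}

/* Same trivial pass-through functions as in bug1.c. */
void f1(struct s1 s1) { g1(s1); }
void f2(struct s2 s2) { g2(s2); }

/* Hypothetical helper: fills both structs with 1..N and checks the sums. */
int run_checks(void)
{
    struct s1 a;
    struct s2 b;
    int i;

    for (i = 0; i < 16; i++)
        a.x[i] = i + 1;          /* 1 + 2 + ... + 16 = 136 */
    for (i = 0; i < 17; i++)
        b.x[i] = i + 1;          /* 1 + 2 + ... + 17 = 153 */

    f1(a);
    assert(seen1 == 136);
    f2(b);
    assert(seen2 == 153);
    return 2;                    /* number of checks that passed */
}
```

Compiling this file with the same -O2 -fomit-frame-pointer -S options should allow the code for f1 and f2 to be compared against the listings above, although the exact output will depend on the gcc version and target.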
See also the related PR28831.
Confirmed. For the larger structure we expand the block-copy as

(insn 8 7 9 2 t3.c:7 (parallel [
            (set (reg:SI 60)
                (const_int 0 [0x0]))
            (set (reg:SI 58)
                (plus:SI (ashift:SI (reg:SI 60)
                        (const_int 2 [0x2]))
                    (reg:SI 58)))
            (set (reg:SI 59)
                (plus:SI (ashift:SI (reg:SI 60)
                        (const_int 2 [0x2]))
                    (reg:SI 59)))
            (set (mem:BLK (reg:SI 58) [0 S72 A32])
                (mem/s/c:BLK (reg:SI 59) [4 s2+0 S72 A32]))
            (use (reg:SI 60))
        ]) -1 (nil))

while for the smaller one we move it by pieces. Somehow the tailcalling only works with one variant. Didn't really work with 4.3 or any older release.
*** Bug 42910 has been marked as a duplicate of this bug. ***