Bug 42909 - inefficient code for trivial tail-call with large struct parameter
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Duplicates: 42910
Depends on:
Blocks:
 
Reported: 2010-01-30 22:48 UTC by Mikael Pettersson
Modified: 2021-09-23 01:21 UTC

See Also:
Host:
Target: x86_64-*-*, i?86-*-*, aarch64-*-*
Build:
Known to work:
Known to fail: 2.95.3, 3.4.6, 4.1.2, 4.3.4, 4.4.2, 4.5.0
Last reconfirmed: 2021-08-07 00:00:00


Description Mikael Pettersson 2010-01-30 22:48:31 UTC
While looking at PR42722 I noticed that gcc generates awful code for a tail-call involving a trivial pass-through of a large struct parameter.

> cat bug1.c
struct s1 { int x[16]; };
extern void g1(struct s1);
void f1(struct s1 s1) { g1(s1); }

struct s2 { int x[17]; };
extern void g2(struct s2);
void f2(struct s2 s2) { g2(s2); }
> gcc -O2 -fomit-frame-pointer -S bug1.c
> cat bug1.s
        .file   "bug1.c"
        .text
        .p2align 4,,15
.globl f1
        .type   f1, @function
f1:
        subl    $12, %esp
        addl    $12, %esp
        jmp     g1
        .size   f1, .-f1
        .p2align 4,,15
.globl f2
        .type   f2, @function
f2:
        subl    $12, %esp
        movl    $17, %ecx
        movl    %edi, 8(%esp)
        leal    16(%esp), %edi
        movl    %esi, 4(%esp)
        movl    %edi, %esi
        rep movsl
        movl    4(%esp), %esi
        movl    8(%esp), %edi
        addl    $12, %esp
        jmp     g2
        .size   f2, .-f2
        .ident  "GCC: (GNU) 4.5.0 20100128 (experimental)"
        .section        .note.GNU-stack,"",@progbits

There are two problems with this code:
1. For the larger struct gcc generates a block copy with identical source and destination addresses, which amounts to a very slow NOP.
2. For the smaller struct gcc manages to eliminate the block copy, but it leaves a pointless stack adjustment (the subl/addl pair on %esp) behind in f1. gcc-4.3, by contrast, generates no such stack manipulation:

.globl f1
        .type   f1, @function
f1:
        jmp     g1
        .size   f1, .-f1
        .ident  "GCC: (GNU) 4.3.5 20100103 (prerelease)"

so there's a code size and performance regression in 4.5/4.4.
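
To illustrate problem 1 in C terms: in f2 the destination address is computed with leal 16(%esp), %edi and then copied into %esi, so the rep movsl reads and writes the same 17 words. The sketch below is only a source-level analogy, not compiler output; copy_words and self_copy_is_noop are hypothetical helper names that do not appear in bug1.c.

```c
#include <string.h>

struct s2 { int x[17]; };

/* Word-by-word forward copy, a rough C analogue of "rep movsl" with
   the count in n. Hypothetical helper, not part of bug1.c. */
static void copy_words(int *dst, const int *src, unsigned n)
{
    while (n--)
        *dst++ = *src++;
}

/* Returns 1 if copying a struct s2 onto itself leaves it unchanged. */
static int self_copy_is_noop(void)
{
    struct s2 before, s;
    int i;

    for (i = 0; i < 17; i++)
        s.x[i] = i * 3 + 1;
    before = s;

    /* Source and destination are the same address, as in f2 where
       %esi is loaded from %edi just before the rep movsl. */
    copy_words(s.x, s.x, 17);

    return memcmp(&before, &s, sizeof s) == 0;
}
```

Calling self_copy_is_noop() returns 1: every word is overwritten with its own value, so the 17 loads and stores do no useful work.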
Comment 1 Richard Biener 2010-01-30 23:34:17 UTC
See also the related PR28831
Comment 2 Richard Biener 2010-01-30 23:40:22 UTC
Confirmed.

For the larger structure we expand the block-copy as

(insn 8 7 9 2 t3.c:7 (parallel [
            (set (reg:SI 60)
                (const_int 0 [0x0]))
            (set (reg:SI 58)
                (plus:SI (ashift:SI (reg:SI 60)
                        (const_int 2 [0x2]))
                    (reg:SI 58)))
            (set (reg:SI 59)
                (plus:SI (ashift:SI (reg:SI 60)
                        (const_int 2 [0x2]))
                    (reg:SI 59)))
            (set (mem:BLK (reg:SI 58) [0 S72 A32])
                (mem/s/c:BLK (reg:SI 59) [4 s2+0 S72 A32]))
            (use (reg:SI 60))
        ]) -1 (nil))

while for the smaller one we move it by pieces.  Somehow the tailcalling
only works with one variant.

Didn't really work with 4.3 or any older release.
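
For readers unfamiliar with the two expansion strategies mentioned above, here is a rough C-level picture. It assumes "by pieces" means a fixed sequence of individual word-sized moves, and a block copy means one operation over the whole object; GCC makes this choice on RTL, not on C source. The function names are hypothetical.

```c
#include <string.h>

struct s1 { int x[16]; };

/* "By pieces": one explicit scalar move per word, no loop. */
static void copy_by_pieces(struct s1 *d, const struct s1 *s)
{
    d->x[0]  = s->x[0];  d->x[1]  = s->x[1];
    d->x[2]  = s->x[2];  d->x[3]  = s->x[3];
    d->x[4]  = s->x[4];  d->x[5]  = s->x[5];
    d->x[6]  = s->x[6];  d->x[7]  = s->x[7];
    d->x[8]  = s->x[8];  d->x[9]  = s->x[9];
    d->x[10] = s->x[10]; d->x[11] = s->x[11];
    d->x[12] = s->x[12]; d->x[13] = s->x[13];
    d->x[14] = s->x[14]; d->x[15] = s->x[15];
}

/* Block copy: a single operation covering the whole object. */
static void copy_as_block(struct s1 *d, const struct s1 *s)
{
    memcpy(d, s, sizeof *d);
}

/* Returns 1 if both strategies produce the same result. */
static int copies_agree(void)
{
    struct s1 src, a, b;
    int i;

    for (i = 0; i < 16; i++)
        src.x[i] = 100 + i;

    copy_by_pieces(&a, &src);
    copy_as_block(&b, &src);

    return memcmp(&a, &b, sizeof a) == 0
        && memcmp(&a, &src, sizeof a) == 0;
}
```

Both forms copy the same bytes. The difference relevant to this bug is that the individual moves can each be recognized as redundant when source and destination coincide, whereas the single block operation apparently is not.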
Comment 3 Andreas Schwab 2010-05-03 20:26:06 UTC
*** Bug 42910 has been marked as a duplicate of this bug. ***