Bug 91515 - missed optimization: no tailcall for types of class MEMORY
Status: RESOLVED DUPLICATE of bug 71761
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 9.1.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 71761
Blocks:
 
Reported: 2019-08-21 20:43 UTC by ead
Modified: 2021-08-10 23:23 UTC (History)

Target: x86_64-*-* i?86-*-*
Last reconfirmed: 2020-01-29 00:00:00


Description ead 2019-08-21 20:43:29 UTC
Produced assembler (-O2) for

   struct Vec3{
    double x, y, z;
   };

   struct Vec3 create(void);

   struct Vec3 use(){
    return create();
   }

looks as follows (live: https://godbolt.org/z/v-HjX0):

    use:
        pushq   %r12
        movq    %rdi, %r12
        call    create
        movq    %r12, %rax
        popq    %r12
        ret

However, I think that under the System V AMD64 ABI, the tailcall optimization would be valid:

    use:
        jmp    create

as create will copy the value of %rdi into %rax anyway.
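
To illustrate the claim, the hidden return pointer can be written out explicitly in plain C. This is a hand-written sketch, not compiler output: the names `create_lowered` and `use_lowered` are hypothetical, and the body of the callee and the values it stores are invented, since the report only declares `create`. The sketch assumes the System V AMD64 convention that a struct of three doubles is returned in memory, with the caller passing the result address in %rdi and the callee returning that same address in %rax.

```c
struct Vec3 {
    double x, y, z;
};

/* Three doubles span more than two eightbytes (16 bytes), which is why
   the System V AMD64 ABI classifies this type as MEMORY. */
_Static_assert(sizeof(struct Vec3) > 16, "Vec3 should exceed two eightbytes");

/* Hypothetical stand-in for the lowered `create`: `ret` plays the role of
   the hidden pointer passed in %rdi, and returning it models the copy
   the callee leaves in %rax. */
struct Vec3 *create_lowered(struct Vec3 *ret)
{
    ret->x = 1.0;   /* arbitrary example values */
    ret->y = 2.0;
    ret->z = 3.0;
    return ret;
}

/* Hypothetical lowered `use`: forwards its own hidden pointer and returns
   whatever the callee returns, so the call sits in tail position. */
struct Vec3 *use_lowered(struct Vec3 *ret)
{
    return create_lowered(ret);
}
```

If `create_lowered` were only declared, as `create` is in the report, a compiler at -O2 would typically be free to turn the call in `use_lowered` into a plain jump, which is the shape requested above.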
Comment 1 Peter Cordes 2019-08-28 03:48:22 UTC
The real missed optimization is that GCC is returning its own incoming arg instead of returning the copy of it that create() will return in RAX.

This is what blocks tailcall optimization; it doesn't "trust" the callee to return what it's passing as RDI.

See https://stackoverflow.com/a/57597039/224132 for my analysis (the OP asked the same thing on SO before reporting this, but forgot to link it in the bug report.)

The RAX return value tends to rarely be used, but probably it should be; RAX is less likely than a call-preserved register to have just been reloaded.

RAX is more likely to be ready sooner than R12 for out-of-order exec.  Either reloaded earlier (still in the callee somewhere if it's complex and/or non-leaf) or never spilled/reloaded.

So we're not even gaining a benefit from saving/restoring R12 to hold our incoming RDI.  Thus it's not worth the extra cost (in code-size and instructions executed), IMO.  Trust the callee to return the pointer in RAX.
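
The two strategies can be contrasted in C with the hidden pointer made explicit. This is only a model under assumptions, not GCC's internal representation: `create_lowered`, `use_conservative`, and `use_trusting` are hypothetical names, and the callee body is invented for the example. The local `saved` stands in for the copy of RDI that the emitted code keeps in R12.

```c
struct Vec3 {
    double x, y, z;
};

/* Hypothetical callee that honors the ABI: it fills the caller-provided
   buffer (pointer in %rdi) and returns the same pointer (in %rax). */
struct Vec3 *create_lowered(struct Vec3 *ret)
{
    ret->x = 0.5;   /* arbitrary example values */
    ret->y = 1.5;
    ret->z = 2.5;
    return ret;
}

/* Models the code GCC emits in the report: keep a private copy of the
   incoming pointer alive across the call and return that copy. */
struct Vec3 *use_conservative(struct Vec3 *ret)
{
    struct Vec3 *saved = ret;      /* corresponds to movq %rdi, %r12 */
    (void)create_lowered(ret);     /* callee's returned pointer is ignored */
    return saved;                  /* corresponds to movq %r12, %rax */
}

/* Models the suggested code: return the callee's pointer directly, so
   nothing needs to stay live across the call. */
struct Vec3 *use_trusting(struct Vec3 *ret)
{
    return create_lowered(ret);
}
```

Both functions return the same address whenever the callee follows the ABI; only `use_trusting` leaves the call in tail position.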
Comment 2 Konstantin Kharlamov 2019-11-15 15:17:42 UTC
I think this is a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71761
Comment 3 Andrew Pinski 2021-08-10 23:23:00 UTC
Yes this is a dup of bug 71761.

*** This bug has been marked as a duplicate of bug 71761 ***