Bug 86050 - Inline break tail-call optimization
Summary: Inline break tail-call optimization
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 6.3.0
: P3 normal
Target Milestone: 10.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-06-04 20:06 UTC by Aso Renji
Modified: 2021-08-10 18:49 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-07-23 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Aso Renji 2018-06-04 20:06:12 UTC
I have deep tail-recursion and try check how many bytes it consumed. Algorithm for this very simple - pointer to variable in main function minus pointer to variable in current recursive call. If function for this check have noinline attribute, tail-call optimization doing well and my recursion consume very little amount of memory. But if I remove noinline attribute, optimization don't work and I get stack overflow.

OS - Debian Stretch.
g++ --version - g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Command line - g++ -O2 main.cpp -o main
Code:

#include <iostream>

using namespace std;

struct Parent
{
    virtual void tailCall(const char*,long )const{}
};

struct Child:Parent
{
    //remove noinline and you get stack overflow
    static __attribute__((noinline)) void stackConsume(const char*stack){
        char value;
        std::cout<<"Consumed "<<stack-&value<<" bytes"<<std::endl;
    }

    void tailCall(const char *stack, long deep) const
    {
        if(deep==1000000000000L)
            return;
        stackConsume(stack);
        next->tailCall(stack,++deep);
    }
    Parent*next=this;
};

int main()
{
    Parent*parent=new Child;
    char stack;
    parent->tailCall(&stack,0);
    return 0;
}
Comment 1 Marc Glisse 2018-06-04 21:50:36 UTC
Indeed. The testcase is easier to analyze if you replace iostreams with a plain printf. The key element seems to be taking the address of a local variable.
Comment 2 Andrew Pinski 2018-06-04 21:55:55 UTC
Related to PR 54770.
Comment 3 Richard Biener 2018-06-05 08:46:29 UTC
Confirmed.  It is because the address of 'value' escapes which is easier to
see when you instead do

    static void __attribute__((noinline)) stackConsume(const char*stack){
        char value;
        __builtin_putchar ((long)&value);
    }

this then causes us to hit

  /* Make sure the tail invocation of this function does not indirectly
     refer to local variables.  (Passing variables directly by value
     is OK.)  */
  FOR_EACH_LOCAL_DECL (cfun, idx, var)
    {
      if (TREE_CODE (var) != PARM_DECL
          && auto_var_in_fn_p (var, cfun->decl)
          && may_be_aliased (var)
          && (ref_maybe_used_by_stmt_p (call, var)
              || call_may_clobber_ref_p (call, var)))
        return;
    }

where we think that the recursive call might possibly access that very
escaped stack slot (for example via a global pointer).  We do not have
sophisticated enough analysis to fix that easily.
Comment 4 Konstantin Kharlamov 2019-07-23 12:42:33 UTC
Works for me on gcc 9.1.0. I compile it as:

    $ g++ test.cpp -o a -O2

And then running `./a` results in a bunch of:

    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
    Consumed 80 bytes
Comment 5 Konstantin Kharlamov 2019-07-23 12:45:34 UTC
(In reply to Konstantin Kharlamov from comment #4)
> Works for me on gcc 9.1.0. I compile it as:
> 
>     $ g++ test.cpp -o a -O2
> 
> And then running `./a` results in a bunch of:
> 
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes
>     Consumed 80 bytes

Just tested with 8.3.0 version on the other PC, same there, i.e. stack space does not increase when built with -O2. So this was fixed at least since 8.3.0.
Comment 6 Aso Renji 2019-07-23 13:24:04 UTC
(In reply to Konstantin Kharlamov from comment #5)
> Just tested with 8.3.0 version on the other PC, same there, i.e. stack space
> does not increase when built with -O2. So this was fixed at least since
> 8.3.0.

Tested _after_ remove __attribute__((noinline)) from code?
g++ --version g++ (Debian 8.3.0-6) 8.3.0 bug still present for me.
Change "static __attribute__((noinline)) void" to "static void" and:

Consumed 8384063 bytes
Consumed 8384127 bytes
Consumed 8384191 bytes
Consumed 8384255 bytes
Consumed 8384319 bytes
Consumed 8384383 bytes
Consumed 8384447 bytes
Ошибка сегментирования
Comment 7 Konstantin Kharlamov 2019-07-23 15:38:51 UTC
(In reply to Aso Renji from comment #6)
> (In reply to Konstantin Kharlamov from comment #5)
> > Just tested with 8.3.0 version on the other PC, same there, i.e. stack space
> > does not increase when built with -O2. So this was fixed at least since
> > 8.3.0.
> 
> Tested _after_ remove __attribute__((noinline)) from code?
> g++ --version g++ (Debian 8.3.0-6) 8.3.0 bug still present for me.
> Change "static __attribute__((noinline)) void" to "static void" and:
> 
> Consumed 8384063 bytes
> Consumed 8384127 bytes
> Consumed 8384191 bytes
> Consumed 8384255 bytes
> Consumed 8384319 bytes
> Consumed 8384383 bytes
> Consumed 8384447 bytes
> Ошибка сегментирования

Oh, sorry, I didn't notice the testcase requires modification to start crashing. Yeah, it crashes with 9.1.0 too then.
Comment 8 Jonathan Wakely 2019-07-23 16:18:22 UTC
(In reply to Konstantin Kharlamov from comment #7)
> Oh, sorry, I didn't notice the testcase requires modification to start
> crashing. Yeah, it crashes with 9.1.0 too then.

It's unhelpful to post a reproducer for a bug that doesn't reproduce the bug. The code provided should demonstrate the bug, without requiring changes.
Comment 9 Andrew Pinski 2021-08-10 18:49:15 UTC
In the case of removing the noinline, GCC 10+ is able to tail call this function just fine and we don't get an overall stack increase.