Tail call optimization on function with many arguments

Jeff Law law@redhat.com
Tue Aug 18 20:58:00 GMT 2015


On 08/18/2015 11:30 AM, Lionel Villard wrote:
> Hi,
>
> consider this small C program:
>
>    #include <stdlib.h>
>    int tailoptimized(int c1, int c2, int c3, int c4, int c5, int c6)
>   {
>      if (c1 > c5)
>          {
>              if (c4 < 5)
>              {
>                  return c3 + c2;
>              }
>          }
>          else if (c5 > 45)
>                  return c6;
>          return c3 + c4 + c5 + c6;
>    }
>
>    int nottailoptimized(int c1, int c2, int c3, int c4, int c5, int c6, int
> c7)
>    {
>        if (c2 > c5)
>        {
>            if (c2 < 5)
>           {
>                return c3 + c2;
>           }
>        }
>        else if (c1 > 45)
>              return c6;
>        return c1 + c4 + c5 + c6;
>   }
>
>    int call(int count)
>    {
>        if (count > 100)
>      {
>          return tailoptimized(count *2, count, count * 3, count, count,
> count);
>      }
>      return nottailoptimized(count, count* 2, count, count, count, count,
> count * 4);
>    }
>
>    int main (int argc, char* argv[])
>    {
>        call(atoi(argv[0]));
>    }
>
> I used this command to compile it:
>
>      gcc -O1 -foptimize-sibling-calls -S -c test.c
>
> and generated assembly code for the 'call' function is:
>
> call:
> .LFB17:
>      .cfi_startproc
>      cmpl    $100, %edi
>      jle    .L12
>      leal    (%rdi,%rdi), %eax
>      leal    (%rax,%rdi), %edx
>      movl    %edi, %r9d
>      movl    %edi, %r8d
>      movl    %edi, %ecx
>      movl    %edi, %esi
>      movl    %eax, %edi
>      jmp    tailoptimized
> .L12:
>      subq    $8, %rsp
>      .cfi_def_cfa_offset 16
>      leal    (%rdi,%rdi), %esi
>      leal    0(,%rdi,4), %eax
>      movl    %eax, (%rsp)
>      movl    %edi, %r9d
>      movl    %edi, %r8d
>      movl    %edi, %ecx
>      movl    %edi, %edx
>      call    nottailoptimized
>      addq    $8, %rsp
>      .cfi_def_cfa_offset 8
>      ret
>      .cfi_endproc
>
> As you can see, when a function takes more than 6 arguments
> (nottailoptimized in this case), TCO is turned off.
>
> What's the reason for this limitation (number of registers?)? Is there a
> flag/option that would allow more than 6 arguments?
There's no inherent limit on the number of arguments.  It's constrained 
more by the need to avoid stack allocation in the caller.

In your example "call" must allocate a stack slot for the last arugment 
to "nottailoptimized".  The x86_64 ABI mandates that "call" would also 
be responsible for deallocating that stack slot.  That obviously can't 
happen if the call to "nottailoptimized" was turned into a tail call.

"tailoptimized" doesn't suffer from this problem because it does not 
need any of its arguments passed in the stack.


In theory, the tail call optimization could be enhanced to create a 
clone of "nottailoptimized" which would de-allocate the stack for any 
arguments and the tail call optimization could jump to that special 
copy.  Nobody's implemented that.  It's not even clear if it'd be a good 
idea or not.

Jeff



More information about the Gcc-help mailing list