Optimisation puzzle

Lawrence Crowl crowl@google.com
Tue Mar 20 07:51:00 GMT 2007


On 3/17/07, Erik <sigra@home.se> wrote:
> Lawrence Crowl skrev:
> > On 3/16/07, Erik <sigra@home.se> wrote:
> >> Ian Lance Taylor skrev:
> >> > Erik <sigra@home.se> writes:
> >> >
> >> >
> >> >>  From man:puts I see that it is declared "int puts(const char
> >> >> *)". This means that puts does not promise to leave its argument
> >> >> unchanged. Therefore the caller must push the argument anew before
> >> >> each call. If it had been declared "int puts(const char * const)"
> >> >> instead, the push should be moved outside the loop. Unfortunately
> >> this
> >> >> does not seem to work. I tried with the following program:
> >> >> void q(const unsigned int);
> >> >> void f() {for (unsigned int x = 0; x != 10; x++) q(77);}
> >> >>
> >> >> and built it with "gcc -std=c99 -Os -Wall -Wextra -Werror -S":
> >> >> .L2:
> >> >>         subl    $12, %esp
> >> >>         incl    %ebx
> >> >>         pushl   $77
> >> >>         call    q
> >> >>         addl    $16, %esp
> >> >>         cmpl    $10, %ebx
> >> >>         jne     .L2
> >> >>
> >> >> As you can see, "pushl   $77" is still inside the loop even though q
> >> >> promises to not change its argument. This must be a bug.
> >> >
> >> > This is not a bug.  const on an automatic variable in C is more
> >> > advisory than anything else.  You are not permitted to change a const
> >> > object, but you can cast its address to a non-const pointer.
> >
> > The "lost optimization" in this case has nothing to do with const
> > versus non-const.  The issue is that the call deallocates the
> > parameters. The pushl is allocating the argument.
> OK, as I understand it, the pushl $77 does 2 things:
> 1. Copy the value 77.
> 2. Allocate space for it (by changing the stack pointer).
>
> And in C, allocating the space must be done in each iteration because
> the called function deallocates it. But copying the value 77 in each
> iteration could be optimized away.

Well actually the caller deallocates it, but on the other side of the
basic-block boundary.  Look above for the adjustments to the stack
pointer.

> But what about Ada? Does it have the same calling convention?
> See the Ada loop:
> .L5:
>         pushl   $77
> .LCFI3:
>         call    _ada_q
>         popl    %eax
>         decl    %ebx
>         jns     .L5

I don't know the Ada ABI, but this code indicates that it does follow
the same convention.  It would be a bit surprising if it did not because
that would make leveraging the system more difficult.

> It looks like in Ada, the caller deallocates the parameter (popl). If
> this is so, it would mean that both the push and the pop should be moved
> out of the loop, something like:
>         pushl   $77
> .L5:
> .LCFI3:
>         call    _ada_q
>         decl    %ebx
>         jns     .L5
>         popl    %eax
>
> This would cut the loop down to only 3 instructions. Not bad compared to
> the 7 instructions in the original C loop. But the C version should
> probably be counted as having 8 instructions to make the comparison
> accurate, since it has a hidden popl inside q, which is not in Q of the
> Ada version. Is this correct?

Well, close.  Theoretically, the optimization can be done, at least in
C++.  (I'm not entirely familiar with the C rules here.)  The problem
is that changing the details around calls takes extra care if you want
the tool (like a debugger) to be able to follow along.  In this case, I
think you would need to show that the savings is a significant fraction
of the total cost.  You might be able to do that for static code size,
but you'll have a harder time for the dynamic instruction count,
which is what most compilers optimize.

-- 
Lawrence Crowl



More information about the Gcc-help mailing list