Optimisation puzzle

Erik sigra@home.se
Fri Mar 16 12:39:00 GMT 2007


Nate Denmark skrev:
> I have a question about the optimiser in GCC. Take this bit of code:
>
>        for(x = 0; x != 10; x++)
>                puts("Hello");
>
> When compiled with full optimisations (-O3, -fexpensive-optimizations
> etc.) it generates the following loop in assembly:
>
> .L2:
>        incl    %ebx
>        movl    $.LC0, (%esp)
>        call    puts
>        cmpl    $10, %ebx
>        jne     .L2
>
> .LC0 points to the "Hello" string. I'm wondering why GCC puts that
> 'movl' line inside the loop, so that it's executed each time, when it
> could go before the loop? As I understand it, 'puts' shouldn't do
> anything to the stack, and the return value is passed back in eax, so
> I'm not sure why it's doing the 'movl' each time. If I force loop
> unrolling the same thing happens -- the 'movl' each iteration.
 From man:puts I see that it is declared "int puts(const char *)". This 
means that puts does not promise to leave its argument unchanged. 
Therefore the caller must push the argument anew before each call. If it 
had been declared "int puts(const char * const)" instead, the push 
should be moved outside the loop. Unfortunately this does not seem to 
work. I tried with the following program:
void q(const unsigned int);
void f() {for (unsigned int x = 0; x != 10; x++) q(77);}

and built it with "gcc -std=c99 -Os -Wall -Wextra -Werror -S":
.L2:
        subl    $12, %esp
        incl    %ebx
        pushl   $77
        call    q
        addl    $16, %esp
        cmpl    $10, %ebx
        jne     .L2

As you can see, "pushl   $77" is still inside the loop even though q 
promises to not change its argument. This must be a bug. I tried to 
reproduce the bug with Ada and although the loop generated from Ada 
source code is indeed 2 instruction shorter (5 instructions instead of 7 
instructions for C), the bug seems to be there too. I have this program 
in f.adb:
with Q;
procedure F is begin for i in 1 .. 10 loop Q(77); end loop; end F;

and q.ads:
procedure Q(N : in Natural);

and built with "gnatgcc -Os -Wall -Wextra -Wextra -Werror -S f.adb":
.L5:
        pushl   $77
.LCFI3:
        call    _ada_q
        popl    %eax
        decl    %ebx
        jns     .L5

So if you need this loop to be tight, write it in Ada. But something, 
probably GCC, still needs to be fixed to get the "pushl $77" out of the 
loop (for any language). Unless someone explains that I have misses 
something, I will report this as a bug to gcc.

Note 1: Tested with gcc 4.1.1 (Gentoo 4.1.1-r3) and gnatgcc 3.4.5 (from 
Gentoo dev-lang/gnat-3.45).

Note 2: If you change the loop in C to "for (unsigned int x = 10; x; 
--x)" you will get it down to 6 instructions. Still not as tight as the 
Ada version, but one instruction less than your version. That gcc does 
not optimize your version to the 6 instruction version seems to be 
another bug. (I will report that too unless someone objects.)



More information about the Gcc-help mailing list