Strange thing with static Array!

Sat Feb 28 21:42:00 GMT 2009

It makes more sense to look at the assembly language generated with some 
optimization turned on.  I used -O3.  The results are shockingly bad, 
but a tiny bit better than what you got.
        call    _ZNSirsERi
        movl    -28(%rbp), %ecx
        movslq  %ecx,%rax
        leaq    30(,%rax,4), %rax
        andq    $-16, %rax
        subq    %rax, %rsp
        leaq    15(%rsp), %r12
        andq    $-16, %r12

The basic task is to convert the value from 32 bit to 64 bit, then 
multiply by four, then round up to a multiple of 16, then subtract that 
from rsp and use it as the address of the array.
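
As a rough sketch of that basic task in C++ (the function name and the idea of passing rsp in as a plain integer are my own illustration, not anything the compiler emits), the four steps look like this:

```cpp
#include <cstdint>

// Hypothetical model of the "basic task": given the current stack pointer
// value and the element count read from cin, return the array address.
constexpr std::uint64_t allocate_int_array(std::uint64_t rsp, std::int32_t n) {
    std::int64_t wide = n;                                       // 32 bit -> 64 bit (signed)
    std::uint64_t bytes = static_cast<std::uint64_t>(wide) * 4;  // multiply by four
    std::uint64_t rounded = (bytes + 15) & ~std::uint64_t{15};   // round up to a multiple of 16
    return rsp - rounded;                                        // subtract from rsp
}
```

With rsp at 0x7fff0000 and n = 5, the array needs 20 bytes, which rounds up to 32, so the array would start at 0x7ffeffe0.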

1) Converting a signed number from 32 bits to 64 bits is harder than 
converting an unsigned one.  The compiler isn't smart enough to realize 
that if the value were negative the result would crash anyway, so the 
compiler uses the harder signed conversion process (movslq or cltq).
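
To illustrate the difference, here is a small C++ sketch of the two conversions (the function names are mine, purely for illustration).  The two agree for every non-negative value and differ only for negative ones, which is exactly the case where the allocation is bogus anyway.  On x86-64 a write to a 32-bit register clears the upper half, so the unsigned version usually needs no extra instruction.

```cpp
#include <cstdint>

// What movslq / cltq compute: bit 31 is copied into the upper 32 bits.
constexpr std::uint64_t sign_extend(std::int32_t v) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// The unsigned alternative: the upper 32 bits are simply zero.
constexpr std::uint64_t zero_extend(std::int32_t v) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(v));
}
```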

2) The salq $2 in your example is the multiply by four.  I'm not sure 
what the sub and add of 1 are for, but certainly not alignment.

3) To round UP to a multiple of 16, you can add 15 then round down to a 
multiple of 16.  Both versions seem to think they must round twice, 
apparently satisfying alignment requirements on both the resulting rsp 
value and the allocated array address.

Actually rounding just once is plenty to align both the stack and the 
allocation.  It also might be faster to round the address down rather 
than round the length up (I'm not sure).
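
A sketch of the two single-rounding alternatives in C++ (names are hypothetical, and the stack pointer is modeled as a plain integer).  When rsp starts out 16-byte aligned, both give the same address.  Rounding the address down also gives an aligned result when rsp was not aligned to begin with.

```cpp
#include <cstdint>

constexpr std::uint64_t kAlign = 16;

// Round the length up to a multiple of 16, then subtract it from rsp.
constexpr std::uint64_t by_length(std::uint64_t rsp, std::uint64_t bytes) {
    return rsp - ((bytes + kAlign - 1) & ~(kAlign - 1));
}

// Subtract the raw length from rsp, then round the address down.
constexpr std::uint64_t by_address(std::uint64_t rsp, std::uint64_t bytes) {
    return (rsp - bytes) & ~(kAlign - 1);
}
```

Whether the second form is actually faster I can't say without measuring; it just needs one fewer add.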

The andq $-16 is the faster way to round down to a multiple of 16.  The 
shrq $4 followed by salq $4 is a slower way.
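
In C++ terms the two forms are as follows (function names are just for illustration).  Both clear the low four bits, so they always produce the same value.

```cpp
#include <cstdint>

// andq $-16: mask off the low four bits in one instruction.
constexpr std::uint64_t round_down_and(std::uint64_t x) {
    return x & static_cast<std::uint64_t>(-16);
}

// shrq $4 then salq $4: shift the low four bits out, then shift zeros back in.
constexpr std::uint64_t round_down_shift(std::uint64_t x) {
    return (x >> 4) << 4;
}
```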

The leaq 30(,%rax,4) multiplies by 4 and adds 30.  It is nice attention 
to detail for the compiler to merge that together, but rather lame to 
waste another leaq and andq rerounding the rounded result.
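
Here is a C++ model of the -O3 sequence, one statement per instruction (the struct and function names are mine, and the registers are modeled as plain integers).  The 30 is 15 for rounding the length up plus another 15 of slack for the second rounding.  As far as I can tell, for an int array starting from an aligned rsp this always reserves 16 bytes more than a single rounding would.

```cpp
#include <cstdint>

// Hypothetical holder for the two values the sequence produces.
struct Frame {
    std::uint64_t rsp;    // stack pointer after the allocation
    std::uint64_t array;  // address used for the array (r12)
};

constexpr Frame emitted_sequence(std::uint64_t rsp, std::int32_t n) {
    std::uint64_t rax =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(n));  // movslq %ecx,%rax
    rax = rax * 4 + 30;                                  // leaq 30(,%rax,4), %rax
    rax &= ~std::uint64_t{15};                           // andq $-16, %rax
    rsp -= rax;                                          // subq %rax, %rsp
    std::uint64_t r12 = (rsp + 15) & ~std::uint64_t{15}; // leaq 15(%rsp), %r12 ; andq $-16, %r12
    return Frame{rsp, r12};
}
```

For rsp = 0x1000 and n = 5 this moves rsp down by 48 bytes to 0xFD0, where a single rounding of the 20-byte length would have needed only 32.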

Bob Plantz wrote:
> On Sat, 2009-02-28 at 12:06 -0500, me22 wrote:
>
> You can see what the compiler is doing for you if you look at the
> assembly language. Here is the part where the array gets allocated on
> the stack (with my comments added):
> 	call	_ZNSirsERi         # cin >> array_size
> 	movl	-12(%rbp), %eax    # load array_size
> 	cltq                       # convert long to quad
> 	subq	$1, %rax           # make sure the new stack
> 	addq	$1, %rax           #   pointer meets all the
> 	salq	$2, %rax           #   alignment specs.
> 	addq	$15, %rax
> 	addq	$15, %rax
> 	shrq	$4, %rax
> 	salq	$4, %rax
> 	subq	%rax, %rsp         # allocate the array
> 	movq	%rsp, -48(%rbp)    # and save pointer to it
>
> I did this on an x86-64 system in 64-bit mode, and I did not worry
> through the alignment code to see exactly what's going on. In
> particular,
>        subq $1, %rax
>        addq $1, %rax
> is pretty weird. But the real point is where the array gets allocated on
> the stack.
>
> - Bob