optimization problem: ptr not kept in register

Peter A. Felvegi petschy@praire-chicken.com
Thu Mar 27 12:37:00 GMT 2014


On 03/26/2014 06:48 PM, Ian Lance Taylor wrote:

>> My point was that the out buffer's cur ptr gets loaded/stored all the time,
>> even stored more than once in succession on certain paths. Yes,
>> encode_noinline() could, and actually, will modify the cur ptr. But that
>> call is on a marked unlikely path, while the likely path doesn't contain any
>> calls, so could work entirely with registers. The loading/storing of cur on
>> the likely path is a pessimization that affects performance.
>>
>> I hope this clarifies it. Is it then an optimizer issue?
> I see what you mean.  You want the compiler to pull the value out of
> memory for the likely loop and then store it back into memory for the
> unlikely case.  That seems possible.  My first thought is that that
> would be a moderately costly optimization that would very rarely pay
> off, but I could be wrong.
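For reference, here is roughly what that transformation looks like when done by
hand. The cursor lives in a local on the likely path and is written back to
memory only around the out-of-line call. This is a sketch only: OutBuf, Node
and the encoding (the value preceded by one zero int for each significant byte
beyond the first) are guesses inferred from the disassembly, not the original
test case. __builtin_expect and __attribute__((noinline)) are GCC/Clang
extensions.

```cpp
#include <cassert>

// Hypothetical types inferred from the disassembly: cur at offset 0 of
// OutBuf, data and next at offsets 0 and 8 of Node.
struct OutBuf { int* cur; };
struct Node { int data; Node* next; };

// Hypothetical out-of-line slow path for values above 0xffff. It reads and
// updates out.cur in memory, so the caller must write its cursor back first.
__attribute__((noinline)) void encode_noinline(OutBuf& out, int v) {
    *out.cur++ = 0;
    *out.cur++ = 0;
    if (v > 0xffffff) *out.cur++ = 0;
    *out.cur++ = v;
}

void encode_node_list(OutBuf& out, Node* n) {
    assert(out.cur != nullptr);
    int* cur = out.cur;                 // load the cursor once
    for (; n != nullptr; n = n->next) {
        int v = n->data;
        if (__builtin_expect(v <= 0xffff, 1)) {
            // Likely path: only the local cursor is touched.
            if (v > 0xff) *cur++ = 0;
            *cur++ = v;
        } else {
            out.cur = cur;              // store back before the call...
            encode_noinline(out, v);
            cur = out.cur;              // ...and reload after it
        }
    }
    out.cur = cur;                      // single final store
}
```

The extra store/reload pair around the call is the cost Ian mentions; in this
shape it is paid only on the unlikely path.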
Thanks, that might be part of it, but it seems that something else is at
play here. To test the theory that the function call is what keeps the cur
ptr from being held in a register, I commented out the noinline attribute
on line 13, before encode_noinline(). Now there are no function calls at
all, but I'm really puzzled:

Dump of assembler code for function encode_node_list(OutBuf&, Node*):
    0x0000000000400600 <+0>:    test   %rsi,%rsi
    0x0000000000400603 <+3>:    je     0x40062f <encode_node_list(OutBuf&, Node*)+47>
// load outbuf's cur ptr
    0x0000000000400605 <+5>:    mov    (%rdi),%rax
    0x0000000000400608 <+8>:    jmp    0x400613 <encode_node_list(OutBuf&, Node*)+19>
    0x000000000040060a <+10>:    nopw   0x0(%rax,%rax,1)
    0x0000000000400610 <+16>:    mov    %rcx,%rax
// load the data
    0x0000000000400613 <+19>:    mov    (%rsi),%edx
// calc next cur
    0x0000000000400615 <+21>:    lea    0x4(%rax),%rcx
// store!
    0x0000000000400619 <+25>:    mov    %rcx,(%rdi)
    0x000000000040061c <+28>:    cmp    $0xff,%edx
    0x0000000000400622 <+34>:    jg     0x400631 <encode_node_list(OutBuf&, Node*)+49>
    0x0000000000400624 <+36>:    mov    %edx,(%rax)
    0x0000000000400626 <+38>:    mov    0x8(%rsi),%rsi
    0x000000000040062a <+42>:    test   %rsi,%rsi
    0x000000000040062d <+45>:    jne    0x400610 <encode_node_list(OutBuf&, Node*)+16>
    0x000000000040062f <+47>:    repz retq
    0x0000000000400631 <+49>:    lea    0x8(%rax),%rcx
    0x0000000000400635 <+53>:    cmp    $0xffff,%edx
    0x000000000040063b <+59>:    movl   $0x0,(%rax)
// store again!
    0x0000000000400641 <+65>:    mov    %rcx,(%rdi)
    0x0000000000400644 <+68>:    jg     0x40064b <encode_node_list(OutBuf&, Node*)+75>
    0x0000000000400646 <+70>:    mov    %edx,0x4(%rax)
    0x0000000000400649 <+73>:    jmp    0x400626 <encode_node_list(OutBuf&, Node*)+38>
// from here: the code of encode_noinline()
    0x000000000040064b <+75>:    cmp    $0xffffff,%edx
    0x0000000000400651 <+81>:    movl   $0x0,0x4(%rax)
    0x0000000000400658 <+88>:    jg     0x400666 <encode_node_list(OutBuf&, Node*)+102>
    0x000000000040065a <+90>:    lea    0xc(%rax),%rcx
// and store again!
    0x000000000040065e <+94>:    mov    %rcx,(%rdi)
    0x0000000000400661 <+97>:    mov    %edx,0x8(%rax)
    0x0000000000400664 <+100>:    jmp    0x400626 <encode_node_list(OutBuf&, Node*)+38>
    0x0000000000400666 <+102>:    lea    0x10(%rax),%rcx
    0x000000000040066a <+106>:    movl   $0x0,0x8(%rax)
// and again!
    0x0000000000400671 <+113>:    mov    %rcx,(%rdi)
    0x0000000000400674 <+116>:    mov    %edx,0xc(%rax)
    0x0000000000400677 <+119>:    jmp    0x400626 <encode_node_list(OutBuf&, Node*)+38>

I naively thought that if everything is inlined, then for code this simple
the ptr would be kept in a register the whole time: loaded once at the
beginning and stored once at the end. What is going on?
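The "load once, store once" shape can be forced by hand by promoting the
cursor to a local variable. A minimal sketch, assuming the same hypothetical
OutBuf/Node layout and zero-prefix encoding inferred from the dump (cur at
offset 0; data and next at offsets 0 and 8). encode_via_member mirrors the
style of the code being compiled, encode_via_local is the manual workaround;
both names are illustrative, and both produce the same output.

```cpp
#include <cassert>

// Hypothetical types inferred from the disassembly.
struct OutBuf { int* cur; };
struct Node { int data; Node* next; };

// Every ++ updates out.cur in memory; whether those stores are sunk out of
// the loop is left entirely to the optimizer.
void encode_via_member(OutBuf& out, Node* n) {
    for (; n != nullptr; n = n->next) {
        int v = n->data;
        if (v > 0xff)     *out.cur++ = 0;
        if (v > 0xffff)   *out.cur++ = 0;
        if (v > 0xffffff) *out.cur++ = 0;
        *out.cur++ = v;
    }
}

// Same logic with the cursor promoted by hand: one load before the loop,
// one store after it, and out.cur is not touched in between.
void encode_via_local(OutBuf& out, Node* n) {
    assert(out.cur != nullptr);
    int* cur = out.cur;
    for (; n != nullptr; n = n->next) {
        int v = n->data;
        if (v > 0xff)     *cur++ = 0;
        if (v > 0xffff)   *cur++ = 0;
        if (v > 0xffffff) *cur++ = 0;
        *cur++ = v;
    }
    out.cur = cur;
}
```

Because the address of the local cur is never taken, it is an ordinary
register candidate, and the loop body contains no stores to out.cur at all.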

I thought about the aliasing rules, too. I deliberately chose int* instead
of char*, because in the latter case the rules say that writing through a
char* invalidates everything. But writing an int through an int* can't
modify the cur pointer itself, because the two have different types. If
I'm not mistaken, the strict aliasing rules say that a write through a
pointer invalidates every value of the same type that was read through
other pointers and is being held in a register, so those values must be
reloaded, unless the pointer is declared __restrict, which means it
doesn't alias other pointers (of the same type). Am I right?
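The three cases in the paragraph above can be sketched as follows; the
function names are illustrative only, and OutBuf is the hypothetical layout
inferred from the dump. The first function is well defined and shows why a
char store forces a reload. The second relies on strict aliasing (enabled by
-O2 in GCC) to let the compiler assume the int stores cannot change out.cur.
The third uses __restrict, a GCC/Clang extension in C++, to promise that dst
and src do not overlap.

```cpp
#include <cassert>

struct OutBuf { int* cur; };  // hypothetical, as inferred from the dump

// 1. A store through unsigned char* may legally modify any object,
//    including *p, so *p has to be loaded again after the store.
int char_store_changes_int(int* p, unsigned char* b) {
    int before = *p;
    *b = 0x7f;              // may alias *p
    return *p - before;     // *p must be re-read here
}

// 2. The stores write int objects, while out.cur is an int* object. With
//    strict aliasing the compiler may assume the first store does not
//    change out.cur, so it need not reload the member between them.
void put2(OutBuf& out, int a, int b) {
    assert(out.cur != nullptr);
    *out.cur++ = a;
    *out.cur++ = b;
}

// 3. __restrict promises that dst and src do not overlap, so *src may be
//    loaded once and reused for both stores.
int store_twice(int* __restrict dst, const int* __restrict src) {
    dst[0] = *src;
    dst[1] = *src;          // no reload needed under the restrict promise
    return dst[0] + dst[1];
}
```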

If I compile with -fno-strict-aliasing, the cur ptr is reloaded after
every write of a 0, as expected. Interestingly, that code is 17 bytes
shorter than the code above.

So with strict aliasing, the unnecessary loads are eliminated, but why 
are there unnecessary stores?

Thanks, Peter



More information about the Gcc-help mailing list