optimization problem: ptr not kept in register

Peter A. Felvegi petschy@praire-chicken.com
Wed Mar 26 09:50:00 GMT 2014


On 03/26/2014 12:32 AM, Ian Lance Taylor wrote:
> On Tue, Mar 25, 2014 at 12:38 PM, Peter A. Felvegi
> <petschy@praire-chicken.com> wrote:
>> The reduced test case is at the end. It encodes data into a buffer in a loop
>> with variable length encoding (not a working real encoding). For some
>> reason, the write ptr is not kept in a register, but loaded/stored when
>> used/updated. There is a potential function call in the loop, but there are
>> __builtin_expect hints, so I think it would be possible to use a register
>> for the ptr and store just before the call, and load it back right after the
>> call. This would speed up the common code path: less code, fewer loads and
>> stores. I measured around 20-30% more runtime, compared to a version where a
>> pointer goes in and the updated ptr is returned. However, passing/returning
>> the ptr has other issues, esp for a decoder, that would return the decoded
>> value normally, not the ptr.
> You marked the encode_noinline function as noinline, and encode can
> call encode_noinline.  The encode_noinline function could change any
> part of global memory, and in particular could change the value of
> n->next.  So the loop has to reload that value, in case it was
> changed.
I think you misunderstood. The node ptr is in %rbx, which is callee-saved.
After writing the data to the cur ptr at +38 (mov %esi,(%rax)),
n = n->next is performed at +40: mov 0x8(%rbx),%rbx. This is as simple
as it gets; there are no reloads of the node ptr.

My point was that the out buffer's cur ptr gets loaded/stored all the
time, and on certain paths is even stored more than once in succession. Yes,
encode_noinline() could, and actually will, modify the cur ptr. But that
call is on a path marked unlikely, while the likely path doesn't contain
any calls, so it could work entirely with registers. The loading/storing of
cur on the likely path is a pessimization that affects performance.
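
To make the two shapes concrete, here is a minimal sketch. It is not the
reduced test case from my first mail: struct node, struct outbuf,
encode_list(), encode_list_cached() and the marker-word "encoding" are all
made up for illustration, and only the names encode() and encode_noinline()
come from the original. The first loop keeps cur in memory, as in the code I
reported. The second does by hand what I hoped the optimizer could do: keep
cur in a local, store it just before the unlikely call and load it back
right after.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types in the reduced test case. */
struct node   { uint32_t val; struct node *next; };
struct outbuf { uint32_t *cur; };   /* write ptr lives in memory */

/* Unlikely path (assumed encoding): a marker word, then the value.
 * It modifies b->cur, so the caller must see the updated ptr. */
__attribute__((noinline))
static void encode_noinline(struct outbuf *b, uint32_t v)
{
    *b->cur++ = 0xFFFFFFFFu;
    *b->cur++ = v;
}

/* Likely path: small values take one word and need no call. */
static inline void encode(struct outbuf *b, uint32_t v)
{
    if (__builtin_expect(v < 0x80, 1))
        *b->cur++ = v;
    else
        encode_noinline(b, v);
}

/* Shape 1: cur is read and written through b on every iteration. */
void encode_list(struct outbuf *b, const struct node *n)
{
    for (; n; n = n->next)
        encode(b, n->val);
}

/* Shape 2: cur is cached in a local; memory is touched only
 * around the unlikely call and once at the end. */
void encode_list_cached(struct outbuf *b, const struct node *n)
{
    uint32_t *cur = b->cur;
    for (; n; n = n->next) {
        if (__builtin_expect(n->val < 0x80, 1)) {
            *cur++ = n->val;
        } else {
            b->cur = cur;                 /* store before the call */
            encode_noinline(b, n->val);
            cur = b->cur;                 /* load back after it */
        }
    }
    b->cur = cur;                         /* final write-back */
}
```

Both functions assume the buffer is large enough and produce the same
output. Whether the second one actually ends up with cur in a register on
the likely path depends on the compiler version and flags, so the generated
code should be checked rather than assumed.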

I hope this clarifies it. Is it then an optimizer issue?

Regards, Peter


