This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: Performance problem


I think I have fixed the rdtsc problem on 64-bit architectures:
http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
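
For readers following the thread: the "=A" constraint in the quoted get_cc_time below means the EDX:EAX register pair only on 32-bit x86. On x86-64 it lets the compiler pick either RAX or RDX, so half of the counter is lost. The following is a minimal sketch of one common way to write the read so it works on both, and is not necessarily what the tarball above contains. The free-function form and the std::chrono fallback for non-x86 targets are assumptions made for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical free-function variant of get_cc_time from the thread.
static inline std::uint64_t get_cc_time() {
#if defined(__x86_64__) || defined(__i386__)
  std::uint32_t lo, hi;
  // rdtsc writes the low 32 bits of the counter to EAX and the high
  // 32 bits to EDX on both 32-bit and 64-bit x86, so request each
  // register explicitly instead of relying on "=A".
  __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
  // Assumption: on other targets fall back to a monotonic clock
  // (nanoseconds, not cycles) so the sketch still compiles and runs.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

Subtracting two readings taken around the benchmarked loop gives an elapsed tick count. This fixes only the register constraint; it does not address readings taken on different cores.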



2008/9/22 Łukasz Lew <lukasz.lew@gmail.com>:
> Indeed, I never tested my asm code on 64-bit. I knew it was buggy, but I
> forgot about it.
> I will try to fix it now.
>
> You may well be right about the register allocation.
> I tuned my code on 4.2, and small "irrelevant" changes hurt the
> performance badly;
> the asm output revealed, among other things, different registers.
>
> Is there any way to control register allocation, just as
> always_inline controls inlining?
>
> I use Oprofile a lot and tried to pinpoint the difference, but the asm
> output is too different,
> while the C++ annotation is too weak because of heavy inlining.
> Lukasz
>
> On Mon, Sep 22, 2008 at 01:39, John Fine <johnsfine@verizon.net> wrote:
>> I was curious, so I tried running your benchmark.  It was too fast for
>> meaningful results, so I increased the counts in the calls to
>> simple_playout_benchmark::run and I noticed some negative and generally
>> unstable values for "clock cycles per playout".
>>
>> So your code:
>>
>>  uint64 get_cc_time () volatile {
>>   uint64 ret;
>>   __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
>>   return ret;
>>  }
>>
>> gives me values that aren't even monotonic.
>>
>> I'm on a 64-bit dual core AMD system.  My best guess is that the program
>> switches cores part way through the loop. But I really don't know enough
>> about either rdtsc or __asm__ __volatile__ to know whether there might be
>> other reasons.
>>
>> Are you running on a single core system?  Or otherwise controlling for such
>> effects?
>>
>> In other projects, I've found that Oprofile is very effective in tracking
>> down the direct cause of performance differences.  Have you tried that?  In
>> much of what I do, the direct cause of a performance difference is just a
>> hint at the indirect true cause.  But in an example as simple as you've
>> provided, the direct cause is the cause.
>>
>> Are you building for 32-bit or 64-bit?
>>
>> In 32-bit, gcc is really bad at dealing with the architecture's shortage of
>> registers.  A tiny change anywhere can change gcc's register choices leading
>> into the critical loop and either cause or avoid a register spill.  That
>> alone could cause a 10% difference.
>>
>>
>> Łukasz Lew wrote:
>>>
>>> I extracted only the benchmark part:
>>> http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
>>>
>>>
>>>
>>
>
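
On the guess quoted above that the program switches cores partway through the loop: one way to control for that on Linux is to pin the benchmark to a single CPU before timing. Below is a minimal sketch under that assumption. The helper name pin_to_cpu is hypothetical and is not part of the benchmark; sched_setaffinity and the CPU_* macros are the glibc interfaces.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for cpu_set_t and the CPU_* macros
#endif
#include <sched.h>

// Hypothetical helper: restrict the calling thread to a single CPU so
// that successive rdtsc readings come from the same core.
// Returns true on success, false if the kernel rejects the mask.
static bool pin_to_cpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  // A pid of 0 means "the calling thread".
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Calling pin_to_cpu(0) at the start of main, before the first timestamp, would be the typical use. Whether this removes the negative values depends on whether core migration is really the cause, which the thread leaves open.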
