This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


Re: Performance problem

From: "Łukasz Lew" <lukasz dot lew at gmail dot com>
To: "John Fine" <johnsfine at verizon dot net>
Cc: "Brian D. McGrew" <brian at visionpro dot com>, gcc-help at gcc dot gnu dot org
Date: Mon, 22 Sep 2008 10:37:38 +0200
Subject: Re: Performance problem
References: <c55009e70809211420l7f649d8ah9e027c55b9221567@mail.gmail.com> <A839A3EF76C5434C961DF40A54C9293C1461B0@mvpexchange120.machinevisionproducts.com> <c55009e70809211441u65d06a72qbb569a66b1c218b7@mail.gmail.com> <48D6DB17.2080202@verizon.net> <c55009e70809211659q5f40bf4cjed683c4d688a4ad8@mail.gmail.com>

I fixed the problem (I think) with rdtsc on 64-bit architectures.
http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
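
For readers without the tarball, here is a minimal sketch of one common way to make such a read safe on both 32-bit and 64-bit x86. It is not necessarily the fix that went into the archive above. The likely culprit in the quoted version is the "=A" constraint: in 32-bit mode it names the edx:eax pair, but in 64-bit mode it lets GCC pick either rax or rdx for the whole value, so half of the counter is lost. Reading the two halves through separate "=a" and "=d" outputs avoids that. The name get_cc_time is kept from the quoted code; the std::chrono fallback for non-x86 targets is my own assumption, there only so the sketch compiles everywhere.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical rewrite of get_cc_time(); a sketch, not the code from the tarball.
static inline std::uint64_t get_cc_time() {
#if defined(__i386__) || defined(__x86_64__)
  // RDTSC puts the low 32 bits of the counter in EAX and the high 32 bits
  // in EDX, in both 32-bit and 64-bit mode, so read each half explicitly.
  std::uint32_t lo, hi;
  __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
  // Assumed fallback for non-x86 targets: steady-clock ticks, not CPU cycles.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

A benchmark would call get_cc_time() before and after the measured loop and subtract the two values. This only fixes the register handling. Each core has its own counter, so if the thread migrates between cores mid-run, the difference can still be off unless the process is pinned to one core.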



2008/9/22 Łukasz Lew <lukasz.lew@gmail.com>:
> Indeed, I never tested my asm code on 64-bit. I knew it was buggy, but I
> forgot about it.
> I will try to fix it now.
>
> You may well be right about the register allocation.
> I tuned my code on 4.2, and small "irrelevant" changes hurt the
> performance badly;
> the asm output revealed, among other things, different registers.
>
> Is there any way to control register allocation, just as
> always_inline controls inlining?
>
> I use Oprofile a lot and tried to pinpoint the difference, but the asm
> output is too different,
> while the C++ annotation is too weak because of heavy inlining.
> Lukasz
>
> On Mon, Sep 22, 2008 at 01:39, John Fine <johnsfine@verizon.net> wrote:
>> I was curious, so I tried running your benchmark.  It was too fast for
>> meaningful results, so I increased the counts in the calls to
>> simple_playout_benchmark::run and I noticed some negative and generally
>> unstable values for "clock cycles per playout".
>>
>> So your code:
>>
>>  uint64 get_cc_time () volatile {
>>   uint64 ret;
>>   __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
>>   return ret;
>>  }
>>
>> gives me values that aren't even monotonic.
>>
>> I'm on a 64-bit dual core AMD system.  My best guess is that the program
>> switches cores part way through the loop. But I really don't know enough
>> about either rdtsc or __asm__ __volatile__ to know whether there might be
>> other reasons.
>>
>> Are you running on a single core system?  Or otherwise controlling for such
>> effects?
>>
>> In other projects, I've found that Oprofile is very effective in tracking
>> down the direct cause of performance differences.  Have you tried that?  In
>> much of what I do, the direct cause of a performance difference is just a
>> hint at the indirect true cause.  But in an example as simple as you've
>> provided, the direct cause is the cause.
>>
>> Are you building for 32-bit or 64-bit?
>>
>> In 32-bit, gcc is really bad at dealing with the architecture's shortage of
>> registers.  A tiny change anywhere can change gcc's register choices leading
>> into the critical loop and either cause or avoid a register spill.  That
>> alone could cause a 10% difference.
>>
>>
>> Łukasz Lew wrote:
>>>
>>> I extracted only the benchmark part:
>>> http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
>>>
>>>
>>>
>>
>

Follow-Ups:
- Re: Performance problem
  - From: John Fine

References:
- Performance problem
  - From: Łukasz Lew
- RE: Performance problem
  - From: Brian D. McGrew
- Re: Performance problem
  - From: Łukasz Lew
- Re: Performance problem
  - From: John Fine
- Re: Performance problem
  - From: Łukasz Lew
