This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: Performance problem


2008/9/24 John Fine <johnsfine@verizon.net>:
> Łukasz Lew wrote:
>>
>> I fixed the problem (I think) with rdtsc on 64bit architectures.
>> http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
>>
>
> Seems to work.  Why was it previously correct for 32 bit?  Did the 32 bit
> compiler already combine the correct two registers?

I have no idea.
But the fixed version does not seem to compile on 32-bit.
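
One possible explanation, offered as an assumption because the benchmark source is not quoted in this thread: if the counter was read with GCC inline asm and the "=A" constraint, a 32-bit target treats that as the EDX:EAX pair, while on x86_64 it means a single register (RAX or RDX), so half of the value is lost. A minimal sketch that names both registers explicitly and should build for either target (read_tsc and combine_halves are hypothetical names, not taken from the archive):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Join the two 32-bit halves that rdtsc leaves in EAX (low) and EDX (high).
inline std::uint64_t combine_halves(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Hypothetical helper: read the time-stamp counter.
inline std::uint64_t read_tsc() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    std::uint32_t lo, hi;
    // "=a" and "=d" pin the outputs to EAX and EDX, which is where rdtsc
    // writes its result on both 32-bit and 64-bit x86.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_halves(lo, hi);
#else
    // Assumption: on other targets fall back to a monotonic clock.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Small self-check of the half-combining logic.
inline void self_check() {
    assert(combine_halves(1u, 2u) == 0x200000001ULL);
}
```

Timing a region is then a matter of calling read_tsc() before and after it and subtracting the two values.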

>>>
>>> You may be very right about the register allocation.
>>> I tuned my code on 4.2 and small "irrelevant" changes changed the
>>> performance badly
>>> and asm output revealed among other things different registers.
>>>
>
> That doesn't really prove much.  Without some very good output from
> Opannotate, I don't know how to tell the real reason for the performance
> difference.
Indeed, but opannotate on the assembler output doesn't tell much here.
The 10% difference is spread irregularly: some parts are slower, some are faster.
But the asm of both versions corresponds very well, except for
different registers and offsets.


>
>
>>> I use Oprofile a lot, and tried to pinpoint the difference but asm
>>> output is too different
>>> while c++ annotation  is too weak because of heavy inlining.
>>>
>
> I'm trying to understand and/or fix the use of Opannotate for some much
> harder problems, so I was curious enough to try it on your program.  I
> compiled your program x86_64 with gcc 4.4.  Even if I got good results, that
> wouldn't tell you anything about 32 bit gcc 4.3.

Can you send me the log from my benchmark?
And your processor model?

If you can do the same for g++4.3, that would be very useful for me.


>
> But I got surprisingly bad results.  I haven't previously seen such bad
> results from opannotate without using heavily templated code.  But I also
> haven't used a gcc 4.4 compiled program with opannotate before.
>
> In --source mode nearly all the total time was missing (not associated with
> any source line).

I have the same problem with g++-4.3.
My guess is that this is due to heavy inlining.
By the way, you would be surprised how much slower it gets if you turn off
the always_inline gcc attribute.
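
A minimal sketch of how such an attribute can be switched on and off for comparison. The FORCE_INLINE and DISABLE_FORCE_INLINE macros and the step/run functions are hypothetical and only illustrate the idea; they are not taken from the benchmark:

```cpp
#include <cassert>

// Building with -DDISABLE_FORCE_INLINE drops the attribute, so the compiler's
// own inlining heuristics decide instead.
#if defined(__GNUC__) && !defined(DISABLE_FORCE_INLINE)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Hypothetical small hot function called from an inner loop.
FORCE_INLINE unsigned step(unsigned state) {
    return state * 3u + 1u;
}

// Hypothetical loop standing in for the benchmark's hot path.
unsigned run(unsigned iterations) {
    unsigned state = 0u;
    for (unsigned i = 0; i < iterations; ++i) {
        state = step(state);
    }
    return state;
}
```

Compiling the same file twice, with and without -DDISABLE_FORCE_INLINE, and timing both builds is one way to measure the effect.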

> In mixed source and assembly view, I think all the time

Is it possible to get a mixed view?

> was shown, but I don't think the assembly code corresponded very accurately
> with the source code and the time was in some very surprising lumps.  I
> usually can interpret such lumps (usually the instruction after an L2 cache
> miss or the instruction after a mispredicted branch).  But that didn't seem
> to fit the execution time lumps in your code.

L1 misses hurt my code's performance as well.

>
> The few points in your source code that had most of the total execution time
> were inlined multiple times with different register usage each time.  No one
> inline copy of any such routine had as much as 4% of the total execution
> time.  That tends to wreck the theory that a minor change somewhere has
> caused a big difference by changing register allocation.

Can you be more specific?
How do you know which part was inlined where?

>  There wouldn't be
> that sort of correlation in the way it changes register allocation across a
> bunch of different inlinings of the same function that already differ from
> each other in register allocation.

But do you observe the 10% difference in performance that I see on my machine?

This is getting promising, thanks for your help.
Lukasz

PS
Is there any alternative to OProfile?
If not, why is it so undeveloped?
