This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: More on compile performance of Linux kernels in mainline gcc
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Andi Kleen <ak at suse dot de>
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 3 Nov 2004 09:35:01 +0100
- Subject: Re: More on compile performance of Linux kernels in mainline gcc
- References: <20041103045252.GA15944@wotan.suse.de>
>
> This is an addendum for the numbers for linux kernel compiling
> on x86-64 I posted some days ago. gcc tested is the same (041029)
> on the same machine with the same kernel tree/configuration.
>
> I tracked down why the 4.0 compiled kernels didn't boot. One issue
> was a missing -fno-strict-aliasing for one file (now fixed),
> the other is a miscompilation of a loop in a function in the Linux
> radix tree library (PR18241). The miscompilation can be worked around
> by compiling the affected file with -O0.
>
> There are a lot of new warnings. Especially
> pointer targets in passing argument 2 of `foo' differ in signedness
> is extremely common.
>
> I was asked to retry with a mainline gcc built with
> make profiledbootstrap.
>
> This improves the 4.0 numbers somewhat.
>
> gcc 3.3-hammer (profiledbootstrap)
> 210.32user 31.62system 3:57.66elapsed
>
> 4.0 snapshot with normal bootstrap:
> 262.71user 30.50system 4:48.46elapsed
>
> 4.0 snapshot with profiledbootstrap:
> 248.01user 30.25system 4:33.66elapsed
>
> Still considerably slower than 3.3-hammer though.
>
> Also Jan asked for oprofile output. Here are all symbols over 0.3%
> for a full kernel compile done with the profiledbootstrap compiler.
>
> Looks like the likely/unlikely split is not very effective,
> there are a lot of hot unlikely hits.
>
> Some hash table lookup(s?) seem to be very hot, perhaps it needs
> a better hash function or a larger table?
>
> 1.7% memset is somewhat worrying, that's a lot of clearing.
> 1.5% garbage collector accounting looks like a bug if that function
> isn't misnamed.
Actually both of these are common in the profiles I've seen -
basically all our cache misses land in these two places, so even though
the functions themselves are not that expensive, they show up as if they
were. I use -minline-all-stringops when profiling GCC to see where the
memset calls really come from.
>
> Standard GLOBAL_POWER_EVENTS:
> 695020 4.0626 cc1 yyparse.unlikely_section
> 438612 2.5638 cc1 ht_lookup_with_hash
> 298462 1.7446 libc.so.6 memset
> 288277 1.6851 cc1 _cpp_lex_direct
> 265789 1.5536 cc1 ggc_alloc_stat.unlikely_section
I have to look into it - ggc_alloc_stat is very definitely not an unlikely
function... Which --enable-languages setting did you use?
This might explain why the speedup you are seeing is only roughly 5%
instead of the 10% I used to see (though I didn't test the Linux kernel tree).
Honza
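
For reference, the "roughly 5%" figure can be checked against the user times quoted above. The following is only a small sketch of that arithmetic: the dictionary keys and the helper name percent_faster are made up for illustration, and only the three user-time values come from the message itself.

```python
# User-mode CPU seconds copied from the timings quoted in this message.
# The labels are illustrative; only the numbers come from the post.
user_seconds = {
    "3.3-hammer (profiledbootstrap)": 210.32,
    "4.0 (normal bootstrap)": 262.71,
    "4.0 (profiledbootstrap)": 248.01,
}


def percent_faster(baseline, improved):
    """Return how much less time `improved` takes, as a percentage of `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Gain from building the 4.0 snapshot with profiledbootstrap.
gain = percent_faster(
    user_seconds["4.0 (normal bootstrap)"],
    user_seconds["4.0 (profiledbootstrap)"],
)

# Remaining advantage of 3.3-hammer over the profiledbootstrap 4.0 snapshot.
gap = percent_faster(
    user_seconds["4.0 (profiledbootstrap)"],
    user_seconds["3.3-hammer (profiledbootstrap)"],
)

print(f"profiledbootstrap gain on 4.0: {gain:.1f}%")
print(f"3.3-hammer still faster by:    {gap:.1f}%")
```

This compares user time only; system and elapsed times are left out, so the percentages are a rough check rather than a full benchmark comparison.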