This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Massive performance regression from switching to gcc 4.5
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Taras Glek <tglek at mozilla dot com>, basile at starynkevitch dot net, Andrew Pinski <pinskia at gmail dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Tue, 6 Jul 2010 18:46:07 +0200
- Subject: Re: Massive performance regression from switching to gcc 4.5
- References: <4C23A90C.1000401@mozilla.com> <AE37111E-D5B1-4BF2-8938-937D2A1940D9@gmail.com> <4C2BB5EF.8040800@mozilla.com> <1277933200.4504.49.camel@glinka> <4C2BB768.2010307@mozilla.com> <20100630220600.GA22187@atrey.karlin.mff.cuni.cz>
> > On 06/30/2010 02:26 PM, Basile Starynkevitch wrote:
> >> On Wed, 2010-06-30 at 14:23 -0700, Taras Glek wrote:
> >>
> >>> I tried 4.5 -O2 and it's actually faster than 4.3 -Os.
> >>>
> >>> I am happy that -O2 performance is actually pretty good, but -Os
> >>> regression is going to hurt on mobile.
> >>>
> >> Did you try gcc-4.5 -flto -Os or gcc-4.5 -flto -O2?
> >>
> >> It would be interesting to hear that GCC is able to LTO a program as big
> >> as Mozilla! And figures (notably RAM, CPU time, wallclock time for
> >> build) would be interesting.
> >>
> >
> > Both whopr and flto cause gcc to segfault while building Mozilla.
>
> 4.5 WHOPR is completely broken. LTO is in better shape but I am not sure if we
> can reasonably expect it to build Mozilla. However, I would be very happy to help
> getting WHOPR working for 4.6.
Hi,
I have now got the 4.6 WHOPR build as far as libxul.so, which seems to be one of
the bigger files.
WHOPR linking consists of a serial stage (WPA), which merges the whole program
and performs interprocedural optimization, followed by a parallel build. The
serial stage needs 3.7GB of RAM and 10 minutes; most of that time is spent
writing out the files for the parallel build, which total around 5GB. The size
of these files can be cut down significantly by a sane partitioning algorithm,
since we currently produce over 1000 partitions where 40 would do the job.
(This is with an --enable-checking compiler.)
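To make the "1000 partitions vs. 40" point concrete, here is a minimal sketch of size-based partitioning, assuming each function carries only an estimated size and that functions are visited in some fixed order (e.g. callgraph order). This is not GCC's partitioner; `struct fn_info`, `partition_by_size` and the size model are hypothetical, chosen only to show how a fixed partition count bounds the number of files WPA has to write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function record: only an estimated size is modelled. */
struct fn_info {
  const char *name;
  unsigned size;      /* estimated size, e.g. number of statements */
  int partition;      /* output: partition index assigned to this function */
};

/* Assign functions, in the given order, to at most NPARTS partitions,
   closing a partition once it reaches the average target size.
   Returns the number of partitions actually used.  */
static int
partition_by_size (struct fn_info *fns, size_t n, int nparts)
{
  unsigned long total = 0;
  size_t i;

  assert (nparts > 0);
  for (i = 0; i < n; i++)
    total += fns[i].size;

  /* Target size per partition, rounded up so NPARTS partitions suffice.  */
  unsigned long target = (total + (unsigned long) nparts - 1) / nparts;
  if (target == 0)
    target = 1;

  int current = 0;
  unsigned long filled = 0;
  for (i = 0; i < n; i++)
    {
      /* Open a new partition once the current one reached the target,
         unless we are already in the last allowed partition.  */
      if (filled >= target && current < nparts - 1)
        {
          current++;
          filled = 0;
        }
      fns[i].partition = current;
      filled += fns[i].size;
    }
  return n == 0 ? 0 : current + 1;
}
```

With 1000 equally sized functions and `nparts = 40`, this produces 40 partitions of 25 functions each instead of one file per small group. A real partitioner would presumably also try to keep callers, callees and the statics they share together, so that fewer symbols need to be made visible across partitions (the work behind entries such as `lto_promote_cross_file_statics` in the profile below).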
The build still dies for me later on, but it seems that libxul is not too large
for WHOPR. (I hope to reduce all of these figures significantly before 4.6 is
out.) Which other big components should I be afraid of?
The oprofile output for the WPA stage is as follows (samples, % of total, symbol):
382507 8.4240 lto_output_1_stream
379158 8.3503 htab_find_slot_with_hash
207330 4.5661 bp_pack_value
155793 3.4311 iterative_hash_hashval_t
135132 2.9760 lto_output_uleb128_stream
101110 2.2268 gimple_types_compatible_p
92828 2.0444 cgraph_node_in_set_p
83205 1.8324 lto_promote_cross_file_statics
76243 1.6791 htab_expand
75993 1.6736 htab_hash_string
75790 1.6691 eq_string_slot_node
75020 1.6522 bp_unpack_value
73403 1.6166 linemap_lookup
65353 1.4393 lto_output_sleb128_stream
64864 1.4285 inflate_fast
64508 1.4207 verify_cgraph_node
60076 1.3231 lto_output_tree
57120 1.2580 referenced_from_this_partition_p
56225 1.2383 lto_input_uleb128
53620 1.1809 lto_streamer_cache_insert_1
52973 1.1666 htab_find_slot
45728 1.0071 lto_output_tree_or_ref
43428 0.9564 lto_input_1_unsigned
41556 0.9152 tree_map_base_eq
39232 0.8640 hash_cgraph_node_set_element
35695 0.7861 ggc_set_mark
So this is not much of a surprise: streaming is inefficient, and we also spend a
lot of time on type merging. I am rebuilding now to get a time report.
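For context on the `lto_output_uleb128_stream` / `lto_input_uleb128` entries above: judging by the names, the LTO streamer writes integers in the variable-length unsigned LEB128 format (7 payload bits per byte, high bit set on every byte except the last). Below is a minimal, generic sketch of that encoding; the function names are hypothetical and this is not GCC's streamer code. It shows that each streamed integer costs a small loop with a branch per byte, which adds up when several gigabytes are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write VALUE as unsigned LEB128 into BUF and return the number of bytes
   used.  BUF must have room for 10 bytes (enough for any 64-bit value).  */
static size_t
uleb128_encode (uint64_t value, unsigned char *buf)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;   /* low 7 bits of payload */
      value >>= 7;
      if (value != 0)
        byte |= 0x80;                      /* more bytes follow */
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}

/* Decode one unsigned LEB128 value from [*P, END), advancing *P past it.  */
static uint64_t
uleb128_decode (const unsigned char **p, const unsigned char *end)
{
  uint64_t result = 0;
  unsigned shift = 0;
  unsigned char byte;
  do
    {
      assert (*p < end);                   /* input must not be truncated */
      assert (shift < 64);                 /* value must fit in 64 bits */
      byte = *(*p)++;
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  return result;
}
```

For example, 624485 encodes to the three bytes `0xe5 0x8e 0x26`, while any value below 128 takes a single byte.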
Honza