This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Massive performance regression from switching to gcc 4.5


> > On 06/30/2010 02:26 PM, Basile Starynkevitch wrote:
> >> On Wed, 2010-06-30 at 14:23 -0700, Taras Glek wrote:
> >>    
> >>> I tried 4.5 -O2 and it's actually faster than 4.3 -Os.
> >>>
> >>> I am happy that -O2 performance is actually pretty good, but -Os
> >>> regression is going to hurt on mobile.
> >>>      
> >> Did you try gcc-4.5 -flto -Os or gcc-4.5 -flto -O2?
> >>
> >> It would be interesting to hear that GCC is able to LTO a program as big
> >> as Mozilla! And figures (notably RAM, CPU time, wallclock time for
> >> build) would be interesting.
> >>    
> >
> > Both whopr and flto cause gcc to segfault while building Mozilla.
> 
> 4.5 WHOPR is completely broken.  LTO is in better shape but I am not sure if we
> can reasonably expect it to build Mozilla.  However I would be very happy to help
> get WHOPR working for 4.6.
Hi,
I have now got the 4.6 WHOPR build as far as libxul.so, which seems to be one of
the bigger files.

WHOPR linking consists of a serial stage (WPA), which merges the whole program and
does interprocedural optimization, followed by a parallel build.  The serial stage
needs 3.7GB of RAM and 10 minutes; most of that time is spent writing out the files
for the parallel builds, which are around 5GB overall.  The size of the files can be
cut down significantly by a sane partitioning algorithm, since we produce over
1000 partitions where 40 would do the job.  (This is with an enable-checking
compiler.)
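
To illustrate the partitioning point, here is a minimal sketch of how a
partitioner could aim for a fixed number of similarly sized partitions instead
of producing one per input.  It is not GCC's partitioning code: the function
name, the use of a plain list of unit sizes, and the greedy "fill up to the
average" rule are all assumptions made for the example.

```python
def balanced_partitions(unit_sizes, target_count):
    """Group consecutive units into at most target_count partitions.

    unit_sizes   -- hypothetical size estimate for each unit, in any unit
    target_count -- desired number of partitions (e.g. 40 rather than 1000)
    Returns a list of partitions, each a list of indices into unit_sizes.
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")

    # Each partition should carry roughly an equal share of the total size.
    target_size = sum(unit_sizes) / target_count

    partitions = [[]]
    current_size = 0
    for index, size in enumerate(unit_sizes):
        # Start a new partition once the current one has reached its share,
        # unless we have already opened the last allowed partition.
        if (partitions[-1]
                and current_size >= target_size
                and len(partitions) < target_count):
            partitions.append([])
            current_size = 0
        partitions[-1].append(index)
        current_size += size
    return partitions


if __name__ == "__main__":
    # 1000 equally sized units packed into 40 partitions of 25 units each.
    parts = balanced_partitions([5] * 1000, 40)
    print(len(parts), len(parts[0]))
```

Keeping the units in their original order is a deliberate simplification; a
real partitioner would also want to keep functions that call each other in the
same partition.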

The later build still dies for me, but it seems that libxul is not too large for
WHOPR.  (I hope all these parameters will be reduced significantly before 4.6 is out.)

What are the other big components I should be afraid of?

The oprofile of the WPA stage is as follows (samples, % of total, symbol):

382507    8.4240  lto_output_1_stream
379158    8.3503  htab_find_slot_with_hash
207330    4.5661  bp_pack_value
155793    3.4311  iterative_hash_hashval_t
135132    2.9760  lto_output_uleb128_stream
101110    2.2268  gimple_types_compatible_p
92828     2.0444  cgraph_node_in_set_p
83205     1.8324  lto_promote_cross_file_statics
76243     1.6791  htab_expand
75993     1.6736  htab_hash_string
75790     1.6691  eq_string_slot_node
75020     1.6522  bp_unpack_value
73403     1.6166  linemap_lookup
65353     1.4393  lto_output_sleb128_stream
64864     1.4285  inflate_fast
64508     1.4207  verify_cgraph_node
60076     1.3231  lto_output_tree
57120     1.2580  referenced_from_this_partition_p
56225     1.2383  lto_input_uleb128
53620     1.1809  lto_streamer_cache_insert_1
52973     1.1666  htab_find_slot
45728     1.0071  lto_output_tree_or_ref
43428     0.9564  lto_input_1_unsigned
41556     0.9152  tree_map_base_eq
39232     0.8640  hash_cgraph_node_set_element
35695     0.7861  ggc_set_mark

So not much of a surprise: streaming is inefficient and we need a lot of time for
type merging too.  I am compiling to get a time report.
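
One way to check that reading of the profile is to add up the percentages by
symbol prefix.  The sketch below does that for the listing above.  The bucket
names and the prefix lists are assumptions made for the example (the profile
itself does not say which phase each symbol belongs to), and the hash-table
bucket is not necessarily all type merging.

```python
# The profile as posted: samples, percent of total, symbol name.
PROFILE = """\
382507    8.4240  lto_output_1_stream
379158    8.3503  htab_find_slot_with_hash
207330    4.5661  bp_pack_value
155793    3.4311  iterative_hash_hashval_t
135132    2.9760  lto_output_uleb128_stream
101110    2.2268  gimple_types_compatible_p
92828     2.0444  cgraph_node_in_set_p
83205     1.8324  lto_promote_cross_file_statics
76243     1.6791  htab_expand
75993     1.6736  htab_hash_string
75790     1.6691  eq_string_slot_node
75020     1.6522  bp_unpack_value
73403     1.6166  linemap_lookup
65353     1.4393  lto_output_sleb128_stream
64864     1.4285  inflate_fast
64508     1.4207  verify_cgraph_node
60076     1.3231  lto_output_tree
57120     1.2580  referenced_from_this_partition_p
56225     1.2383  lto_input_uleb128
53620     1.1809  lto_streamer_cache_insert_1
52973     1.1666  htab_find_slot
45728     1.0071  lto_output_tree_or_ref
43428     0.9564  lto_input_1_unsigned
41556     0.9152  tree_map_base_eq
39232     0.8640  hash_cgraph_node_set_element
35695     0.7861  ggc_set_mark
"""

# Hypothetical grouping by symbol prefix; adjust to taste.
BUCKETS = {
    "streaming": ("lto_output", "lto_input", "lto_streamer", "bp_"),
    "hashing": ("htab_", "iterative_hash"),
}


def bucket_totals(profile_text):
    """Sum the percent column per bucket; unmatched symbols go to 'other'."""
    totals = {name: 0.0 for name in BUCKETS}
    totals["other"] = 0.0
    for line in profile_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _samples, percent, symbol = fields
        for name, prefixes in BUCKETS.items():
            if symbol.startswith(prefixes):
                totals[name] += float(percent)
                break
        else:
            totals["other"] += float(percent)
    return totals


if __name__ == "__main__":
    for name, percent in bucket_totals(PROFILE).items():
        print(f"{name:10s} {percent:7.4f}%")
```

With this grouping, the streaming symbols in the listing account for about
24.8% of all samples and the hash-table symbols for about 16.3%.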

Honza

