This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Richard Biener <rguenther at suse dot de>, gcc-patches at gcc dot gnu dot org, Jan Hubicka <jh at suse dot de>, Diego Novillo <dnovillo at google dot com>, andi at firstfloor dot org, Michael Matz <matz at suse dot de>
- Date: Fri, 14 Jun 2013 07:45:54 +0200
- Subject: Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging
- References: <alpine dot LNX dot 2 dot 00 dot 1306121447210 dot 26078 at zhemvz dot fhfr dot qr> <alpine dot LNX dot 2 dot 00 dot 1306131022370 dot 26078 at zhemvz dot fhfr dot qr> <alpine dot LNX dot 2 dot 00 dot 1306131614000 dot 26078 at zhemvz dot fhfr dot qr> <20130613213705 dot GA1358 at atrey dot karlin dot mff dot cuni dot cz> <20130613221635 dot GB1358 at atrey dot karlin dot mff dot cuni dot cz>
> > >
> > > Ok, not streaming and comparing TREE_USED gets it improved to
> >
> > I will try to gather better data tomorrow. My Mozilla build ran out of disk space,
> > but according to the stats we are now at about 7GB of GGC memory after merging.
> > I was playing with the following patch, which tests whether types are the
> > same under my (probably naive and wrong) understanding of the ODR rule in C++
>
> So I can confirm that we now need 3GB of TMP space instead of 8GB with the earlier
> version of the patch. I will compare to mainline tomorrow, but I think it is
> about the same.
> phase opt and generate  :   96.39 ( 9%) usr   40.45 (45%) sys  136.91 (12%) wall  271042 kB ( 7%) ggc
> phase stream in         :  457.87 (43%) usr    8.38 ( 9%) sys  466.44 (40%) wall 3798844 kB (93%) ggc
> phase stream out        :  509.39 (48%) usr   40.82 (46%) sys  550.88 (48%) wall    7149 kB ( 0%) ggc
> ipa cp                  :   13.62 ( 1%) usr    5.00 ( 6%) sys   18.61 ( 2%) wall  425204 kB (10%) ggc
> ipa inlining heuristics :   60.52 ( 6%) usr   36.15 (40%) sys   96.71 ( 8%) wall 1353370 kB (33%) ggc
> ipa lto decl in         :  346.94 (33%) usr    5.49 ( 6%) sys  352.60 (31%) wall    7042 kB ( 0%) ggc
> ipa lto decl out        :  481.19 (45%) usr   23.28 (26%) sys  504.68 (44%) wall       0 kB ( 0%) ggc
> TOTAL                   : 1063.67             89.65           1154.26            4078436 kB
>
> So we are still bound by streaming. I am running -flto-report overnight.
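To put a number on "bound by streaming", here is a small sketch that recomputes the user-time shares from the timing report quoted above. The figures are copied from that report; the names PHASES_USR, TOTAL_USR and share are only illustrative and are not part of GCC.

```python
# User-time seconds copied from the three "phase" rows of the timing report above.
PHASES_USR = {
    "phase opt and generate": 96.39,
    "phase stream in": 457.87,
    "phase stream out": 509.39,
}
# User-time value from the TOTAL row of the same report.
TOTAL_USR = 1063.67


def share(seconds, total=TOTAL_USR):
    """Return the fraction of total user time spent in a phase."""
    return seconds / total


# Stream-in plus stream-out is what "streaming" refers to here.
streaming = PHASES_USR["phase stream in"] + PHASES_USR["phase stream out"]

for name, secs in PHASES_USR.items():
    print(f"{name:24s} {secs:8.2f}s {share(secs):6.1%}")
print(f"{'streaming combined':24s} {streaming:8.2f}s {share(streaming):6.1%}")
```

The two streaming phases together come to roughly nine tenths of the user time, which matches the 43% and 48% printed in the report.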
[WPA] read 43363300 SCCs of average size 2.264113
[WPA] 98179403 tree bodies read in total
[WPA] tree SCC table: size 16777213, 6422251 elements, collision ratio: 0.811639
[WPA] tree SCC max chain length 88 (size 1)
[WPA] Compared 16544560 SCCs, 275298 collisions (0.016640)
[WPA] Merged 16458553 SCCs
[WPA] Merged 46453870 tree bodies
[WPA] Merged 9535385 types
[WPA] 6771259 types prevailed (21348860 associated trees)
[WPA] Old merging code merges an additional 1759918 types of which 379059 are in the same SCC with their prevailing variant (19696849 and 15301625 associated trees)
[WPA] GIMPLE canonical type table: size 131071, 77875 elements, 6771394 searches, 1528380 collisions (ratio: 0.225711)
[WPA] GIMPLE canonical type hash table: size 16777213, 6771339 elements, 23174504 searches, 21075518 collisions (ratio: 0.909427)
....
[LTRANS] read 228296 SCCs of average size 11.882460
[LTRANS] 2712718 tree bodies read in total
[LTRANS] GIMPLE canonical type table: size 16381, 7025 elements, 704670 searches, 24040 collisions (ratio: 0.034115)
[LTRANS] GIMPLE canonical type hash table: size 1048573, 704613 elements, 2269381 searches, 2021919 collisions (ratio: 0.890956)
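As a sanity check on the hash table lines above, the sketch below recomputes the printed ratios. It assumes the reported ratio is simply collisions divided by searches, which the arithmetic appears to confirm to six decimals. The load factor (elements divided by size) is not printed by -flto-report; it is derived here only for comparison. All Python names are illustrative.

```python
# (size, elements, searches, collisions, reported ratio), copied from the
# "GIMPLE canonical type" lines of the -flto-report output above.
TABLES = {
    "WPA canonical type table": (131071, 77875, 6771394, 1528380, 0.225711),
    "WPA canonical type hash table": (16777213, 6771339, 23174504, 21075518, 0.909427),
    "LTRANS canonical type table": (16381, 7025, 704670, 24040, 0.034115),
    "LTRANS canonical type hash table": (1048573, 704613, 2269381, 2021919, 0.890956),
}


def collision_ratio(searches, collisions):
    """Assumed definition of the reported ratio: collisions per search."""
    return collisions / searches


def load_factor(size, elements):
    """Fraction of table slots in use (derived here, not in the report)."""
    return elements / size


for name, (size, elements, searches, collisions, reported) in TABLES.items():
    ratio = collision_ratio(searches, collisions)
    print(f"{name:34s} ratio {ratio:.6f} (reported {reported:.6f}) "
          f"load {load_factor(size, elements):.3f}")
```

The tree SCC table line is left out because it reports a collision ratio without the search and collision counts needed to recompute it.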
We manage to get stuck in one of the ltrans units, in LRA:
LRA hard reg assignment : 476.07 (44%) usr 0.03 ( 0%) sys 476.08 (44%) wall 0 kB ( 0%) ggc
28607 12.1151 lto1 alloc_page(unsigned int)
3564 1.5094 lto1 record_reg_classes(int, int, rtx_def**, machine_mode*, char const**, rtx_def*, reg_class*)
3235 1.3700 libc-2.11.1.so _int_malloc
3056 1.2942 lto1 ggc_set_mark(void const*)
2646 1.1206 lto1 gt_ggc_mx_lang_tree_node(void*)
2539 1.0753 lto1 bitmap_set_bit(bitmap_head_def*, int)
2333 0.9880 opreport /usr/bin/opreport
2210 0.9359 lto1 for_each_rtx_1(rtx_def*, int, int (*)(rtx_def**, void*), void*)
2133 0.9033 lto1 constrain_operands(int)
2128 0.9012 lto1 lookup_page_table_entry(void const*)
1586 0.6717 lto1 preprocess_constraints()
While GGC memory is now under 7GB after type streaming and we run GGC just once in WPA, the usage reported by top still goes to about 12GB.
With the ODR patch there are 424 devirtualizations happening during WPA, plus some more
during ltrans (I do not have stats for those).
Honza