the attached testcase eats 1GB of ram plus swap and is then killed by the OOM killer. it works fine on 4.1.2/{all} and 4.2.0/{i386,ppc}, so it looks like a 4.2/x86-64 target regression.
Created attachment 12729: testcase
this hog blocks xorg-xserver development/testing.
memhog still present with 4.2 (r120195). backtrace attached.

(gdb) bt
#0 0x00002aeaa4f19e82 in ?? () from /lib64/libc.so.6
#1 0x00002aeaa4f1b756 in malloc () from /lib64/libc.so.6
#2 0x0000000000828728 in xmalloc (size=4064) at ../../libiberty/xmalloc.c:147
#3 0x00002aeaa4f1deb2 in _obstack_newchunk () from /lib64/libc.so.6
#4 0x00000000004c46cc in bitmap_elt_insert_after (head=0x1298370, elt=0x178321a0, indx=307) at ../../gcc/bitmap.c:122
#5 0x00000000004c4786 in bitmap_ior_into (a=0x1298370, b=<value optimized out>) at ../../gcc/bitmap.c:1170
#6 0x0000000000733173 in set_union_with_increment (to=0x1298370, from=0xed9df0, inc=7745) at ../../gcc/tree-ssa-structalias.c:715
#7 0x00000000007360be in solve_graph (graph=0xd5e380) at ../../gcc/tree-ssa-structalias.c:2084
#8 0x0000000000737400 in compute_points_to_sets (ai=0xd5e7e0) at ../../gcc/tree-ssa-structalias.c:4845
#9 0x0000000000489a17 in compute_may_aliases () at ../../gcc/tree-ssa-alias.c:665
#10 0x000000000071b9c8 in execute_one_pass (pass=0xb98340) at ../../gcc/passes.c:870
#11 0x000000000071bb1c in execute_pass_list (pass=0xb98340) at ../../gcc/passes.c:917
#12 0x000000000071bb2e in execute_pass_list (pass=0xb97e00) at ../../gcc/passes.c:918
#13 0x0000000000465c0e in tree_rest_of_compilation (fndecl=0x2aeaa9c8ac40) at ../../gcc/tree-optimize.c:463
#14 0x000000000040b1fc in c_expand_body (fndecl=0x2aeaa9c8ac40) at ../../gcc/c-decl.c:6814
#15 0x0000000000761064 in cgraph_expand_function (node=0x2aeaabc8d300) at ../../gcc/cgraphunit.c:1241
#16 0x000000000076197e in cgraph_optimize () at ../../gcc/cgraphunit.c:1306
#17 0x000000000040eb76 in c_write_global_declarations () at ../../gcc/c-decl.c:7929
#18 0x00000000006fdfcf in toplev_main (argc=<value optimized out>, argv=<value optimized out>) at ../../gcc/toplev.c:1046
#19 0x00002aeaa4ec9af4 in __libc_start_main () from /lib64/libc.so.6
#20 0x0000000000402769 in _start ()
hmm, if i remove the huge number of pci*info* data structures, the hog doesn't occur.
-O0 passes
-O0 -fdefer-pop -fdelayed-branch -fguess-branch-probability -fcprop-registers -fif-conversion -fif-conversion2 -ftree-ccp -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-ter -ftree-lrs -ftree-sra -ftree-copyrename -ftree-fre -ftree-ch -funit-at-a-time -fmerge-constants -fomit-frame-pointer passes
-O1 fails
-O2 fails
Created attachment 13043: testcase
x86_64-pld-linux-g++ -c -fPIC -O2 -fno-strict-aliasing -fwrapv -march=x86-64 -fno-strict-aliasing -gdwarf-2 -g2 -Wall -W -D_REENTRANT --save-temps -ftime-report -fmem-report -DQT_NO_DEBUG -DQT_CORE_LIB -I. -I/usr/include/python2.5 -I/usr/lib64/qt4/mkspecs/default -I/usr/include/qt4/QtCore -I/usr/include/qt4 -I/usr/ginclude -o sipQtCorepart0.o sipQtCorepart0.cpp

Memory still allocated at the end of the compilation process
Size     Allocated  Used   Overhead
8        8192       6920   240
16       12k        8288   264
64       112k       111k   1792
256      20k        18k    280
512      16k        14k    224
1024     64k        61k    896
2048     56k        54k    784
4096     92k        92k    1288
8192     64k        64k    448
16384    64k        64k    224
112      4096       896    56
208      16k        13k    224
192      16k        12k    224
160      48k        46k    672
176      144k       138k   2016
96       2008k      1975k  27k
416      48k        43k    672
128      8192       6912   112
48       252k       248k   4032
224      404k       396k   5656
32       192k       188k   3456
80       28k        27k    392
Total    3676k      3594k  50k

String pool
entries 21015
identifiers 21015 (100.00%)
slots 32768
bytes 308k (24k overhead)
table size 256k
coll/search 0.5458
ins/search 0.0692
avg. entry 15.05 bytes (+/- 7.88)
longest entry 57

??? tree nodes created

(No per-node statistics)
Type hash: size 1021, 382 elements, 0.255906 collisions
DECL_DEBUG_EXPR hash: size 1021, 0 elements, 0.000000 collisions
DECL_VALUE_EXPR hash: size 1021, 0 elements, 0.000000 collisions
no search statistics

Execution times (seconds)
name lookup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 89 kB ( 3%) ggc
TOTAL : 0.16 0.03 0.23 3530 kB

Memory still allocated at the end of the compilation process
Size     Allocated  Used    Overhead
8        4096       2408    120
16       4424k      3253k   95k
64       13M        9377k   213k
256      648k       647k    9072
512      368k       262k    5152
1024     1876k      1876k   25k
2048     952k       840k    13k
4096     940k       936k    12k
8192     7480k      7472k   51k
16384    2160k      2144k   7560
32768    224k       192k    392
65536    960k       960k    840
131072   384k       256k    168
262144   768k       768k    168
112      3944k      3624k   53k
208      2704k      2267k   36k
192      2904k      1973k   39k
160      9732k      7882k   133k
176      16M        9357k   234k
96       16M        16M     237k
416      10M        10111k  151k
128      2576k      2020k   35k
48       18M        9012k   291k
224      3824k      3723k   52k
32       37M        37M     674k
80       11M        11M     166k
Total    171M       143M    2540k

String pool
entries 91552
identifiers 91552 (100.00%)
slots 131072
bytes 1432k (103k overhead)
table size 1024k
coll/search 1.3361
ins/search 0.1857
avg. entry 16.02 bytes (+/- 10.91)
longest entry 430

??? tree nodes created

(No per-node statistics)
Type hash: size 16381, 8503 elements, 1.014345 collisions
DECL_DEBUG_EXPR hash: size 1021, 47 elements, 0.107877 collisions
DECL_VALUE_EXPR hash: size 1021, 0 elements, 0.000000 collisions
RESTRICT_BASE hash: size 509, 12 elements, 0.020408 collisions
no search statistics

Execution times (seconds)
garbage collection : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 0 kB ( 0%) ggc
callgraph construction: 0.12 ( 0%) usr 0.01 ( 1%) sys 0.15 ( 0%) wall 4866 kB ( 1%) ggc
callgraph optimization: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 388 kB ( 0%) ggc
ipa reference : 0.05 ( 0%) usr 0.01 ( 1%) sys 0.06 ( 0%) wall 100 kB ( 0%) ggc
ipa pure const : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
ipa type escape : 0.06 ( 0%) usr 0.01 ( 1%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
cfg cleanup : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 372 kB ( 0%) ggc
trivially dead code : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
life analysis : 0.40 ( 0%) usr 0.00 ( 0%) sys 0.52 ( 0%) wall 3401 kB ( 0%) ggc
life info update : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 1 kB ( 0%) ggc
alias analysis : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 5447 kB ( 1%) ggc
register scan : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 45 kB ( 0%) ggc
rebuild jump labels : 0.04 ( 0%) usr 0.01 ( 1%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 0.12 ( 0%) usr 0.06 ( 3%) sys 0.15 ( 0%) wall 1644 kB ( 0%) ggc
parser : 1.00 ( 0%) usr 0.19 (11%) sys 1.20 ( 0%) wall 102901 kB (15%) ggc
name lookup : 0.10 ( 0%) usr 0.08 ( 4%) sys 0.30 ( 0%) wall 9417 kB ( 1%) ggc
integration : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 12245 kB ( 2%) ggc
tree gimplify : 0.18 ( 0%) usr 0.01 ( 1%) sys 0.20 ( 0%) wall 13731 kB ( 2%) ggc
tree eh : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1814 kB ( 0%) ggc
tree CFG construction : 0.02 ( 0%) usr 0.01 ( 1%) sys 0.03 ( 0%) wall 12358 kB ( 2%) ggc
tree CFG cleanup : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall 293 kB ( 0%) ggc
tree VRP : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall 4914 kB ( 1%) ggc
tree copy propagation : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 1020 kB ( 0%) ggc
tree store copy prop : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 505 kB ( 0%) ggc
tree find ref. vars : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 4971 kB ( 1%) ggc
tree PTA : 204.64 (78%) usr 0.80 (44%) sys 224.75 (78%) wall 111448 kB (16%) ggc
tree alias analysis : 23.01 ( 9%) usr 0.30 (17%) sys 24.31 ( 8%) wall 234847 kB (33%) ggc
tree PHI insertion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 584 kB ( 0%) ggc
tree SSA rewrite : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 8201 kB ( 1%) ggc
tree SSA other : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
tree SSA incremental : 0.99 ( 0%) usr 0.01 ( 1%) sys 1.40 ( 0%) wall 758 kB ( 0%) ggc
tree operand scan : 20.46 ( 8%) usr 0.10 ( 6%) sys 22.14 ( 8%) wall 21781 kB ( 3%) ggc
dominator optimization: 0.26 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 5721 kB ( 1%) ggc
tree SRA : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 294 kB ( 0%) ggc
tree STORE-CCP : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 503 kB ( 0%) ggc
tree CCP : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 568 kB ( 0%) ggc
tree split crit edges : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1646 kB ( 0%) ggc
tree reassociation : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 34 kB ( 0%) ggc
tree PRE : 0.25 ( 0%) usr 0.01 ( 1%) sys 0.27 ( 0%) wall 5578 kB ( 1%) ggc
tree FRE : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 4997 kB ( 1%) ggc
tree code sinking : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 32 kB ( 0%) ggc
tree linearize phis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 27 kB ( 0%) ggc
tree forward propagate: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 573 kB ( 0%) ggc
tree conservative DCE : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
tree aggressive DCE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
tree DSE : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 170 kB ( 0%) ggc
PHI merge : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 138 kB ( 0%) ggc
loop invariant motion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
scev constant prop : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 38 kB ( 0%) ggc
tree SSA uncprop : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
tree SSA to normal : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 3144 kB ( 0%) ggc
tree rename SSA copies: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
dominance frontiers : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall 0 kB ( 0%) ggc
expand : 2.02 ( 1%) usr 0.04 ( 2%) sys 2.08 ( 1%) wall 39944 kB ( 6%) ggc
varconst : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 710 kB ( 0%) ggc
jump : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 726 kB ( 0%) ggc
CSE : 0.57 ( 0%) usr 0.01 ( 1%) sys 0.72 ( 0%) wall 3532 kB ( 0%) ggc
loop analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 715 kB ( 0%) ggc
global CSE : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
CPROP 1 : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 1052 kB ( 0%) ggc
PRE : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 502 kB ( 0%) ggc
CPROP 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 715 kB ( 0%) ggc
bypass jumps : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 662 kB ( 0%) ggc
CSE 2 : 0.40 ( 0%) usr 0.00 ( 0%) sys 0.47 ( 0%) wall 2153 kB ( 0%) ggc
branch prediction : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 255 kB ( 0%) ggc
flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
combiner : 0.41 ( 0%) usr 0.01 ( 1%) sys 0.49 ( 0%) wall 3729 kB ( 1%) ggc
if-conversion : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 196 kB ( 0%) ggc
regmove : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 389 kB ( 0%) ggc
local alloc : 0.56 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall 4120 kB ( 1%) ggc
global alloc : 0.57 ( 0%) usr 0.00 ( 0%) sys 0.56 ( 0%) wall 3302 kB ( 0%) ggc
reload CSE regs : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 3609 kB ( 1%) ggc
flow 2 : 0.05 ( 0%) usr 0.01 ( 1%) sys 0.05 ( 0%) wall 2992 kB ( 0%) ggc
if-conversion 2 : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 5 kB ( 0%) ggc
peephole 2 : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 628 kB ( 0%) ggc
rename registers : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 44 kB ( 0%) ggc
scheduling 2 : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.74 ( 0%) wall 6840 kB ( 1%) ggc
machine dep reorg : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 85 kB ( 0%) ggc
reorder blocks : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 2011 kB ( 0%) ggc
final : 0.33 ( 0%) usr 0.04 ( 2%) sys 0.40 ( 0%) wall 5874 kB ( 1%) ggc
symout : 0.45 ( 0%) usr 0.05 ( 3%) sys 0.58 ( 0%) wall 35050 kB ( 5%) ggc
variable tracking : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 3314 kB ( 0%) ggc
TOTAL : 262.28 1.80 288.10 708068 kB

0x000000000052ea0d in bitmap_ior_into (a=0x16e5cb0, b=<value optimized out>) at gcc/bitmap.c:1198
1198 gcc/bitmap.c: No such file or directory.
in gcc/bitmap.c
(gdb) bt
#0 0x000000000052ea0d in bitmap_ior_into (a=0x16e5cb0, b=<value optimized out>) at gcc/bitmap.c:1198
#1 0x000000000079d123 in set_union_with_increment (to=0x16e5cb0, from=0x1574ac0, inc=18446744073709551615) at gcc/tree-ssa-structalias.c:715
#2 0x00000000007a007e in solve_graph (graph=0x13f75c0) at gcc/tree-ssa-structalias.c:2084
#3 0x00000000007a13b0 in compute_points_to_sets (ai=0x146bdd0) at gcc/tree-ssa-structalias.c:4845
#4 0x00000000004f39d7 in compute_may_aliases () at gcc/tree-ssa-alias.c:665
#5 0x00000000007859f8 in execute_one_pass (pass=0xcddf70) at gcc/passes.c:870
#6 0x0000000000785b5c in execute_pass_list (pass=0xcddf70) at gcc/passes.c:920
#7 0x0000000000785b6e in execute_pass_list (pass=0xc1eae0) at gcc/passes.c:921
#8 0x00000000004cfb9e in tree_rest_of_compilation (fndecl=0x2b8ee7ea8c40) at gcc/tree-optimize.c:463
#9 0x00000000004730c9 in expand_body (fn=0x2b8ee7ea8c40) at gcc/cp/semantics.c:3075
#10 0x00000000007c75f4 in cgraph_expand_function (node=0x2b8ee7e98e40) at gcc/cgraphunit.c:1241
#11 0x00000000007c7f0e in cgraph_optimize () at gcc/cgraphunit.c:1306
#12 0x000000000044126f in cp_finish_file () at gcc/cp/decl2.c:3347
#13 0x00000000004b58ca in c_common_parse_file (set_yydebug=<value optimized out>) at gcc/c-opts.c:1176
#14 0x0000000000767a03 in toplev_main (argc=<value optimized out>, argv=<value optimized out>) at gcc/toplev.c:1033
#15 0x00002b8ee30c0954 in __libc_start_main () from /lib64/libc.so.6
#16 0x0000000000402769 in _start ()
Does this work on mainline with no real issue? If so, i'll try to backport the solver changes.
(In reply to comment #8)
> Does this work on mainline with no real issue?
>
> If so, i'll try to backport the solver changes.

gcc43 works faster, but still needs a lot of memory.

(...)
tree PTA : 1.33 ( 2%) usr 0.01 ( 1%) sys 1.41 ( 2%) wall 10753 kB ( 2%) ggc
tree alias analysis : 14.10 (21%) usr 0.08 ( 8%) sys 14.75 (21%) wall 55882 kB (10%) ggc
(...)
TOTAL : 68.67 1.03 71.55 537971 kB

btw. sipQtGuipart0.ii from PyQt-x11-gpl-4.1.1 needs over 1GB of ram at -O1 :(
Subject: Re: possible quadratic behaviour.

On 13 Feb 2007 10:37:55 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #9 from pluto at agmk dot net 2007-02-13 10:37 -------
> (In reply to comment #8)
> > Does this work on mainline with no real issue?
> >
> > If so, i'll try to backport the solver changes.
>
> gcc43 works faster, but still needs lot of memory.

Yeah, but you need to show me that the reason it needs a lot of memory is due to pta:

> (...)
> tree PTA : 1.33 ( 2%) usr 0.01 ( 1%) sys 1.41 ( 2%) wall 10753 kB ( 2%) ggc
> tree alias analysis : 14.10 (21%) usr 0.08 ( 8%) sys 14.75 (21%) wall 55882 kB (10%) ggc
> (...)
> TOTAL : 68.67 1.03 71.55 537971 kB

Except that 55 meg and 10 meg of 512meg of gc memory is not a lot, relatively speaking (It's not like it's 90% of the ggc memory or something).

Also, alias analysis and PTA use heap memory that will not show up here.
(In reply to comment #10)
> Also, alias analysis and PTA use heap memory that will not show up here.

so, how can i diagnose the gcc heap usage?
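One way to see the heap memory that the ggc column misses is to measure the peak resident set size of the compiler from outside the process. The following is only a minimal sketch, assuming a Unix-like system where Python's `resource` module is available; `peak_rss_of` and `main` are hypothetical helper names and are not part of GCC.

```python
import resource
import subprocess


def peak_rss_of(cmd):
    """Run cmd to completion and return (exit status, peak RSS).

    ru_maxrss for RUSAGE_CHILDREN is the high-water mark over every child
    this process has waited for (including grandchildren such as cc1,
    provided the gcc driver waits for them), so use a fresh interpreter
    per measurement. Linux reports the value in kilobytes.
    """
    status = subprocess.call(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return status, usage.ru_maxrss


def main(argv):
    # Hypothetical use: main(["gcc", "-O1", "-c", "xf86ScanPci.i"])
    status, peak = peak_rss_of(argv)
    print("exit status %d, peak RSS %d (kB on Linux)" % (status, peak))
    return status
```

Comparing this number with the ggc total printed by -ftime-report gives a rough idea of how much memory lives outside the garbage collector (obstacks, bitmaps, malloc).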
*** Bug 31172 has been marked as a duplicate of this bug. ***
It's really all PTA memory.

Mainline:

tree PTA : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 40 kB ( 0%) ggc
TOTAL : 2.18 1.05 3.44 48857 kB

max. VM usage: 63MB

4.2:

tree PTA : 18.41 (88%) usr 1.08 (53%) sys 20.32 (85%) wall 3903 kB ( 8%) ggc
TOTAL : 20.92 2.02 23.94 48672 kB

max. VM usage: 1.1GB

4.1.2 uses 56MB.
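Reports like the ones in this thread are easier to compare when the rows are parsed and ranked by wall time. The snippet below is a sketch under the assumption that each row has the usr/sys/wall/ggc layout shown above; `parse_time_report` and `hottest` are made-up helper names, and `SAMPLE` reuses the 4.3 figures quoted earlier in this report. Keep in mind that the ggc column only counts garbage-collected memory, so it has to be read next to the max. VM usage.

```python
import re

# One -ftime-report row: pass name, usr/sys/wall seconds (each followed by
# a percentage), then the ggc memory in kB. The TOTAL row has no
# percentages and is deliberately not matched.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<ggc>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc\s*$"
)


def parse_time_report(text):
    """Return a list of (name, usr, sys, wall, ggc_kb) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group("name"),
                         float(m.group("usr")),
                         float(m.group("sys")),
                         float(m.group("wall")),
                         int(m.group("ggc"))))
    return rows


def hottest(rows, n=3):
    """Return the n rows with the largest wall time, slowest first."""
    return sorted(rows, key=lambda r: r[3], reverse=True)[:n]


SAMPLE = """\
 tree PTA : 1.33 ( 2%) usr 0.01 ( 1%) sys 1.41 ( 2%) wall 10753 kB ( 2%) ggc
 tree alias analysis : 14.10 (21%) usr 0.08 ( 8%) sys 14.75 (21%) wall 55882 kB (10%) ggc
 TOTAL : 68.67 1.03 71.55 537971 kB
"""

for name, usr, sys_, wall, ggc in hottest(parse_time_report(SAMPLE)):
    print("%-22s wall %8.2fs  ggc %8d kB" % (name, wall, ggc))
```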
(In reply to comment #13)
> It's really all PTA memory.
>
> Mainline:
>
> tree PTA : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 40 kB ( 0%) ggc
> TOTAL : 2.18 1.05 3.44 48857 kB
>
> max. VM usage: 63MB
>
> 4.2:
>
> tree PTA : 18.41 (88%) usr 1.08 (53%) sys 20.32 (85%) wall 3903 kB ( 8%) ggc
> TOTAL : 20.92 2.02 23.94 48672 kB
>
> max. VM usage: 1.1GB
>
> 4.1.2 uses 56MB.

I'll backport the changes (this is more or less copying tree-ssa-structalias.c from 4.3 to 4.2 and modifying the few things that changed in 4.3 :P)
(In reply to comment #14)
> > 4.1.2 uses 56MB.
> I'll backport the changes (this is more or less copying tree-ssa-structalias.c
> from 4.3 to 4.2 and modifying the few things that changed in 4.3 :P)

Daniel, are you working on it?
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On 25 Apr 2007 20:56:24 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #15 from pluto at agmk dot net 2007-04-25 21:56 -------
> (In reply to comment #14)
> > > 4.1.2 uses 56MB.
> > I'll backport the changes (this is more or less copying tree-ssa-structalias.c
> > from 4.3 to 4.2 and modifying the few things that changed in 4.3 :P)
>
> Daniel, are you working on it?

So, i gave it the old college try, and it turns out to be much harder than I expected because of mem-ssa and other changes that went into 4.3

> --
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30052
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
(In reply to comment #16)
> So, i gave it the old college try, and it turns out to be much harder
> than I expected because of mem-ssa and other changes that went into
> 4.3

yup, looks like a nice bullet for 4.2.0 release.
with such features 4.2 will be widely /dev/nulled.
*** Bug 31984 has been marked as a duplicate of this bug. ***
Created attachment 13576: Possible patch

The attached is a huge backport of the 4.3 solver changes.
I have only minimally tested it.
Let me know if it helps on memory usage, and I will bootstrap/regtest it.
Subject: Re: [4.2 Regression] possible quadratic behaviour.

Testsuite ran on i686-pc-linux-gnu without failures (i've enabled c and c++ only).
xorg-server now compiles as well.
The patch deserves more testing but IMHO it's in the right direction.

Massimiliano Vegni
IT SYSTEMS srl
Roma, Italy

dberlin at gcc dot gnu dot org wrote:
> ------- Comment #19 from dberlin at gcc dot gnu dot org 2007-05-18 14:46 -------
> Created an attachment (id=13576)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13576&action=view)
> Possible patch
>
> The attached is a huge backport of the 4.3 solver changes.
> I have only minimally tested it.
> Let me know if it helps on memory usage, and I will bootstrap/regtest it.
with this patch gcc works much better.

xf86ScanPci.i : 84MB / ~5sec.
sipQtCorepart0.ii.bz2 : 340MB / ~440sec.

gcc/g++ testsuite on x86_64 shows no new regressions.
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On 19 May 2007 14:30:43 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #21 from pluto at agmk dot net 2007-05-19 15:30 -------
> with this patch gcc works much better.
>
> xf86ScanPci.i : 84MB / ~5sec.
> sipQtCorepart0.ii.bz2 : 340MB / ~440sec

There are optimizations that could be made to the 440 seconds if they are in PTA solving, but they wouldn't really help mainline much, so i'm not sure if it is worth it.
bad news, this patch ices fortran build:

(...)
../../../libgfortran/intrinsics/selected_int_kind.f90:22: internal compiler error: in process_constraint, at tree-ssa-structalias.c:2260
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On 19 May 2007 17:16:35 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #23 from pluto at agmk dot net 2007-05-19 18:16 -------
> bad news, this patch ices fortran build:
>
> (...)
> ../../../libgfortran/intrinsics/selected_int_kind.f90:22: internal compiler
> error: in process_constraint, at tree-ssa-structalias.c:2260

Meh, send me the file.
This is just a small bug somewhere in the backport.
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On Saturday 19 of May 2007 19:43:33 dberlin at dberlin dot org wrote:
> ------- Comment #24 from dberlin at gcc dot gnu dot org 2007-05-19 18:43 -------
> Subject: Re: [4.2 Regression] possible quadratic behaviour.
>
> On 19 May 2007 17:16:35 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> > ------- Comment #23 from pluto at agmk dot net 2007-05-19 18:16 -------
> > bad news, this patch ices fortran build:
> >
> > (...)
> > ../../../libgfortran/intrinsics/selected_int_kind.f90:22: internal
> > compiler error: in process_constraint, at tree-ssa-structalias.c:2260
>
> Meh, send me the file.
> This is just a small bug somewhere in the backport.
Created attachment 13585: selected_int_kind.f90
Created attachment 13586: selected_int_kind.inc
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On 20 May 2007 04:57:45 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #25 from pluto at agmk dot net 2007-05-20 05:57 -------
> Subject: Re: [4.2 Regression] possible quadratic behaviour.

Change line 4275 of the patched tree-ssa-structalias.c to be
rhs.var = vi->id
instead of
rhs.var = id

Remove the id variable declaration.

This would have only affected fortran ....
(In reply to comment #28)
> Change line 4275 of the patched tree-ssa-structalias.c to be rhs.var =
> vi->id instead of rhs.var = id
>
> Remove the id variable declaration.
>
> This would have only affected fortran ....

thx, this change fixes bootstrap.
will you commit this for 4.2.1?
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On 21 May 2007 16:01:29 -0000, pluto at agmk dot net <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #29 from pluto at agmk dot net 2007-05-21 17:01 -------
> (In reply to comment #28)
> > Change line 4275 of the patched tree-ssa-structalias.c to be rhs.var =
> > vi->id instead of rhs.var = id
> >
> > Remove the id variable declaration.
> >
> > This would have only affected fortran ....
>
> thx, this change fixes bootstrap.
> will you commit this for 4.2.1?

Sure.
r125227 | dberlin | 2007-05-31 11:37:38 -0400 (Thu, 31 May 2007) | 11 lines

2007-05-27  Daniel Berlin  <dberlin@dberlin.org>

        Fix PR/30052
        Backport PTA solver from mainline

        * pointer-set.c: Copy from mainline
        * pointer-set.h: Ditto.
        * tree-ssa-structalias.c: Copy solver portions from mainline.
        * Makefile.in (tree-ssa-structalias.o): Update dependencies
*** Bug 32266 has been marked as a duplicate of this bug. ***
i'm reopening this bug because the fix is not complete.

it does fix the xf86ScanPci.i testcase (time/mem hog) and this is great

$ time gcc xf86ScanPci.i -O1 -c ( 2.2GHz amd64, 1GB ram ).
gcc xf86ScanPci.i -O1 -c 4.10s user 0.20s system 92% cpu 4.665 total

the patch doesn't fix the sipQtCorepart0.ii time hog, only mem hog is fixed.
g++ needs about 300MB of ram and +inf? (canceled after 6 days) of time.
Subject: Re: [4.2 Regression] possible quadratic behaviour.

On Mon, 11 Jun 2007, pluto at agmk dot net wrote:
> ------- Comment #33 from pluto at agmk dot net 2007-06-11 13:04 -------
> i'm reopening this bug because the fix is not complete.
>
> it does fix the xf86ScanPci.i testcase (time/mem hog) and this is great
>
> $ time gcc xf86ScanPci.i -O1 -c ( 2.2GHz amd64, 1GB ram ).
> gcc xf86ScanPci.i -O1 -c 4.10s user 0.20s system 92% cpu 4.665 total
>
> the patch doesn't fix the sipQtCorepart0.ii time hog, only mem hog is fixed.
> g++ needs about 300MB of ram and +inf? (canceled after 6 days) of time.

Can you check where the time is spent on?

Richard.
(In reply to comment #34)
> Can you check where the time is spent on?

naturally, i'm building gcc with debuginfo now...
Created attachment 13677: testcase for time-hog.
Looks like it's still PTA:

tree PTA : 255.00 ( 0%) usr 17.25 ( 0%) sys 278.07 ( 0%) wall 28100 kB ( 0%) ggc

(just a snapshot after a few minutes compile)
(In reply to comment #34)
> > the patch doesn't fix the sipQtCorepart0.ii time hog, only mem hog is fixed.
> > g++ needs about 300MB of ram and +inf? (canceled after 6 days) of time.

oops, little eye damage: the g++ sits on sipQtGuipart0.ii.

> Can you check where the time is spent on?

several backtraces show the gcc sits around solve_graph()...

#0 0x000000000052ef83 in bitmap_elt_insert_after ()
#1 0x000000000052f086 in bitmap_ior_into ()
#2 0x00000000007a33fb in solve_graph ()
#3 0x00000000007a5709 in compute_points_to_sets ()
#4 0x00000000004f3e67 in compute_may_aliases ()
#5 0x0000000000787078 in execute_one_pass ()
#6 0x00000000007871dc in execute_pass_list ()
#7 0x00000000007871ee in execute_pass_list ()
#8 0x00000000004cffee in tree_rest_of_compilation ()
#9 0x0000000000473339 in expand_body ()
#10 0x00000000007cb994 in cgraph_expand_function ()
#11 0x00000000007cc2be in cgraph_optimize ()
#12 0x000000000044116f in cp_finish_file ()
#13 0x00000000004b5cda in c_common_parse_file ()
#14 0x0000000000769083 in toplev_main ()
#15 0x00002b4f81ac6b54 in __libc_start_main () from /lib64/libc.so.6
#16 0x0000000000402459 in _start ()

#0 0x000000000052ef83 in bitmap_elt_insert_after ()
#1 0x000000000052f086 in bitmap_ior_into ()
#2 0x00000000007a1a93 in set_union_with_increment ()
#3 0x00000000007a3b4d in solve_graph ()
(...)

#0 0x000000000052f601 in bitmap_and_compl ()
#1 0x00000000007a33d2 in solve_graph ()
(...)

and some deep trace:

#0 0x00002b4f81b1f95b in memset () from /lib64/libc.so.6
#1 0x00002b4f81b1acd0 in calloc () from /lib64/libc.so.6
#2 0x0000000000893969 in xcalloc ()
#3 0x0000000000719e74 in pointer_set_create ()
#4 0x000000000076b7b6 in walk_tree_without_duplicates ()
#5 0x000000000079ee2f in create_variable_info_for ()
#6 0x000000000079f2a0 in get_vi_for_tree ()
#7 0x000000000079f2fd in get_constraint_exp_from_ssa_var ()
#8 0x000000000079faa5 in get_constraint_for ()
#9 0x000000000079fbbd in get_constraint_for ()
#10 0x00000000007a13f6 in find_global_initializers ()
#11 0x000000000076b1c5 in walk_tree ()
#12 0x000000000076b3fd in walk_tree ()
#13 0x000000000076b3fd in walk_tree ()
#14 0x000000000076b7ca in walk_tree_without_duplicates ()
#15 0x000000000079ee2f in create_variable_info_for ()
#16 0x000000000079f2a0 in get_vi_for_tree ()
#17 0x000000000079f2fd in get_constraint_exp_from_ssa_var ()
#18 0x000000000079faa5 in get_constraint_for ()
#19 0x000000000079fbbd in get_constraint_for ()
#20 0x00000000007a13f6 in find_global_initializers ()
#21 0x000000000076b1c5 in walk_tree ()
#22 0x000000000076b3fd in walk_tree ()
#23 0x000000000076b7ca in walk_tree_without_duplicates ()
#24 0x000000000079ee2f in create_variable_info_for ()
#25 0x000000000079f2a0 in get_vi_for_tree ()
#26 0x000000000079f2fd in get_constraint_exp_from_ssa_var ()
#27 0x000000000079faa5 in get_constraint_for ()
#28 0x000000000079fbbd in get_constraint_for ()
#29 0x00000000007a13f6 in find_global_initializers ()
#30 0x000000000076b1c5 in walk_tree ()
#31 0x000000000076b3fd in walk_tree ()
#32 0x000000000076b7ca in walk_tree_without_duplicates ()
#33 0x000000000079ee2f in create_variable_info_for ()
#34 0x000000000079f2a0 in get_vi_for_tree ()
#35 0x000000000079f2fd in get_constraint_exp_from_ssa_var ()
#36 0x000000000079faa5 in get_constraint_for ()
#37 0x000000000079fbbd in get_constraint_for ()
#38 0x00000000007a13f6 in find_global_initializers ()
#39 0x000000000076b1c5 in walk_tree ()
#40 0x000000000076b3fd in walk_tree ()
#41 0x000000000076b7ca in walk_tree_without_duplicates ()
#42 0x000000000079ee2f in create_variable_info_for ()
#43 0x000000000079f2a0 in get_vi_for_tree ()
#44 0x000000000079f2fd in get_constraint_exp_from_ssa_var ()
#45 0x000000000079faa5 in get_constraint_for ()
#46 0x000000000079fd46 in get_constraint_for ()
#47 0x00000000007a1132 in find_func_aliases ()
#48 0x00000000007a4bef in compute_points_to_sets ()
#49 0x00000000004f3e67 in compute_may_aliases ()
#50 0x0000000000787078 in execute_one_pass ()
#51 0x00000000007871dc in execute_pass_list ()
#52 0x00000000007871ee in execute_pass_list ()
#53 0x00000000004cffee in tree_rest_of_compilation ()
#54 0x0000000000473339 in expand_body ()
#55 0x00000000007cb994 in cgraph_expand_function ()
#56 0x00000000007cc2be in cgraph_optimize ()
#57 0x000000000044116f in cp_finish_file ()
#58 0x00000000004b5cda in c_common_parse_file ()
#59 0x0000000000769083 in toplev_main ()
#60 0x00002b4f81ac6b54 in __libc_start_main () from /lib64/libc.so.6
#61 0x0000000000402459 in _start ()
Created attachment 13678: unincluded testcase

Unincluded testcase that also "works" with mainline, where the slowness is also present.
And again, a 4.2/4.3 regression wrt compile-time _and_ memory-usage. Mainline needs 1.2GB ram and whatnot time, 4.1 is happy with 500MB and about 10s.
Subject: Re: [4.2/4.3 Regression] possible quadratic behaviour.

On 11 Jun 2007 14:17:46 -0000, rguenth at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #40 from rguenth at gcc dot gnu dot org 2007-06-11 14:17 -------
> And again, a 4.2/4.3 regression wrt compile-time _and_ memory-usage. Mainline
> needs 1.2GB ram and whatnot time, 4.1 is happy with 500MB and about 10s.

Memory usage is just going to be up because we keep the last points-to sets to propagate only differences. It will take roughly twice as much memory. This is just the cost of doing business faster.

I'll work on the time regression.

And please change the summary of this bug to something sane like "points-to analysis too slow"

The algorithm is cubic in the worst case, and you simply can't do anything about this. :)
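To make the memory trade-off above concrete, here is a small textbook-style inclusion-based (Andersen-style) points-to solver with difference propagation. It is only an illustration under stated assumptions and is not GCC's tree-ssa-structalias.c: the constraint forms, the function name `solve`, and the maps `sol`, `old`, and `succ` are all made up for this sketch. The `old` map is the second copy of every points-to set that lets each step push only the newly added members.

```python
from collections import defaultdict, deque


def solve(addr_of, copies, loads=(), stores=()):
    """Inclusion-based points-to solver with difference propagation.

    addr_of: (p, x) pairs for  p = &x
    copies:  (p, q) pairs for  p = q
    loads:   (p, q) pairs for  p = *q
    stores:  (p, q) pairs for  *p = q
    Returns a dict mapping each variable to its nonempty points-to set.
    """
    sol = defaultdict(set)         # current points-to sets
    old = defaultdict(set)         # members already propagated (the extra copy)
    succ = defaultdict(set)        # copy edges: q -> {p} means sol[p] includes sol[q]
    loads_from = defaultdict(set)  # q -> {p} for p = *q
    stores_to = defaultdict(set)   # p -> {q} for *p = q

    for p, x in addr_of:
        sol[p].add(x)
    for p, q in copies:
        succ[q].add(p)
    for p, q in loads:
        loads_from[q].add(p)
    for p, q in stores:
        stores_to[p].add(q)

    work = deque(sol)              # every variable that starts nonempty
    while work:
        n = work.popleft()
        delta = sol[n] - old[n]    # only what is new since the last visit
        if not delta:
            continue
        old[n] |= delta

        # Each new pointee v of n turns loads and stores through n into
        # new copy edges. A new edge must carry the whole source set once.
        for v in delta:
            for p in loads_from[n]:        # p = *n and n points to v
                if p not in succ[v]:
                    succ[v].add(p)
                    if not sol[v] <= sol[p]:
                        sol[p] |= sol[v]
                        work.append(p)
            for q in stores_to[n]:         # *n = q and n points to v
                if v not in succ[q]:
                    succ[q].add(v)
                    if not sol[q] <= sol[v]:
                        sol[v] |= sol[q]
                        work.append(v)

        # Existing copy edges only need the difference.
        for s in succ[n]:
            if not delta <= sol[s]:
                sol[s] |= delta
                work.append(s)

    return {v: set(s) for v, s in sol.items() if s}


# p = &a; q = &b; r = p; r = q; s = r; *p = q; t = *p
pts = solve(addr_of=[("p", "a"), ("q", "b")],
            copies=[("r", "p"), ("r", "q"), ("s", "r")],
            loads=[("t", "p")],
            stores=[("p", "q")])
```

Holding both `sol` and `old` is what roughly doubles the set storage, and the loops that add edges for every new pointee of every load and store are where the cubic worst case comes from.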
*** Bug 32900 has been marked as a duplicate of this bug. ***
On my current branch, which i will commit soon, i have

tree PTA : 14.56 ( 1%) usr 0.57 ( 1%) sys 16.98 ( 1%) wall 26372 kB ( 2%) ggc
tree alias analysis : 577.90 (26%) usr 8.72 ( 8%) sys 611.13 (24%) wall 108272 kB ( 7%) ggc

I have looked through this bug report again, I also don't see PTA taking up your memory. (Alias analysis i will try to fix, but it's a bit tricky)

What is happening is the memory is increasing slowly. It's not like PTA is suddenly allocating 1.5 gig. So your memory usage is not coming from PTA (at least, not in a way I can solve, unless you see a leak somewhere). It is much more likely someone is leaking memory.

In short I have absolutely no plans to work on the memory hog portion of this bug, and deny that points-to analysis is "memory hungry" in 4.3 because i don't see it.
Daniel, are you then going to fix the "slow" part of this bug?

As for the memhog, CC'ing Honza, who is an expert on memory allocations and leaks :)
Subject: Re: [4.2/4.3 Regression] points-to analysis slow and memory hungry

Uh, it's not slow anymore since I committed the patch last month.

On 11 Sep 2007 10:59:31 -0000, giovannibajo at libero dot it <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #44 from giovannibajo at libero dot it 2007-09-11 10:59 -------
> Daniel, are you then going to fix the "slow" part of this bug?
>
> As for the memhog, CC'ing Honza which is expert on memory allocations and leaks
> :)
>
> --
>
> giovannibajo at libero dot it changed:
>
>   What |Removed |Added
>   ----------------------------------------------------------
>   CC   |        |hubicka at gcc dot gnu dot org
(In reply to comment #45)
> Uh, it's not slow anymore since I committed the patch last month.

Please define "not slow" :)

I stopped compilation of "sipQtGuipart0.cpp" after 2.5+ hours and called timevar_print (stdout) from gdb:

tree PTA : 7383.90 ( 0%) wall 652668 kB ( 0%) ggc
tree alias analysis : 550.29 ( 0%) wall 1192023 kB ( 0%) ggc
tree operand scan : 178.91 ( 0%) wall 77836 kB ( 0%) ggc
(all other timevars < 10 s)

This is gcc-4.2 r128355. It takes about the same amount of memory (~500 MB) as 4.1 but never finishes.

Unlike 4.2, 4.3 r128377 runs out of memory, and it is much slower than 4.1. Here is output from timevar_print() from gdb at the time it hit the swap (about four minutes):

tree find ref. vars : 15.88 ( 0%) wall 990776 kB ( 0%) ggc
tree PTA : 2.22 ( 0%) wall 23052 kB ( 0%) ggc
tree alias analysis : 13.24 ( 0%) wall 8179 kB ( 0%) ggc
tree call clobbering : 52.63 ( 0%) wall 2020 kB ( 0%) ggc
tree flow sensitive alias: 1.28 ( 0%) wall 123311 kB ( 0%) ggc
tree flow insensitive alias: 21.84 ( 0%) wall 0 kB ( 0%) ggc
tree memory partitioning: 61.98 ( 0%) wall 870 kB ( 0%) ggc
...
tree operand scan : 21.86( 0%) wall 128117 kB ( 0%) ggc

For comparison, 4.1 compiles that file in 50 seconds and takes roughly 500 MB.

(note all timings were done with --enable-checking=release compilers).
Subject: Re: [4.2/4.3 Regression] points-to analysis slow and memory hungry

On 11 Sep 2007 19:51:00 -0000, belyshev at depni dot sinp dot msu dot ru <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #46 from belyshev at depni dot sinp dot msu dot ru 2007-09-11 19:50 -------
> (In reply to comment #45)
> > Uh, it's not slow anymore since I committed the patch last month.
>
> Please define "not slow" :)

I said for mainline. I'm not backporting even more to 4.2.

> Unlike 4.2, 4.3 r128377 runs out of memory, and it is much slower than 4.1.
> Here is output from timevar_print() from gdb at the time it hit the swap (about
> four minutes):
>
>  tree find ref. vars         :   15.88 ( 0%) wall  990776 kB ( 0%) ggc
>  tree PTA                    :    2.22 ( 0%) wall   23052 kB ( 0%) ggc
>  tree alias analysis         :   13.24 ( 0%) wall    8179 kB ( 0%) ggc
>  tree call clobbering        :   52.63 ( 0%) wall    2020 kB ( 0%) ggc
>  tree flow sensitive alias   :    1.28 ( 0%) wall  123311 kB ( 0%) ggc
>  tree flow insensitive alias :   21.84 ( 0%) wall       0 kB ( 0%) ggc
>  tree memory partitioning    :   61.98 ( 0%) wall     870 kB ( 0%) ggc
>  ...
>  tree operand scan           :   21.86 ( 0%) wall  128117 kB ( 0%) ggc

2.22 seconds is fast. It only takes 23 meg of memory for PTA.

The rest of your memory usage is not my bug. The rest of your time usage is not my bug. If you want to get them fixed, I suggest you file a bug not entitled "points-to analysis slow and memory hungry".
4.3 is no longer a regression as it does PTA faster than 4.1, and uses less memory.
*** Bug 33708 has been marked as a duplicate of this bug. ***
Change target milestone to 4.2.3, as 4.2.2 has been released.
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.
(In reply to comment #33)
> it does fix the xf86ScanPci.i testcase (time/mem hog) and this is great

Unfortunately, it looks like the patch did _not_ fix the mem hog problem.

I'm compiling the xorg-xserver using gcc 4.2.3 (cross-compiling for powerpc on x86) and the compiler uses around 1G RAM when compiling xf86ScanPci.c.

I'm not sure about the priority of this problem, but if more information about my environment is needed to fix it in the 4.2 branch, please ask.
Subject: Re: [4.2 Regression] points-to analysis slow and memory hungry

On Wed, 12 Mar 2008, chkr at plauener dot de wrote:
> ------- Comment #52 from chkr at plauener dot de 2008-03-12 11:18 -------
> (In reply to comment #33)
> > it does fix the xf86ScanPci.i testcase (time/mem hog) and this is great
>
> Unfortunately, it looks like the patch did _not_ fix the mem hog problem.
>
> I'm compiling the xorg-xserver using gcc 4.2.3 (cross-compiling for powerpc on
> x86) and the compiler uses around 1G RAM when compiling xf86ScanPci.c.
>
> I'm not sure about the priority of this problem, but if more information about
> my environment is needed to fix it in the 4.2 branch, please ask.

Whatever problem remains on the 4.2 branch, it is not going to be fixed there. Sorry.

Richard.
Fixed since 4.3.0, WONTFIX on earlier branches.
*** Bug 36290 has been marked as a duplicate of this bug. ***
What is the workaround for this bug? It looks like not even -O1 fixes the compile-time hog.
"Fixed since 4.3.0, WONTFIX on earlier branches." There is no workaround.
Note that for 4.3 the testcases are still slow and memory-hungry, not because of PTA but because of memory partitioning (we have a PR for that) and call clobber analysis.

 tree find ref. vars         :  15.36 ( 5%) usr   0.80 ( 8%) sys  15.90 ( 5%) wall  817801 kB (35%) ggc
 tree alias analysis         :  16.27 ( 5%) usr   0.38 ( 4%) sys  16.11 ( 5%) wall   11037 kB ( 0%) ggc
 tree call clobbering        :  41.35 (14%) usr   0.28 ( 3%) sys  43.00 (14%) wall    3132 kB ( 0%) ggc
 tree flow insensitive alias :  31.26 (10%) usr   0.34 ( 3%) sys  31.94 (10%) wall       0 kB ( 0%) ggc
 tree memory partitioning    :  83.32 (28%) usr   0.89 ( 9%) sys  84.36 (27%) wall     974 kB ( 0%) ggc
 tree SSA incremental        :  10.60 ( 4%) usr   0.17 ( 2%) sys  11.11 ( 4%) wall   15755 kB ( 1%) ggc
 tree operand scan           :  28.71 (10%) usr   0.57 ( 6%) sys  29.59 ( 9%) wall  160271 kB ( 7%) ggc

The flow-insensitive analysis part is also interesting (this is the sipQtGuipart0.cpp testcase). I will open a new PR to track the general problems with this testcase.
-> PR36291.
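For anyone comparing timevar dumps like the ones quoted in this report, a small script can rank passes by wall time and by GC allocation. This is only a sketch: the function name `parse_timevars` and the regular expressions are illustrative and assume the line layout shown in the 4.3 output above (a pass name, a colon, then `wall` and `ggc` columns); other GCC versions may print different columns.

```python
import re

# Matches "<seconds> (NN%) wall" and "<kB> kB (NN%) ggc" inside one timevar line.
WALL_RE = re.compile(r"([\d.]+)\s*\(\s*\d+%\)\s*wall")
GGC_RE = re.compile(r"(\d+)\s*kB\s*\(\s*\d+%\)\s*ggc")


def parse_timevars(text):
    """Return (pass name, wall seconds, ggc kB) for each recognizable line."""
    rows = []
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        wall = WALL_RE.search(rest)
        ggc = GGC_RE.search(rest)
        if not sep or wall is None or ggc is None:
            continue  # skip "..." and any line without both columns
        rows.append((name.strip(), float(wall.group(1)), int(ggc.group(1))))
    return rows


# Figures copied from the 4.3 dump for sipQtGuipart0.cpp quoted above.
SAMPLE = """\
 tree find ref. vars   :  15.36 ( 5%) usr   0.80 ( 8%) sys  15.90 ( 5%) wall  817801 kB (35%) ggc
 tree alias analysis   :  16.27 ( 5%) usr   0.38 ( 4%) sys  16.11 ( 5%) wall   11037 kB ( 0%) ggc
 tree call clobbering  :  41.35 (14%) usr   0.28 ( 3%) sys  43.00 (14%) wall    3132 kB ( 0%) ggc
 tree flow insensitive alias:  31.26 (10%) usr   0.34 ( 3%) sys  31.94 (10%) wall       0 kB ( 0%) ggc
 tree memory partitioning:  83.32 (28%) usr   0.89 ( 9%) sys  84.36 (27%) wall     974 kB ( 0%) ggc
 tree SSA incremental  :  10.60 ( 4%) usr   0.17 ( 2%) sys  11.11 ( 4%) wall   15755 kB ( 1%) ggc
 tree operand scan     :  28.71 (10%) usr   0.57 ( 6%) sys  29.59 ( 9%) wall  160271 kB ( 7%) ggc
"""

rows = parse_timevars(SAMPLE)
by_wall = sorted(rows, key=lambda r: r[1], reverse=True)
by_ggc = sorted(rows, key=lambda r: r[2], reverse=True)

for name, wall, ggc in by_wall:
    print(f"{name:30s} {wall:8.2f} s {ggc:10d} kB")
```

Sorting the same rows two ways makes the point of the comment above visible: the pass that dominates wall time (memory partitioning) is not the one that dominates GC allocation (find ref. vars).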
If a knowledgeable GCC developer could suggest *any* workaround at -O1 for this bug in 4.2 (including disabling whatever alias analysis causes the problem), it might be proposed as a fix within distros at least.
Subject: Re: [4.2 Regression] points-to analysis slow and memory hungry

On Tue, 10 Jun 2008, giovannibajo at libero dot it wrote:
> ------- Comment #60 from giovannibajo at libero dot it 2008-06-10 17:26 -------
> If a knowledgeable GCC developer could suggest *any* workaround at -O1 for this
> bug in 4.2 (including disabling whatever alias analysis causes the problem), it
> might be proposed as a fix within distros at least.

You can try whether --param max-fields-for-field-sensitive=0 improves the situation.

Other than that, try removing, in tree-ssa-structalias.c:create_variable_info_for, the make_constraint_from_escaped and make_constraint_to_escaped calls for the is_global cases. Note that you need to adjust find_what_p_points_to to include escaped variables if escaped_id is set in the solution, and that call clobbering will need similar adjustments (and remember escaped_id includes all globals implicitly).

The problem with the sipQt testcase is that it has 10000s of global vars it creates constraints for, even though they are unused.

Richard.
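To try the --param suggestion without the original sources, one option is a synthetic file with many unused globals. The sketch below generates such a file and prints a gcc command line with and without the parameter. Everything apart from the `--param max-fields-for-field-sensitive=0` setting is an assumption made for illustration: the `struct pair` layout, the file name `many_globals.c`, the count of 10000 and the use of `-ftime-report` are not taken from the testcases in this report, and the generated file is not verified to reproduce the hog.

```python
import os
import tempfile


def make_testcase(n_globals):
    """Return C source with n_globals unused global structs and one small function."""
    lines = ["struct pair { int *a; int *b; };"]
    lines += [f"struct pair g{i};" for i in range(n_globals)]
    lines.append("int deref(int **p) { return **p; }")
    return "\n".join(lines) + "\n"


def gcc_command(path, workaround=True):
    """Build (but do not run) a gcc command line for the generated file."""
    cmd = ["gcc", "-O1", "-ftime-report", "-c", path, "-o", os.devnull]
    if workaround:
        # The setting suggested in the comment above, inserted right after -O1.
        cmd[2:2] = ["--param", "max-fields-for-field-sensitive=0"]
    return cmd


source = make_testcase(10000)
path = os.path.join(tempfile.mkdtemp(), "many_globals.c")
with open(path, "w") as f:
    f.write(source)

# Print both variants so the time reports can be compared by hand.
print(" ".join(gcc_command(path, workaround=False)))
print(" ".join(gcc_command(path, workaround=True)))
```

The commands are printed instead of executed so the script does not depend on which gcc is installed; run them with the 4.2 compiler under test and compare the alias-related rows of the two time reports.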