This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Whole program optimization and functions-only-called-once.


Richard Guenther wrote:
On Sun, Nov 22, 2009 at 5:18 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
Toon Moene wrote:
Toon Moene wrote:

Jan Hubicka wrote:
:) It would be nice to know what caused the OOM. Is it just one of the passes
exploding in the presence of very large bodies?
I'll try to figure this out over the weekend (sorry, don't have more
spare time).

It's most probably a single pass, because the memory requirements kept
creeping up to 12.5 Gbytes from 10, slowly increasing all the time over
several minutes.
Here are the tracebacks from gdb attached to the lto1 process, while it
was expanding from 7 to 12 Gb:

(gdb) where
#0  0x00002b961290491e in memset () from /lib/libc.so.6
#1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
../../gcc/gcc/ira-build.c:155
#2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
../../gcc/gcc/passes.c:1522
#6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
../../gcc/gcc/passes.c:1577
#7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
../../gcc/gcc/passes.c:1578
#8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
at ../../gcc/gcc/tree-optimize.c:407
#9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
../../gcc/gcc/cgraphunit.c:1178
#10 0x00000000007835ed in cgraph_expand_all_functions () at
../../gcc/gcc/cgraphunit.c:1245
#11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
../../gcc/gcc/lto/lto.c:2054
#13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
../../gcc/gcc/toplev.c:1049
#14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

(gdb) where
#0  0x00002b961290490f in memset () from /lib/libc.so.6
#1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
../../gcc/gcc/ira-build.c:155
#2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
../../gcc/gcc/passes.c:1522
#6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
../../gcc/gcc/passes.c:1577
#7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
../../gcc/gcc/passes.c:1578
#8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
at ../../gcc/gcc/tree-optimize.c:407
#9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
../../gcc/gcc/cgraphunit.c:1178
#10 0x00000000007835ed in cgraph_expand_all_functions () at
../../gcc/gcc/cgraphunit.c:1245
#11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
../../gcc/gcc/lto/lto.c:2054
#13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
../../gcc/gcc/toplev.c:1049
#14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

(gdb) where
#0  0x00002b96128fc26c in ?? () from /lib/libc.so.6
#1  0x00002b96128fde24 in calloc () from /lib/libc.so.6
#2  0x0000000000a6ea7a in xcalloc (nelem=19, elsize=8) at
../../gcc/libiberty/xmalloc.c:162
#3  0x000000000099d8c0 in get_loop_body (loop=0x2b966e38ecf0) at
../../gcc/gcc/cfgloop.c:819
#4  0x000000000099e14c in get_loop_exit_edges (loop=0x2b966e38ecf0) at
../../gcc/gcc/cfgloop.c:1157
#5  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
../../gcc/gcc/ira-build.c:155
#6  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#7  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#8  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#9  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
../../gcc/gcc/passes.c:1522
#10 0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
../../gcc/gcc/passes.c:1577
#11 0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
../../gcc/gcc/passes.c:1578
#12 0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
at ../../gcc/gcc/tree-optimize.c:407
#13 0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
../../gcc/gcc/cgraphunit.c:1178
#14 0x00000000007835ed in cgraph_expand_all_functions () at
../../gcc/gcc/cgraphunit.c:1245
#15 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#16 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
../../gcc/gcc/lto/lto.c:2054
#17 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
../../gcc/gcc/toplev.c:1049
#18 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#19 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#20 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#21 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

(gdb) where
#0  0x00002b961290491e in memset () from /lib/libc.so.6
#1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
../../gcc/gcc/ira-build.c:155
#2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
../../gcc/gcc/passes.c:1522
#6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
../../gcc/gcc/passes.c:1577
#7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
../../gcc/gcc/passes.c:1578
#8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
at ../../gcc/gcc/tree-optimize.c:407
#9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
../../gcc/gcc/cgraphunit.c:1178
#10 0x00000000007835ed in cgraph_expand_all_functions () at
../../gcc/gcc/cgraphunit.c:1245
#11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
../../gcc/gcc/lto/lto.c:2054
#13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
../../gcc/gcc/toplev.c:1049
#14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

So it seems to be stuck in a part of the IRA pass ...

Hope this helps (it's close to impossible to build a test case out of
this, because the program consists of around 3/4 of our ~ 1 million lines
of Fortran code).

I'd recommend trying -fira-region=one to see what the memory requirements
would be.

For such a big function the conflict table would be very big.  This is a
common problem for register allocators that use a conflict table.  IRA uses
a sophisticated algorithm for conflict table compression, and I have no idea
right now how to improve it.  IRA has a parameter,
ira-max-conflict-table-size, which affects the decision to use the conflict
table.  If the conflict table is not used, the quality of RA worsens.  The
default value is 1GB, but I guess the conflict table in your case would be
bigger.  So you need to play with this parameter.
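
To get a feel for why the table grows so quickly, here is a rough back-of-the-envelope sketch in Python. It assumes an uncompressed triangular bit matrix with one bit per pair of pseudos, which is not what IRA actually stores (IRA compresses the table), so the numbers are only an upper-bound illustration. The function names and the pseudo counts are hypothetical; only the 1GB default comes from the message above.

```python
def conflict_table_bytes(n_pseudos: int) -> int:
    """Size in bytes of an uncompressed triangular conflict bit matrix.

    Assumption: one bit per unordered pair of pseudos. IRA's real table
    is compressed, so this is only an upper-bound illustration.
    """
    pairs = n_pseudos * (n_pseudos - 1) // 2
    return (pairs + 7) // 8  # round up to whole bytes


def table_fits(n_pseudos: int, limit_bytes: int = 1 << 30) -> bool:
    """True if the estimated table stays within the limit (1GB by default)."""
    return conflict_table_bytes(n_pseudos) <= limit_bytes


if __name__ == "__main__":
    # Hypothetical pseudo counts, chosen only to show the quadratic growth.
    for n in (10_000, 100_000, 1_000_000):
        size_mb = conflict_table_bytes(n) / (1 << 20)
        print(f"{n:>9} pseudos: ~{size_mb:,.1f} MB, fits: {table_fits(n)}")
```

Because the estimate is quadratic in the number of pseudos, a tenfold larger function needs roughly a hundredfold larger table, which is why a single huge function can cross the limit.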

If -fira-region works for you, we could prohibit regional allocation for
functions whose number of basic blocks exceeds some threshold.
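
As a minimal sketch of that idea (not the actual patch), the decision could look like the Python below. The threshold value, the `Region` enum and `pick_region_mode` are all hypothetical names for illustration; the message does not give a number.

```python
from enum import Enum


class Region(Enum):
    ONE = "one"      # treat the whole function as a single region
    MIXED = "mixed"  # regional allocation


# Hypothetical cut-off; the message does not name a value.
MAX_BLOCKS_FOR_REGIONAL = 10_000


def pick_region_mode(n_basic_blocks: int, requested: Region,
                     limit: int = MAX_BLOCKS_FOR_REGIONAL) -> Region:
    """Fall back to one region when the function has too many basic blocks."""
    if requested is not Region.ONE and n_basic_blocks > limit:
        return Region.ONE
    return requested
```

With this shape, small functions keep whatever mode was requested and only the very large ones lose regional allocation.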

Can't we split a function at points with a minimal number of live pseudos and allocate the resulting regions independently? Of course there would be hard constraints on the entry of each such region, just like we have on function entry for parameters.

No idea if that would help in practice, of course.
That is an interesting proposal, Richard. I think it could help. There are a lot of questions about heuristics (the point with the minimal number of live pseudos may divide the function into two very different parts: one very small and one very big). Also, I'd prefer to first implement something like non-regional allocation, and then splitting of live ranges over regions for pseudos that are transparent over the region and that got a hard register. By my estimate, it could decrease the demand for resources for regional allocation a lot.
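
The splitting idea and the balance concern can be sketched in a few lines of Python. This is a toy model under stated assumptions, not anything from IRA: the function is flattened to a linear sequence of program points, each pseudo has a single inclusive live range `(start, end)`, and the names `crossing_counts`, `best_split` and `min_part` are hypothetical.

```python
from typing import List, Sequence, Tuple

LiveRange = Tuple[int, int]  # inclusive (start, end) program points


def crossing_counts(n_points: int, ranges: Sequence[LiveRange]) -> List[int]:
    """crossing[p] = pseudos live across the boundary just before point p."""
    delta = [0] * (n_points + 1)
    for start, end in ranges:
        # The pseudo crosses every boundary p with start < p <= end.
        delta[start + 1] += 1
        delta[end + 1] -= 1
    crossing, live = [], 0
    for p in range(n_points):
        live += delta[p]
        crossing.append(live)
    return crossing


def best_split(n_points: int, ranges: Sequence[LiveRange],
               min_part: int = 1) -> int:
    """Return p so that [0, p) and [p, n_points) are the two regions.

    Picks the boundary crossed by the fewest pseudos; ties go to the most
    balanced split, and both parts must have at least `min_part` points.
    """
    candidates = range(min_part, n_points - min_part + 1)
    if not candidates:
        raise ValueError("function too small to split with this min_part")
    crossing = crossing_counts(n_points, ranges)
    return min(candidates,
               key=lambda p: (crossing[p], abs(p - (n_points - p))))
```

The tie-break and `min_part` are one simple way to avoid a split that leaves one very small and one very big part; a real heuristic would likely weigh region size against the number of crossing pseudos rather than treat size only as a tie-breaker.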

In any case, both solutions need some time for implementation and evaluation. Meanwhile, I'll submit a patch preventing regional allocation where there are a lot of BBs.

