Summary: | Segfault of gcc on a big file | ||
---|---|---|---|
Product: | gcc | Reporter: | benoit.barbot |
Component: | middle-end | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | normal | CC: | dimhen, hubicka, mpolacek, octoploid, rguenth, steven, webrown.cpp |
Priority: | P3 | Keywords: | compile-time-hog, memory-hog |
Version: | 4.8.0 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Known to work: | 4.6.3 | |
Known to fail: | 10.2.1, 4.7.2, 4.8.0 | Last reconfirmed: | 2024-02-19 00:00:00 |
Bug Depends on: | |||
Bug Blocks: | 47344 | ||
Attachments: | Collected hacks to make the test case compile in reasonable time with -O0 |
The file is too big to be attached. Here is a URL where you can find it: http://www.lsv.ens-cachan.fr/~barbot/cosmos/files/buggcc.ii

Please try a newer version of GCC. If you can still reproduce the problem, please attach a preprocessed file of manageable size here. Thanks.

When I remove lines from the file, the segfault disappears. The size of the file seems to trigger the segfault.

I can't reproduce the crash, but what's interesting are the compile times (without optimization, just "-c buggcc.ii"):
clang++:   26.95 total
gcc-4.6.3: 1:39.92 total
gcc-4.7.2: 6:04.07 total
gcc-4.8:   7:16.84 total

I tried the same file on a different computer with a newer version of gcc (gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3) and hit the same problem:
>g++ buggcc.ii
g++: internal compiler error: Killed (program cc1plus)
(In reply to comment #6)
> I try the same file but on a different computer with a newer version of gcc(gcc
> (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3) with the same problem:
> >g++ buggcc.ii
> g++: internal compiler error: Killed (program cc1plus)

Make sure you have enough RAM (over 4GB) to compile this testcase. Check your dmesg for the OOM-killer.

I can't reproduce the crash either. I'm not sure if we should keep the PR open as regards compile-time performance issues (or do we already have a similar testcase in Bugzilla?) Steven?

I first tried at -O0, only to run into even worse compile time issues, hitting quadratic behavior in the number of EH regions, and having a huge number of them:

void LHA::Load();; remove_queued_eh_handlers: # eh regions: 179972

The remove_queued_eh_handlers function is new, I'll attach a patch here after proper testing. With that problem out of the way, the next hurdle is IRA but I'm still trying to figure that one out.

Created attachment 29557
Collected hacks to make the test case compile in reasonable time with -O0
Patch does 2 things:
- Queue up to-be-removed EH regions, instead of removing them one-by-one.
Removing them one at a time results in walking the list of EH regions
repeatedly, thus taking O(# of EH regions ** 2) time.
- Rewrite init_subregs_of_mode and subroutines to first collect the
invalid mode change subregs in sbitmaps, and then converting the final
sbitmap to a bitmap. This trades memory for time: the bitmap lookups are
also potentially O(# of registers ** 2) and this test case has more than
one million registers, many of them with invalid mode changes (to be fixed
up by IRA/LRA).
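
The first item above (queueing EH region removals instead of removing regions one by one) can be illustrated with a small sketch. This is not the code from the attachment: `Region`, `remove_one`, `remove_unreachable`, `build` and `length` are hypothetical names, and the flat singly linked list is an assumed simplification of GCC's EH region tree. It only shows why per-region removal that walks the list costs quadratic time while a single mark-and-sweep pass is linear.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for an EH region: a node in a flat
// singly linked list. GCC's real EH regions form a tree.
struct Region {
    int id;
    bool reachable;
    Region* next;
};

// Remove a single region by walking the list from the head to find it.
// Returns the number of nodes visited. Calling this once per dead region
// costs O(n) per call, hence O(n^2) overall.
std::size_t remove_one(Region*& head, Region* victim) {
    std::size_t visited = 0;
    for (Region** link = &head; *link != nullptr; link = &(*link)->next) {
        ++visited;
        if (*link == victim) {
            *link = victim->next;
            victim->next = nullptr;
            break;
        }
    }
    return visited;
}

// Batch variant: a single pass unlinks every region not marked reachable.
// Returns the number of nodes visited, which is the original list length.
std::size_t remove_unreachable(Region*& head) {
    std::size_t visited = 0;
    Region** link = &head;
    while (*link != nullptr) {
        ++visited;
        if (!(*link)->reachable) {
            Region* dead = *link;
            *link = dead->next;
            dead->next = nullptr;
        } else {
            link = &(*link)->next;
        }
    }
    return visited;
}

// Build a list of n regions inside `storage`; odd ids are unreachable.
Region* build(std::vector<Region>& storage, int n) {
    assert(n >= 0);
    storage.clear();
    storage.reserve(n);  // keeps element addresses stable while linking
    for (int i = 0; i < n; ++i) storage.push_back({i, i % 2 == 0, nullptr});
    for (int i = 0; i + 1 < n; ++i) storage[i].next = &storage[i + 1];
    return n > 0 ? &storage[0] : nullptr;
}

// Count the nodes still linked into the list.
std::size_t length(const Region* head) {
    std::size_t n = 0;
    for (; head != nullptr; head = head->next) ++n;
    return n;
}
```

Comparing the visit counts returned by `remove_one` (summed over all dead regions) and `remove_unreachable` on the same input makes the gap visible. With roughly 180,000 regions, as reported for this test case, that gap is what dominates compile time.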
Peak memory at -O0 is <4GB. Compile time on a "Quad-Core AMD Opteron(tm) Processor 8354" at 2200MHz is 240s, half of it still taken up by IRA+LRA.
At -O1 the einline pass is consuming almost all compile time again.
-> Honza: Can we please have a proper permanent fix for this recurring
problem? What's there now just Does Not Work!
This is a regression on various things from previous releases. I will take care of the compile time explosion at -O0. The -O1+ compile time explosion (and probably the memory explosion) are due to the ever-changing inliner heuristics that still just don't scale.

Last night's compilation at -O1 with my hacks applied finished after a whopping >6 hours. Top compile time consumers:

early inlining heuristics: 12409.92 (55%) usr
integration              :  1417.65 ( 6%) usr
tree eh                  :  1140.75 ( 5%) usr
tree PTA                 :   309.46 ( 1%) usr
tree SSA incremental     :  6065.43 (27%) usr
tree split crit edges    :   515.67 ( 2%) usr
TOTAL                    : 22448.04

Peak memory: 4294647808 (~4GB).

For reference, my numbers are for GCC 4.8 trunk r196182, configured with release checking, on x86_64-unknown-linux-gnu, on AMD Opteron Processor 8354 at 2200MHz.

Thanks Steven for analyzing / fixing this.

(In reply to comment #10)
> Created attachment 29557
> Collected hacks to make the test case compile in reasonable time with -O0
>
> Patch does 2 things:
>
> - Queue up to-be-removed EH regions, instead of removing them one-by-one.
> Removing them one at a time results in walking the list of EH regions
> repeatedly, thus taking O(# of EH regions ** 2) time.

This (properly cleaned up) looks reasonable to me.

> - Rewrite init_subregs_of_mode and subroutines to first collect the
> invalid mode change subregs in sbitmaps, and then converting the final
> sbitmap to a bitmap. This trades memory for time: the bitmap lookups are
> also potentially O(# of registers ** 2) and this test case has more than
> one million registers, many of them with invalid mode changes (to be fixed
> up by IRA/LRA).

Hmm - this is because we hit the O(n) complexity we have on our bitmap implementation? Can't we improve init_subregs_of_mode by first collecting all mode changes we see for a pseudo (eventually using DF info?) and then do the processing in some more optimal order? Trading memory O(number of pseudos) with a large constant factor sounds like something waiting for trouble for other testcases ...

> Peak memory at -O0 is <4GB. Compile time on a "Quad-Core AMD Opteron(tm)
> Processor 8354" at 2200MHz is 240s, half of it still taken up by IRA+LRA.
>
> At -O1 the einline pass is consuming almost all compile time again.
> -> Honza: Can we please have a proper permanent fix for this recurring
> problem? What's there now just Does Not Work!

(In reply to comment #15)
> > - Queue up to-be-removed EH regions, instead of removing them one-by-one.
> > Removing them one at a time results in walking the list of EH regions
> > repeatedly, thus taking O(# of EH regions ** 2) time.
> This (properly cleaned up) looks reasonable to me.

It's not yet complete, I think I need to update the outer region pointers for the inner region if an outer region is removed. But I think this is the right approach.

> > - Rewrite init_subregs_of_mode and subroutines to first collect the
> > invalid mode change subregs in sbitmaps, and then converting the final
> > sbitmap to a bitmap. This trades memory for time: the bitmap lookups are
> > also potentially O(# of registers ** 2) and this test case has more than
> > one million registers, many of them with invalid mode changes (to be fixed
> > up by IRA/LRA).
> Hmm - this is because we hit the O(n) complexity we have on our bitmap
> implementation?

Yes.

> Can't we improve init_subregs_of_mode by first collecting
> all mode changes we see for a pseudo (eventually using DF info?) and
> then do the processing in some more optimal order?

Yes. That is the plan, this was just a proof-of-concept fix (I didn't call it a patch, I called it a hack - for the good reasons you mentioned :-). I also want to add a better way to lookup bits as random-access in bitmaps: change the "view" of the bitmap, much like what tree-ssa-live does with its maps).

>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55135
>
> --- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-01 07:50:43 UTC ---
> Last night's compilation at -O1 with my hacks applied finished after
> a whopping >6 hours. Top compile time consumers:
>
> early inlining heuristics: 12409.92 (55%) usr
> integration : 1417.65 ( 6%) usr
> tree eh : 1140.75 ( 5%) usr
> tree PTA : 309.46 ( 1%) usr
> tree SSA incremental : 6065.43 (27%) usr
> tree split crit edges : 515.67 ( 2%) usr
> TOTAL : 22448.04
>
> Peak memory: 4294647808 (~4GB).
I will take care of the early inlining problem. I wonder, you don't have oprofile of that, by any chance?
Honza
> I will take care of the early inlining problem. I wonder, you don't have oprofile of that, by any chance?
Aha, callee walking in update_inline_summary. Perhaps I will really need to
make this incremental despite the risk of accumulating roundoff errors. I will
play with this a bit more.
Honza
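
The trade-off mentioned here (recomputing a summary by walking all callees versus updating it incrementally and accepting floating-point drift) can be sketched as follows. This is an assumed toy model, not GCC's inline summary code: `Summary`, `recompute_total` and `update_callee` are hypothetical names, and the "summary" is reduced to a plain sum of per-callee time estimates.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-function summary: the function's
// estimated time is modelled as the sum of its callees' estimated times.
struct Summary {
    std::vector<double> callee_time;
    double total = 0.0;
};

// Full recomputation: walks every callee, O(#callees) per update,
// but the result carries no error from earlier updates.
double recompute_total(const Summary& s) {
    double sum = 0.0;
    for (double t : s.callee_time) sum += t;
    return sum;
}

// Incremental update: O(1) per change, but every call rounds once more,
// so the cached total can slowly drift from the recomputed value.
void update_callee(Summary& s, std::size_t i, double new_time) {
    assert(i < s.callee_time.size());
    s.total += new_time - s.callee_time[i];
    s.callee_time[i] = new_time;
}
```

After many calls to `update_callee`, `s.total` and `recompute_total(s)` normally agree only up to a small rounding difference. One common mitigation, mentioned here only as a general technique, is to recompute from scratch occasionally to reset the drift.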
(In reply to comment #18)

I thought you had already done that, to handle attribute flatten for bug 54146 (http://gcc.gnu.org/PR54146#c43). This test case doesn't use the flatten attribute, but explodes in the same way as the test case for bug 54146.

(In reply to comment #15)
> Trading memory O(number of pseudos) with a large constant factor sounds
> like something waiting for trouble for other testcases ...

FWIW, for this particular test case, almost all registers end up in the set. I'm not sure why. And since there are NUM_MACHINE_MODES bits per register (NUM_MACHINE_MODES==87 on x86) the supposed-to-be sparse bitmaps are huge (>100,000 bitmap_elements). I have a proper fix for this problem in testing.

Author: steven
Date: Tue Mar 5 14:45:23 2013
New Revision: 196464
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196464
Log:
gcc/
PR c++/55135
* except.h (remove_unreachable_eh_regions): New prototype.
* except.c (remove_eh_handler_splicer): New function, split out of remove_eh_handler.
(remove_eh_handler): Use remove_eh_handler_splicer. Add comment warning about running it on many EH regions one at a time.
(remove_unreachable_eh_regions_worker): New function, walk the EH tree in depth-first order and remove non-marked regions.
(remove_unreachable_eh_regions): New function.
* tree-eh.c (mark_reachable_handlers): New function, split out from remove_unreachable_handlers.
(remove_unreachable_handlers): Use mark_reachable_handlers and remove_unreachable_eh_regions.
(remove_unreachable_handlers_no_lp): Use mark_reachable_handlers and remove_unreachable_eh_regions.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/except.c
trunk/gcc/except.h
trunk/gcc/tree-eh.c

(In reply to comment #20)
> I have a fix proper for this problem in testing.

Posted for discussion here: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00193.html

PR47344 tracks the regression property of this bug.

(In reply to comment #23)
> PR47344 tracks the regression property of this bug.

?! This is also a regression from GCC 4.6 (comment #5), how in the world does that qualify as an "old regression"?

(NB 4.6.3 known to work w.r.t. comment #5, not w.r.t. original bug report)

On Wed, 6 Mar 2013, steven at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55135
>
> Steven Bosscher <steven at gcc dot gnu.org> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> Known to work| |4.6.3
> Known to fail| |4.7.2, 4.8.0
>
> --- Comment #24 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 12:09:26 UTC ---
> (In reply to comment #23)
> > PR47344 tracks the regression property of this bug.
>
> ?! This is also a regression from GCC 4.6 (commen #5), how in the world
> does that qualify as an "old regression"?
Ah, just because nobody has tried 4.5 doesn't say it isn't a regression
in 4.6!
(what is a regression in compile-time / memory-usage? technically
I'd say if T2 > T1 or M2 > M1 it's a regression ... welcome to
the world of an ever increasing number of open "regressions")
Btw, I wouldn't call
> gcc-4.6.3: 1:39.92 total
"work" ;) Also the reporter says the bug is in 4.4.5 (so we are again
turning a bug into a different bug ... :/)
(In reply to comment #22)
> Posted for discussion here:
> http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00193.html

OT: Another trivial speed-up for bitmaps used as regsets (and probably in general) is to look at head->first if head->current is not the element containing the sought bit, and *not* update head->current if head->first is the right element. This speeds up regsets because a common access pattern is to look at sets containing both pseudos and hardregs, and on most targets all hardregs are in head->first. Not updating head->current preserves a pointer to the latest accessed pseudos.

I'll implement this idea and come back with some timings.

On Wed, 6 Mar 2013, steven at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55135
>
> --- Comment #28 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-06 12:18:01 UTC ---
> (In reply to comment #22)
> > Posted for discussion here:
> > http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00193.html
>
> OT: Another trivial speed-up for bitmaps used as regsets (and probably
> in general) is to look at head->first if head->current is not the element
> containing the sought bit, and *not* update head->current if head->first
> is the right element. This speeds up regsets because a common access
> pattern is to look at sets containing both pseudos and hardregs, and on
> most targets all hardregs are in head->first. Not updating head->current
> preserves a pointer to the latest accessed pseudos.
>
> I'll implement this idea and come back with some timings.
Indeed a nice idea ;) I suppose ->current should only be updated
if its new distance to head->first is bigger than <magic number>
(and zero is of course an obvious one)
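
The head->first idea discussed above can be sketched on a toy linked-list bitmap. This is an assumed simplification, not GCC's bitmap.c: `Element`, `Bitmap`, `set_bit` and `test_bit` are hypothetical names, elements hold 64 bits each, the list is singly linked, and a miss always rescans from the first element. The `steps` counter exists only to make the effect measurable.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical simplified sparse bitmap element: one 64-bit chunk.
struct Element {
    unsigned indx;       // which 64-bit chunk this element covers
    std::uint64_t bits;
    Element* next;
};

struct Bitmap {
    Element* first = nullptr;
    Element* current = nullptr;  // cache of the most recently found element
    unsigned long steps = 0;     // list nodes walked (illustration only)
};

// Sorted insert. `pool` owns the elements; std::deque keeps addresses
// stable across push_back.
void set_bit(Bitmap& b, std::deque<Element>& pool, unsigned bit) {
    const unsigned indx = bit / 64;
    Element** link = &b.first;
    while (*link && (*link)->indx < indx) link = &(*link)->next;
    if (!*link || (*link)->indx != indx) {
        pool.push_back({indx, 0, *link});
        *link = &pool.back();
    }
    assert(*link && (*link)->indx == indx);
    (*link)->bits |= std::uint64_t{1} << (bit % 64);
}

// Lookup. With check_first set, a hit in the first element is answered
// without touching `current`, so the cached element for the high-numbered
// bits (the "pseudos") survives a lookup of a low-numbered bit (a "hardreg").
bool test_bit(Bitmap& b, unsigned bit, bool check_first) {
    const unsigned indx = bit / 64;
    const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
    if (b.current && b.current->indx == indx)
        return (b.current->bits & mask) != 0;
    if (check_first && b.first && b.first->indx == indx)
        return (b.first->bits & mask) != 0;  // current deliberately kept
    for (Element* e = b.first; e && e->indx <= indx; e = e->next) {
        ++b.steps;
        if (e->indx == indx) {
            b.current = e;
            return (e->bits & mask) != 0;
        }
    }
    return false;
}
```

With an access pattern that alternates between a bit in the first chunk and a bit far down the list, the variant without the first-element check rescans the list on every other lookup, while the variant with the check walks the list only once.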
Reconfirmed that we still take ages to build the testcase (early inliner is still running for me).

The early inliner issue here is caused by tree-inline removing individual clones one by one. Each time a clone is removed, a new clone becomes the root of the clone tree and it takes a long time to update all pointers.

GCC 10.2 takes 1.8s and ~500MB for parsing (-fsyntax-only), compiling at -O0 is 17.8s peaking at ~3GB of memory.

Time variable                 usr           sys          wall           GGC
phase parsing           :   1.82 ( 10%)   0.71 ( 59%)   2.53 ( 13%)  516872 kB ( 29%)
phase opt and generate  :  15.96 ( 90%)   0.48 ( 40%)  16.44 ( 87%) 1273663 kB ( 71%)
callgraph ipa passes    :   0.91 (  5%)   0.15 ( 13%)   1.07 (  6%)  198535 kB ( 11%)
df scan insns           :   1.00 (  6%)   0.18 ( 15%)   1.20 (  6%)      19 kB (  0%)
expand                  :   0.92 (  5%)   0.05 (  4%)   0.95 (  5%)  308140 kB ( 17%)
integrated RA           :   3.98 ( 22%)   0.06 (  5%)   4.02 ( 21%)  129194 kB (  7%)
LRA non-specific        :   1.04 (  6%)   0.01 (  1%)   1.02 (  5%)     868 kB (  0%)
TOTAL                   :  17.79          1.20         19.00        1796092 kB

that's everything >= 5%

At -O1 things blow up as expected:

Time variable                        usr            sys           wall            GGC
callgraph functions expansion :  335.20 ( 14%)   2.31 ( 51%)  337.66 ( 14%) 1570346 kB ( 41%)
callgraph ipa passes          : 2040.93 ( 86%)   1.43 ( 31%) 2042.99 ( 86%) 1326596 kB ( 34%)
ipa inlining heuristics       : 1505.86 ( 63%)   0.01 (  0%) 1506.33 ( 63%)  146038 kB (  4%)
integration                   :  637.70 ( 27%)   1.61 ( 35%)  639.78 ( 27%)  968497 kB ( 25%)
tree SSA incremental          :  110.12 (  5%)   0.00 (  0%)  110.16 (  5%)     410 kB (  0%)
TOTAL                         : 2379.80          4.54        2385.13        3854592 kB

and ~4GB of RAM used.

Honza - you put the finger on it, can't we refactor things so we apply this update in the caller after all stmts were processed?

> Honza - you put the finger to it, can't we refactor things so we apply
> this update in the caller after all stmts were processed?

The clone tree issue should not be too hard to solve. We need to keep the top of the tree as a deleted symbol rather than an actual inline clone. I plan to take a look. I think it was the only inliner-related compile time issue I gave up on last stage2, because it was a bit exotic.

I plan to look into this again soon and also change the way we materialize clones at the beginning of the build, doing that on demand instead. This requires cleaning up some of the old logic deciding when a function body is still needed for compilation.

Honza

Still as comment#31 says.

ipa inlining heuristics : 813.42 ( 67%)
integration             : 228.18 ( 19%)
TOTAL                   :1219.52 7.03 1226.86 3438M
When I compile the given file with gcc, I obtain a segfault:

>gcc LHA.ii -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8)
COLLECT_GCC_OPTIONS='-v' '-mtune=generic'
/usr/lib/gcc/x86_64-linux-gnu/4.4.5/cc1plus -fpreprocessed LHA.ii -quiet -dumpbase LHA.ii -mtune=generic -auxbase LHA -version -o /tmp/ccDvgh6v.s
GNU C++ (Debian 4.4.5-8) version 4.4.5 (x86_64-linux-gnu)
compiled by GNU C version 4.4.5, GMP version 4.3.2, MPFR version 3.0.0-p3.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 5a2e15051eaa06a84cf6320b754ba993
gcc: Internal error: Segmentation fault (program cc1plus)

The code in LHA.ii is generated, which explains why it is so big. Compilation works with a similar but smaller file generated by the same program. It also fails on a similar but bigger file generated by the same program.

If the crash is due to the size of the file, gcc should return an error instead of a segfault.