While compiling the NES emulator FCE Ultra for my new Fedora 9 system (x86-64), I noticed that gcc 4.3.0 (or at least Fedora's version of it) used so much memory on some of the files that my system almost crashed. The preprocessed source code that triggers this bug is attached here. This seems to be a regression, as I have not been able to reproduce it on Fedora's gcc 3.4.6 (compat-gcc-34-3.4.6-9.x86_64), nor on gcc 4.1.2 on my old system (Slackware 12). I searched for other memory-related bugs in gcc 4.3 and found a couple that mention similar problems, so it's possible (or even probable) that this is a duplicate. I'm not sure, though, so I've created this bug report just so you can take a quick look at the problem and dismiss it if there's nothing new here. Sorry about the length of the code that reproduces the bug, by the way. It's a 20 thousand line file of what looks like machine-generated C code.
Created attachment 15653
Uses gigabytes of memory when compiled with optimizations on 4.3.0
Confirmed.

4.3.1 -O: 177MB
4.3.1 -O2: 1.3GB
4.3.1 -O2 -fno-tree-vrp: 230MB
4.2.3 -O2: 230MB

tree VRP : 42.06 (55%) usr 1.33 (53%) sys 43.52 (55%) wall 2319231 kB (94%) ggc

whoooo ;) Looks like sth new.
This testcase has 1025 loops we create pre-headers for. We insert tons of asserts for non-NULL pointers due to dereferences (>3074), which all cause new PHI nodes to be registered during insertion (and nearly the whole program is rewritten). But most of the time/memory is probably spent within SCEV analysis (indeed, starting with 4.3 we reset the SCEV cache on each invocation of adjust_range_with_scev).

The detailed mem-report shows that reassociation is also causing quite some garbage:

tree-phinodes.c:155 (allocate_phi_node)          11491552: 0.5%          0: 0.0%        0: 0.0%      454880: 0.4%     29364
fold-const.c:6254 (extract_muldiv_1)             35618816: 1.7%          0: 0.0%        0: 0.0%           0: 0.0%    556544
fold-const.c:2516 (fold_convert)                 36274032: 1.7%          0: 0.0%        0: 0.0%     4030448: 4.0%    503806
fold-const.c:7473 (fold_plusminus_mult_expr)     45185024: 2.1%          0: 0.0%        0: 0.0%           0: 0.0%    706016
fold-const.c:1592 (associate_trees)             131797504: 6.2%          0: 0.0%        0: 0.0%           0: 0.0%   2059336
fold-const.c:9743 (fold_binary)                 131862976: 6.2%          0: 0.0%        0: 0.0%           0: 0.0%   2060359
tree-chrec.h:149 (build_polynomial_chrec)       183160824: 8.6%          0: 0.0%        0: 0.0%    16650984:16.4%   2081373
tree-chrec.h:149 (build_polynomial_chrec)       198801152: 9.3%          0: 0.0%        0: 0.0%    18072832:17.8%   2259104
fold-const.c:1453 (negate_expr)                 445240224:20.9%          0: 0.0%        0: 0.0%    49471136:48.6%   6183892
tree-chrec.c:325 (chrec_fold_plus_1)            782040448:36.7%          0: 0.0%        0: 0.0%           0: 0.0%  12219382
Total                                          2132008624        486260114        4194416       101771994         59795791
source location                                   Garbage            Freed           Leak        Overhead            Times
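For illustration only, the shape of input described above might look roughly like the following. This is a hypothetical reduction written for this comment, not an excerpt from the attached testcase, and the name `touch_many` is made up. Each loop needs its own pre-header, and each pointer dereference gives VRP a reason to record that the pointer is non-NULL from that point on. Machine-generated code can repeat this pattern hundreds of times in one function.

```c
#include <stddef.h>

/* Hypothetical sketch, not taken from the attached testcase.
   Three small loops stand in for the ~1000 loops mentioned above.
   Every a[i], b[i], c[i] is a dereference after which a range
   analysis may assume the base pointer is non-NULL.  */
unsigned touch_many(const unsigned char *a, const unsigned char *b,
                    const unsigned char *c, size_t n)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];          /* loop 1: dereferences a */
    for (i = 0; i < n; i++)
        total += b[i] * 2u;     /* loop 2: dereferences b */
    for (i = 0; i < n; i++)
        total += c[i] * 3u;     /* loop 3: dereferences c */

    return total;
}
```

A function this small will not reproduce the memory blow-up; it only shows the pattern (many loops, many dereferences) that the analysis above refers to.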
Collecting after clearing the SCEV cache brings down peak memory usage to about 450MB; the question is whether this is safe.
It is not safe. Probably the best thing would be to not ask SCEV during the propagation but instead at ASSERT_EXPR insertion time.
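The trade-off behind that suggestion can be pictured with a toy model. The sketch below is not GCC code and does not reflect what any patch actually does; `expensive_bound`, `query_during_propagation` and `query_at_setup` are invented names. It only counts how often a pretend-expensive per-loop query runs when it is asked on every propagation visit versus once up front.

```c
#include <stddef.h>

/* Counts how often the pretend-expensive analysis runs. */
static unsigned long analysis_calls;

/* Stand-in for an expensive per-loop query.  NOT a GCC function. */
static int expensive_bound(int loop_id)
{
    analysis_calls++;
    return loop_id * 10;
}

/* Ask during propagation: every visit of every loop pays again. */
unsigned long query_during_propagation(int n_loops, int n_visits)
{
    long sink = 0;
    analysis_calls = 0;
    for (int v = 0; v < n_visits; v++)
        for (int l = 0; l < n_loops; l++)
            sink += expensive_bound(l);
    (void) sink;
    return analysis_calls;
}

/* Ask once at setup time, record the answer, reuse it afterwards.
   BOUNDS must have room for N_LOOPS entries.  */
unsigned long query_at_setup(int n_loops, int n_visits, int *bounds)
{
    long sink = 0;
    analysis_calls = 0;
    for (int l = 0; l < n_loops; l++)
        bounds[l] = expensive_bound(l);
    for (int v = 0; v < n_visits; v++)
        for (int l = 0; l < n_loops; l++)
            sink += bounds[l];      /* no further analysis calls */
    (void) sink;
    return analysis_calls;
}
```

With 4 loops and 5 visits, the first variant runs the query 20 times and the second 4 times. The recorded answers are only valid as long as nothing they depend on changes in between, which is the safety concern raised above.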
Fixed for 4.4.0.
Subject: Bug 36262

Author: rguenth
Date: Sat May 31 13:01:10 2008
New Revision: 136237

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136237
Log:
2008-05-31  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/34244
        * fold-const.c (tree_expr_nonnegative_warnv_p): Do not ask VRP.
        (tree_expr_nonzero_warnv_p): Likewise.
        * tree-vrp.c (vrp_expr_computes_nonnegative): Call
        ssa_name_nonnegative_p.
        (vrp_expr_computes_nonzero): Call ssa_name_nonzero_p.
        (extract_range_from_unary_expr): Use vrp_expr_computes_nonzero,
        not tree_expr_nonzero_warnv_p.

        PR tree-optimization/36262
        Revert
        2007-11-29  Zdenek Dvorak  <ook@ucw.cz>

        PR tree-optimization/34244
        * tree-vrp.c (adjust_range_with_scev): Clear scev cache.
        (record_numbers_of_iterations): New function.
        (execute_vrp): Cache the numbers of iterations of loops.
        * tree-scalar-evolution.c (scev_reset_except_niters):
        New function.
        (scev_reset): Use scev_reset_except_niters.
        * tree-scalar-evolution.h (scev_reset_except_niters): Declare.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/tree-scalar-evolution.c
    trunk/gcc/tree-scalar-evolution.h
    trunk/gcc/tree-vrp.c
4.3.1 is being released, adjusting target milestone.
Fixed.
Subject: Bug 36262

Author: rguenth
Date: Fri Jun 6 20:06:40 2008
New Revision: 136501

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136501
Log:
2008-06-06  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/34244
        * fold-const.c (tree_expr_nonnegative_warnv_p): Do not ask VRP.
        (tree_expr_nonzero_warnv_p): Likewise.
        * tree-vrp.c (vrp_expr_computes_nonnegative): Call
        ssa_name_nonnegative_p.
        (vrp_expr_computes_nonzero): Call ssa_name_nonzero_p.
        (extract_range_from_unary_expr): Use vrp_expr_computes_nonzero,
        not tree_expr_nonzero_warnv_p.

        PR tree-optimization/36262
        Revert
        2007-11-29  Zdenek Dvorak  <ook@ucw.cz>

        PR tree-optimization/34244
        * tree-vrp.c (adjust_range_with_scev): Clear scev cache.
        (record_numbers_of_iterations): New function.
        (execute_vrp): Cache the numbers of iterations of loops.
        * tree-scalar-evolution.c (scev_reset_except_niters):
        New function.
        (scev_reset): Use scev_reset_except_niters.
        * tree-scalar-evolution.h (scev_reset_except_niters): Declare.

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/fold-const.c
    branches/gcc-4_3-branch/gcc/tree-scalar-evolution.c
    branches/gcc-4_3-branch/gcc/tree-scalar-evolution.h
    branches/gcc-4_3-branch/gcc/tree-vrp.c
It looks like we don't use a known number of loop iterations at all anymore after this patch.
I don't think we "used" it before either? Still, the _computing_ of niters can easily be reinstated; it wasn't the expensive thing here. But I had the impression SCEV computes niters itself when needed, so the removal of the upfront computation was just an "optimization". Note that Zdenek added it to avoid doing this expensive thing multiple times.
Author: rguenth
Date: Wed Sep 4 07:27:42 2019
New Revision: 275365

URL: https://gcc.gnu.org/viewcvs?rev=275365&root=gcc&view=rev
Log:
2019-09-04  Richard Biener  <rguenther@suse.de>

        PR rtl-optimization/36262
        * postreload-gcse.c: Include intl.h and gcse.h.
        (insert_expr_in_table): Insert at the head of cur_expr->avail_occr
        to avoid linear list walk.
        (record_last_mem_set_info): Gate off if not computing transparentness.
        (get_bb_avail_insn): If transparentness isn't computed give up
        early.
        (gcse_after_reload_main): Skip compute_transp and extended PRE
        if gcse_or_cprop_is_too_expensive says so.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/postreload-gcse.c
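The "insert at the head to avoid linear list walk" item in the log above describes a general linked-list technique. A minimal sketch of it follows. It is not the code from postreload-gcse.c: `struct occr`, `insert_at_tail` and `insert_at_head` are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

/* Hypothetical occurrence record, for illustration only. */
struct occr {
    int insn_id;
    struct occr *next;
};

/* Tail insertion: walks the whole list first, so each insert is O(n).
   Returns the number of nodes walked, to make the cost visible.  */
size_t insert_at_tail(struct occr **head, struct occr *node)
{
    size_t steps = 0;
    struct occr **link = head;

    while (*link != NULL) {
        link = &(*link)->next;
        steps++;
    }
    node->next = NULL;
    *link = node;
    return steps;
}

/* Head insertion: constant time, no walk at all. */
void insert_at_head(struct occr **head, struct occr *node)
{
    node->next = *head;
    *head = node;
}
```

Head insertion leaves the list in reverse insertion order, so it is only a drop-in replacement when consumers do not rely on the order of the entries.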
Hi Richard,

r275365 is causing regressions on aarch64:
FAIL: gcc.dg/atomic/stdatomic-compare-exchange-4.c -O3 -g execution test
FAIL: gcc.dg/tree-prof/20050826-2.c execution, -fprofile-use -D_PROFILE_USE

In addition, on arm:
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution, -O3 -g
On Mon, 9 Sep 2019, clyon at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36262
>
> Christophe Lyon <clyon at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |clyon at gcc dot gnu.org
>       Known to work|                            |
>       Known to fail|                            |
>
> --- Comment #14 from Christophe Lyon <clyon at gcc dot gnu.org> ---
> Hi Richard,
>
> r275365 is causing regressions on aarch64:
> FAIL: gcc.dg/atomic/stdatomic-compare-exchange-4.c -O3 -g execution test
> FAIL: gcc.dg/tree-prof/20050826-2.c execution, -fprofile-use -D_PROFILE_USE
>
> In addition, on arm:
> FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution, -O3 -g

Wrong bugzilla?  But also should be fixed by the followup.

2019-09-05  Richard Biener  <rguenther@suse.de>

        PR rtl-optimization/91656
        * postreload-gcse.c (record_last_mem_set_info): Revert addition
        of early out.
> Wrong bugzilla?  But also should be fixed by the followup.

I replied to the bugzilla mentioned in the ChangeLog...

> 2019-09-05  Richard Biener  <rguenther@suse.de>
>
>         PR rtl-optimization/91656
>         * postreload-gcse.c (record_last_mem_set_info): Revert addition
>         of early out.

But yes indeed, this fixes what I reported, sorry for the noise.