Bug 36262 - [4.3 Regression] Extreme memory usage of VRP compared to older versions

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end
Version:	4.3.0

Importance:	P2 normal
Target Milestone:	4.3.2
Assignee:	Richard Biener

URL:
Keywords:	memory-hog

Depends on:
Blocks:	34244

Reported:	2008-05-18 23:05 UTC by Haakon Riiser
Modified:	2019-09-09 13:58 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:	4.2.3, 4.4.0
Known to fail:	4.3.0, 4.3.1
Last reconfirmed:	2008-05-27 08:32:02

Attachments
Uses gigabytes of memory when compiled with optimizations on 4.3.0 (5.85 KB, text/plain) 2008-05-18 23:07 UTC, Haakon Riiser

Description Haakon Riiser 2008-05-18 23:05:52 UTC

While compiling the NES emulator FCE Ultra for my new Fedora 9 system (x86-64), I noticed that gcc 4.3.0 (or at least Fedora's version of it) used so much memory on some of the files that my system almost crashed.  The preprocessed source code that triggers this bug is attached here.  This seems to be a regression, as I have not been able to reproduce it with Fedora's gcc 3.4.6 (compat-gcc-34-3.4.6-9.x86_64), nor with gcc 4.1.2 on my old system (Slackware 12).

I searched for other memory-related bugs in gcc 4.3 and found a couple that mention similar problems, so it's possible (or even probable) that this is a duplicate.  I'm not sure, though, so I've created this bug report just so you can take a quick look at the problem and dismiss it if there's nothing new here.

Sorry about the length of the code that reproduces the bug, by the way.  It's a 20-thousand-line file of what looks like machine-generated C code.

Comment 1 Haakon Riiser 2008-05-18 23:07:09 UTC

Created attachment 15653
Uses gigabytes of memory when compiled with optimizations on 4.3.0

Comment 2 Richard Biener 2008-05-19 09:18:21 UTC

Confirmed.

4.3.1 -O: 177MB
4.3.1 -O2: 1.3GB
4.3.1 -O2 -fno-tree-vrp: 230MB
4.2.3 -O2: 230MB

 tree VRP              :  42.06 (55%) usr   1.33 (53%) sys  43.52 (55%) wall 2319231 kB (94%) ggc

whoooo ;)  Looks like something new.

Comment 3 Richard Biener 2008-05-19 09:42:32 UTC

This testcase has 1025 loops we create pre-headers for.  We insert tons of
asserts for non-NULL pointers due to dereferences (>3074) which all cause
new PHI nodes to be registered during insertion (and nearly the whole program
is rewritten).  But most of the time/memory is probably spent within SCEV
analysis (indeed, starting with 4.3 we reset the SCEV cache on each
invocation of adjust_range_with_scev).
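
To illustrate why resetting a cache on every query is costly, here is a minimal sketch. It is not GCC code: `EvolutionCache`, `run_queries`, and the `ssa_N` names are hypothetical stand-ins for a memoized analysis such as SCEV and for a caller that queries it repeatedly. It only counts how often the expensive path runs; in the real compiler each recomputation would also allocate fresh trees that become garbage.

```python
class EvolutionCache:
    """Hypothetical memoized analysis; a stand-in, not GCC's SCEV."""

    def __init__(self):
        self.cache = {}
        self.computations = 0  # how often the expensive path ran

    def analyze(self, name):
        if name not in self.cache:
            self.computations += 1
            self.cache[name] = ("evolution-of", name)  # placeholder result
        return self.cache[name]

    def reset(self):
        self.cache.clear()


def run_queries(names, passes, reset_each_query):
    """Query every name `passes` times, optionally clearing the cache
    before each query, and return the number of real computations."""
    analysis = EvolutionCache()
    for _ in range(passes):
        for name in names:
            if reset_each_query:
                analysis.reset()
            analysis.analyze(name)
    return analysis.computations


names = [f"ssa_{i}" for i in range(100)]
print(run_queries(names, passes=5, reset_each_query=False))  # 100
print(run_queries(names, passes=5, reset_each_query=True))   # 500
```

With the cache kept, the work is bounded by the number of distinct names; with a reset before each query, it grows with the total number of queries.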

From the detailed mem-report we see that reassociation is also causing quite
a lot of garbage:

source location                                     Garbage            Freed             Leak         Overhead            Times
tree-phinodes.c:155 (allocate_phi_node)            11491552: 0.5%          0: 0.0%          0: 0.0%     454880: 0.4%      29364
fold-const.c:6254 (extract_muldiv_1)               35618816: 1.7%          0: 0.0%          0: 0.0%          0: 0.0%     556544
fold-const.c:2516 (fold_convert)                   36274032: 1.7%          0: 0.0%          0: 0.0%    4030448: 4.0%     503806
fold-const.c:7473 (fold_plusminus_mult_expr)       45185024: 2.1%          0: 0.0%          0: 0.0%          0: 0.0%     706016
fold-const.c:1592 (associate_trees)               131797504: 6.2%          0: 0.0%          0: 0.0%          0: 0.0%    2059336
fold-const.c:9743 (fold_binary)                   131862976: 6.2%          0: 0.0%          0: 0.0%          0: 0.0%    2060359
tree-chrec.h:149 (build_polynomial_chrec)         183160824: 8.6%          0: 0.0%          0: 0.0%   16650984:16.4%    2081373
tree-chrec.h:149 (build_polynomial_chrec)         198801152: 9.3%          0: 0.0%          0: 0.0%   18072832:17.8%    2259104
fold-const.c:1453 (negate_expr)                   445240224:20.9%          0: 0.0%          0: 0.0%   49471136:48.6%    6183892
tree-chrec.c:325 (chrec_fold_plus_1)              782040448:36.7%          0: 0.0%          0: 0.0%          0: 0.0%   12219382
Total                                            2132008624        486260114          4194416        101771994         59795791

Comment 4 Richard Biener 2008-05-19 11:00:33 UTC

Collecting after clearing the SCEV cache brings down peak memory usage to about 450MB; the question is whether this is safe.

Comment 5 Richard Biener 2008-05-20 13:00:41 UTC

It is not safe.  Probably the best thing would be to not ask SCEV during the
propagation but instead at ASSERT_EXPR insertion time.

Comment 6 Richard Biener 2008-05-31 13:01:57 UTC

Fixed for 4.4.0.

Comment 7 Richard Biener 2008-05-31 13:01:58 UTC

Subject: Bug 36262

Author: rguenth
Date: Sat May 31 13:01:10 2008
New Revision: 136237

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136237
Log:
2008-05-31  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/34244
	* fold-const.c (tree_expr_nonnegative_warnv_p): Do not ask VRP.
	(tree_expr_nonzero_warnv_p): Likewise.
	* tree-vrp.c (vrp_expr_computes_nonnegative): Call
	ssa_name_nonnegative_p.
	(vrp_expr_computes_nonzero): Call ssa_name_nonzero_p.
	(extract_range_from_unary_expr): Use vrp_expr_computes_nonzero,
	not tree_expr_nonzero_warnv_p.

	PR tree-optimization/36262
	Revert
	2007-11-29  Zdenek Dvorak  <ook@ucw.cz>

        PR tree-optimization/34244
        * tree-vrp.c (adjust_range_with_scev): Clear scev cache.
        (record_numbers_of_iterations): New function.
        (execute_vrp): Cache the numbers of iterations of loops.
        * tree-scalar-evolution.c (scev_reset_except_niters):
        New function.
        (scev_reset): Use scev_reset_except_niters.
        * tree-scalar-evolution.h (scev_reset_except_niters): Declare.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/tree-scalar-evolution.c
    trunk/gcc/tree-scalar-evolution.h
    trunk/gcc/tree-vrp.c

Comment 8 Richard Biener 2008-06-06 14:59:24 UTC

4.3.1 is being released, adjusting target milestone.

Comment 9 Richard Biener 2008-06-06 20:07:05 UTC

Fixed.

Comment 10 Richard Biener 2008-06-06 20:07:25 UTC

Subject: Bug 36262

Author: rguenth
Date: Fri Jun  6 20:06:40 2008
New Revision: 136501

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136501
Log:
2008-06-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/34244
	* fold-const.c (tree_expr_nonnegative_warnv_p): Do not ask VRP.
	(tree_expr_nonzero_warnv_p): Likewise.
	* tree-vrp.c (vrp_expr_computes_nonnegative): Call
	ssa_name_nonnegative_p.
	(vrp_expr_computes_nonzero): Call ssa_name_nonzero_p.
	(extract_range_from_unary_expr): Use vrp_expr_computes_nonzero,
	not tree_expr_nonzero_warnv_p.

	PR tree-optimization/36262
	Revert
	2007-11-29  Zdenek Dvorak  <ook@ucw.cz>

        PR tree-optimization/34244
        * tree-vrp.c (adjust_range_with_scev): Clear scev cache.
        (record_numbers_of_iterations): New function.
        (execute_vrp): Cache the numbers of iterations of loops.
        * tree-scalar-evolution.c (scev_reset_except_niters):
        New function.
        (scev_reset): Use scev_reset_except_niters.
        * tree-scalar-evolution.h (scev_reset_except_niters): Declare.

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/fold-const.c
    branches/gcc-4_3-branch/gcc/tree-scalar-evolution.c
    branches/gcc-4_3-branch/gcc/tree-scalar-evolution.h
    branches/gcc-4_3-branch/gcc/tree-vrp.c

Comment 11 Steven Bosscher 2008-07-06 09:59:42 UTC

It looks like we don't use a known number of loop iterations at all anymore after this patch.

Comment 12 Richard Biener 2008-07-06 12:27:36 UTC

I don't think we "used" it before either?  Still, the _computing_ of niters
can easily be reinstated - it wasn't the expensive thing here.  But I
had the impression SCEV computes niters itself when needed, so the removal
of the upfront computation was just an "optimization".  Note that Zdenek
added it to not do this expensive thing multiple times.

Comment 13 Richard Biener 2019-09-04 07:28:14 UTC

Author: rguenth
Date: Wed Sep  4 07:27:42 2019
New Revision: 275365

URL: https://gcc.gnu.org/viewcvs?rev=275365&root=gcc&view=rev
Log:
2019-09-04  Richard Biener  <rguenther@suse.de>

	PR rtl-optimization/36262
	* postreload-gcse.c: Include intl.h and gcse.h.
	(insert_expr_in_table): Insert at the head of cur_expr->avail_occr
	to avoid linear list walk.
	(record_last_mem_set_info): Gate off if not computing transparentness.
	(get_bb_avail_insn): If transparentness isn't computed give up
	early.
	(gcse_after_reload_main): Skip compute_transp and extended PRE
	if gcse_or_cprop_is_too_expensive says so.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/postreload-gcse.c
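
The "insert at the head ... to avoid linear list walk" item in the log above can be illustrated with a small sketch. This is not the postreload-gcse.c code: `Occurrence`, `insert_at_tail`, `insert_at_head`, and `total_steps` are hypothetical names, and the sketch assumes that consumers of the list do not depend on insertion order (head insertion reverses it).

```python
class Occurrence:
    """One node of a singly linked list (hypothetical, not GCC's struct)."""

    def __init__(self, insn, next=None):
        self.insn = insn
        self.next = next


def insert_at_tail(head, insn):
    """Walk to the last node before linking: O(n) per insertion.
    Returns (head, number of nodes stepped over)."""
    node = Occurrence(insn)
    if head is None:
        return node, 0
    steps = 0
    cur = head
    while cur.next is not None:
        cur = cur.next
        steps += 1
    cur.next = node
    return head, steps


def insert_at_head(head, insn):
    """Link the new node in front of the list: O(1), no walk."""
    return Occurrence(insn, head), 0


def total_steps(insert, count):
    """Insert `count` items with `insert` and sum the list-walk steps."""
    head, total = None, 0
    for insn in range(count):
        head, steps = insert(head, insn)
        total += steps
    return total


print(total_steps(insert_at_tail, 1000))  # 498501
print(total_steps(insert_at_head, 1000))  # 0
```

Appending to the tail makes the total cost quadratic in the list length, while inserting at the head keeps each insertion constant-time.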

Comment 14 Christophe Lyon 2019-09-09 13:50:15 UTC

Hi Richard,

r275365 is causing regressions on aarch64:
FAIL: gcc.dg/atomic/stdatomic-compare-exchange-4.c   -O3 -g  execution test
FAIL: gcc.dg/tree-prof/20050826-2.c execution,    -fprofile-use -D_PROFILE_USE

In addition, on arm:
FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O3 -g

Comment 15 rguenther@suse.de 2019-09-09 13:53:38 UTC

On Mon, 9 Sep 2019, clyon at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36262
> 
> Christophe Lyon <clyon at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |clyon at gcc dot gnu.org
>       Known to work|                            |
>       Known to fail|                            |
> 
> --- Comment #14 from Christophe Lyon <clyon at gcc dot gnu.org> ---
> Hi Richard,
> 
> r275365 is causing regressions on aarch64:
> FAIL: gcc.dg/atomic/stdatomic-compare-exchange-4.c   -O3 -g  execution test
> FAIL: gcc.dg/tree-prof/20050826-2.c execution,    -fprofile-use -D_PROFILE_USE
> 
> In addition, on arm:
> FAIL: gcc.c-torture/execute/builtins/pr23484-chk.c execution,  -O3 -g

Wrong bugzilla?  But also should be fixed by the followup.

2019-09-05  Richard Biener  <rguenther@suse.de>

        PR rtl-optimization/91656
        * postreload-gcse.c (record_last_mem_set_info): Revert addition
        of early out.

Comment 16 Christophe Lyon 2019-09-09 13:58:28 UTC

> Wrong bugzilla?  But also should be fixed by the followup.
I replied to the bugzilla mentioned in the ChangeLog...

> 
> 2019-09-05  Richard Biener  <rguenther@suse.de>
> 
>         PR rtl-optimization/91656
>         * postreload-gcse.c (record_last_mem_set_info): Revert addition
>         of early out.

But yes indeed, this fixes what I reported, sorry for the noise.