This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/80960] [5/6/7/8 Regression] Huge memory use when compiling a very large test case
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 06 Jun 2017 08:36:57 +0000
- Subject: [Bug middle-end/80960] [5/6/7/8 Regression] Huge memory use when compiling a very large test case
- Auto-submitted: auto-generated
- References: <bug-80960-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mliska at suse dot cz,
                   |                            |segher at gcc dot gnu.org
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
The first "bisection" possibly points at r190594, which limited the work FRE
does (with the effect of leaving some things unoptimized).
A more precise bisection would be appreciated.
With GCC 7.1 and -O1 (recommended for machine-generated code) I see 3.7 GB
of RAM in use.
The code contains a very large basic block.
I do remember earlier compile-time/memory-hog PRs for this style of code.
Compile-time analysis using perf highlights:
Samples: 680K of event 'cycles:pp', Event count (approx.): 607299351135
Overhead  Command  Shared Object  Symbol
  26.58%  f951     f951           [.] refers_to_regno_p
   9.64%  f951     f951           [.] reg_overlap_mentioned_p
   7.08%  f951     f951           [.] find_hard_regno_for_1
   4.42%  f951     f951           [.] reg_used_between_p
   1.90%  f951     f951           [.] get_last_value_validate
This probably means we are doing a quadratic amount of work on use->def
chains inside the BB. With call traces:
+   48.54%     1.49%  f951     f951           [.] try_combine
-   25.59%    25.55%  f951     f951           [.] refers_to_regno_p
   - refers_to_regno_p
      - 16.98% reg_overlap_mentioned_p
         - 16.00% reg_used_between_p
              can_combine_p
              try_combine
              ...
      - 4.69% refers_to_regno_p
         - 4.66% reg_overlap_mentioned_p
            - 4.36% reg_used_between_p
                 can_combine_p
                 try_combine
So it's combine (at least at -O1), and I can also imagine that combine is what
uses up the memory in its attempts to simplify and match patterns, since IIRC
it uses GC memory for all the copying that involves. Segher?
int
reg_used_between_p (const_rtx reg, const rtx_insn *from_insn,
                    const rtx_insn *to_insn)
{
  rtx_insn *insn;

  if (from_insn == to_insn)
    return 0;

  for (insn = NEXT_INSN (from_insn); insn != to_insn; insn = NEXT_INSN (insn))
    if (NONDEBUG_INSN_P (insn)
        && (reg_overlap_mentioned_p (reg, PATTERN (insn))
            || (CALL_P (insn) && find_reg_fusage (insn, USE, reg))))
      return 1;
  return 0;
}
So that just walks the BB instead of, say, using DF uses (if they are
available during combine) or somehow recording the "distance" between two
rtx_insns so that the amount of work can be capped (conservatively returning
true). After all, it is going to end up combining very "distant" instructions
here (remember, a gigantic basic block).