Bug 90273 - [10 Regression] GCC runs out of memory building Firefox
Summary: [10 Regression] GCC runs out of memory building Firefox
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: debug (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: 9.0
Assignee: Richard Biener
URL:
Keywords: build, memory-hog
Depends on:
Blocks: mozillametabug
  Show dependency treegraph
 
Reported: 2019-04-28 20:57 UTC by Jan Hubicka
Modified: 2019-05-02 11:17 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-04-29 00:00:00


Attachments
debug stmt DCE (581 bytes, patch)
2019-04-29 07:11 UTC, Richard Biener
Details | Diff
more aggressive variant (553 bytes, patch)
2019-04-29 07:30 UTC, Richard Biener
Details | Diff
updated patch (657 bytes, patch)
2019-04-29 10:50 UTC, Richard Biener
Details | Diff
removed-unused-local patch (728 bytes, patch)
2019-04-29 11:30 UTC, Richard Biener
Details | Diff
Doing the removal in find_obviously_necessary_stmts (1.06 KB, text/plain)
2019-04-29 11:37 UTC, rsandifo@gcc.gnu.org
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Hubicka 2019-04-28 20:57:28 UTC
Running
/aux/hubicka/9-install/bin/g++ Unified_cpp_dom_events0-8.ii -c -flto -flifetime-dse=1 -fPIC -fstack-protector-strong -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wc++1z-compat -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-error=multistatement-macros -Wno-error=class-memaccess  -Wformat -Wformat-overflow=2 -fno-sized-deallocation -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-exceptions -fno-strict-aliasing -fno-rtti -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O2 -fno-omit-frame-pointer -funwind-tables -Wno-error=shadow

eventually runs out of memory on my machine, while GCC 8 finishes rather quickly.
Most of time is spent in gimple_copy.

Without debug info the file builds for me.
Comment 1 Eric Gallager 2019-04-28 21:29:57 UTC
I see -flto in there, so making this block the mozillametabug
Comment 3 Jan Hubicka 2019-04-28 21:44:02 UTC
Note that it fails without -flto as well. Without -g it builds&works for me.

We end up with huge BB containing
# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG aArgs#1 => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL

Seems that these statements are being early inlined and duplicated indefinitely. Something needs to cut them to sensible limit.
Comment 4 Jan Hubicka 2019-04-28 21:54:08 UTC
The code is:
  inline bool IsNodeInternal() const { return false; }

  template <typename First, typename... Args>
  inline bool IsNodeInternal(First aFirst, Args... aArgs) const {
    return mNodeInfo->Equals(aFirst) || IsNodeInternal(aArgs...);
  }

 public:
  inline bool IsHTMLElement() const {
    return IsElement() && IsInNamespace(3);
  }

  inline bool IsHTMLElement(const nsAtom* aTag) const {
    return IsElement() && mNodeInfo->Equals(aTag, 3);
  }

  template <typename First, typename... Args>
  inline bool IsAnyOfHTMLElements(First aFirst, Args... aArgs) const {
    return IsHTMLElement() && IsNodeInternal(aFirst, aArgs...);
  }

$ grep ";; Function nsINode::IsNodeInternal" *cfg2 | less
;; Function nsINode::IsNodeInternal (_ZNK7nsINode14IsNodeInternalEv, funcdef_no=13668, decl_uid=274623, cgraph_uid=7143, symbol_order=7869)
;; Function nsINode::IsNodeInternal<nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJEEEbT_DpT0_, funcdef_no=58886, decl_uid=1359969, cgraph_uid=49491, symbol_order=50680)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_EEEbT_DpT0_, funcdef_no=58870, decl_uid=1359893, cgraph_uid=49475, symbol_order=50664)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_EEEbT_DpT0_, funcdef_no=58853, decl_uid=1359814, cgraph_uid=49458, symbol_order=50647)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_S2_EEEbT_DpT0_, funcdef_no=58835, decl_uid=1359727, cgraph_uid=49440, symbol_order=50629)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_S2_S2_EEEbT_DpT0_, funcdef_no=58816, decl_uid=1359637, cgraph_uid=49421, symbol_order=50610)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_S2_S2_S2_EEEbT_DpT0_, funcdef_no=58796, decl_uid=1359532, cgraph_uid=49401, symbol_order=50590)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_S2_S2_S2_S2_EEEbT_DpT0_, funcdef_no=58774, decl_uid=1359424, cgraph_uid=49379, symbol_order=50568)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_S2_S2_S2_S2_S2_EEEbT_DpT0_, funcdef_no=58751, decl_uid=1359292, cgraph_uid=49356, symbol_order=50545)
;; Function nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (_ZNK7nsINode14IsNodeInternalIP12nsStaticAtomJS2_S2_S2_S2_S2_S2_S2_S2_EEEbT_DpT0_, funcdef_no=58719, decl_uid=1359003, cgraph_uid=49324, symbol_order=50513)

GCC ends up producing empty BBS with 1753608 debug statements in
nsINode::IsNodeInternal<nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*, nsStaticAtom*> (const struct nsINode * const this, struct nsStaticAtom * aFirst, struct nsStaticAtom * aArgs#0, struct nsStaticAtom * aArgs#1, struct nsStaticAtom * aArgs#2, struct nsStaticAtom * aArgs#3, struct nsStaticAtom * aArgs#4, struct nsStaticAtom * aArgs#5, struct nsStaticAtom * aArgs#6, struct nsStaticAtom * aArgs#7

I did not wait for the version with 9 parameters :)
Comment 5 Richard Biener 2019-04-29 06:51:45 UTC
It sounds like possibly "caused" by the PR90131 or PR89892 fixes.  We've been
here before btw (with the complaint of "useless" redundant debug-bind stmts).

I see no "testcase" here?
Comment 6 Richard Biener 2019-04-29 07:11:46 UTC
Created attachment 46256 [details]
debug stmt DCE

Does the attached help?  On the testcase from PR89892 it does

@@ -204,7 +202,6 @@
   goto <bb 5>; [100.00%]
 
   <bb 4> [local count: 21570272]:
-  # DEBUG d => NULL
   # DEBUG d => 0
   # DEBUG BEGIN_STMT
   # DEBUG BEGIN_STMT
@@ -213,8 +210,6 @@
   a ={v} _3;
   # DEBUG BEGIN_STMT
   # DEBUG d => 1
-  # DEBUG d => NULL
-  # DEBUG d => 1
   # DEBUG BEGIN_STMT
   # DEBUG BEGIN_STMT
   _5 = c.2_6 + 1;

and later

@@ -268,7 +264,6 @@
   # DEBUG BEGIN_STMT
   d_26 = d_32 + 1;
   # DEBUG d => d_26
-  # DEBUG d => d_26
   # DEBUG BEGIN_STMT
   if (d_26 <= 5)
     goto <bb 7>; [89.00%]

for example.  I need to ask Alex whether we can ignore location differences
(and what differences) on the debug stmts.
Comment 7 Richard Biener 2019-04-29 07:24:23 UTC
Alex - in a series of

# DEBUG x => ...
# DEBUG x => ...
# DEBUG x => ...

what are the rules of which ones we can remove?  Can we always just keep the
last?  What about location differences?  What about possibly interleaving
DEBUG_BEGIN stmts?

For the quoted example

# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG this => NULL
# DEBUG aFirst => NULL
# DEBUG aArgs#0 => NULL
# DEBUG aArgs#1 => NULL
# DEBUG this => NULL

I fear the simple way of matching only adjacent stmts isn't catching all
possibilities.  Using a bitmap might be possible but can be also expensive.
Comment 8 Jakub Jelinek 2019-04-29 07:29:51 UTC
I guess it is important to know if those are the same decls or different, just with the same names.  -fdump-tree-all-uid will tell...
Comment 9 Richard Biener 2019-04-29 07:30:28 UTC
Created attachment 46257 [details]
more aggressive variant

More aggressive variant using a bitmap, simply keeping the last, does for example

@@ -243,7 +237,6 @@
   <bb 5> [local count: 43794188]:
   # DEBUG BEGIN_STMT
   i_28 = i_35 + 1;
-  # DEBUG i => i_28
   # DEBUG d => 1
   # DEBUG i => i_28

note if there are debug temps those are still not DCEd even if not used.
In theory they should always have debug temp defs and uses in the same
BB so the backward walk could gather local uses.  But that needs more
experiments / verification.
Comment 10 Jan Hubicka 2019-04-29 08:34:29 UTC
Hi,
the file was too large for bugzilla
so I uploaded it to
http://www.ucw.cz/~hubicka/Unified_cpp_dom_events0-8.ii.xz
and posted link in comment #2 :)

The agressive variant helps (I did not try to other one) but we still
get huge BBs containing:

  # DEBUG thisD.1400042 => NULL
  # DEBUG thisD.1400332 => NULL
  # DEBUG aFirstD.1400331 => NULL
  # DEBUG thisD.1400330 => NULL
  # DEBUG aFirstD.1400329 => NULL
  # DEBUG aArgs#0D.1400328 => NULL
  # DEBUG thisD.1400327 => NULL
  # DEBUG aFirstD.1400326 => NULL
  # DEBUG aArgs#0D.1400325 => NULL
  # DEBUG aArgs#1D.1400324 => NULL
  # DEBUG thisD.1400323 => NULL
  # DEBUG aFirstD.1400322 => NULL
  # DEBUG aArgs#0D.1400321 => NULL
  # DEBUG aArgs#1D.1400320 => NULL
  # DEBUG aArgs#2D.1400319 => NULL
  # DEBUG thisD.1400318 => NULL
  # DEBUG aFirstD.1400317 => NULL
  # DEBUG aArgs#0D.1400316 => NULL
  # DEBUG aArgs#1D.1400315 => NULL
  # DEBUG aArgs#2D.1400314 => NULL
  # DEBUG aArgs#3D.1400313 => NULL
  # DEBUG thisD.1400312 => NULL
  # DEBUG aFirstD.1400311 => NULL
  # DEBUG aArgs#0D.1400310 => NULL
  # DEBUG aArgs#1D.1400309 => NULL
  # DEBUG aArgs#2D.1400308 => NULL
  # DEBUG aArgs#3D.1400307 => NULL
  # DEBUG aArgs#4D.1400306 => NULL
  # DEBUG thisD.1400305 => NULL
  # DEBUG aFirstD.1400304 => NULL
  # DEBUG aArgs#0D.1400303 => NULL
  # DEBUG aArgs#1D.1400302 => NULL
  # DEBUG aArgs#2D.1400301 => NULL
  # DEBUG aArgs#3D.1400300 => NULL
  # DEBUG aArgs#4D.1400299 => NULL
  # DEBUG aArgs#5D.1400298 => NULL
  # DEBUG thisD.1400297 => NULL
  # DEBUG aFirstD.1400296 => NULL
  # DEBUG aArgs#0D.1400295 => NULL
  # DEBUG aArgs#1D.1400294 => NULL
  ....

I will be back after breakfast :)
Especialy for LTO we realy want to avoid too many of those in early
opts.

Honza
Comment 11 Richard Biener 2019-04-29 10:20:32 UTC
I also see

  # DEBUG D#1 => {CLOBBER}
  # DEBUG grayWordD.71258 => D#1

that looks pointless.  Results from removing

grayWord_57(D) ={v} {CLOBBER};

during into-SSA rewrite.  That should instead be

 # DEBUG grayWord => NULL

saving one debug-temp.  Testing

Index: gcc/tree-ssa.c
===================================================================
--- gcc/tree-ssa.c      (revision 270643)
+++ gcc/tree-ssa.c      (working copy)
@@ -358,6 +358,11 @@ insert_debug_temp_for_var_def (gimple_st
       else if (value == error_mark_node)
        value = NULL;
     }
+  else if (gimple_clobber_p (def_stmt))
+    /* We can end up here when rewriting a decl into SSA and coming
+       along a clobber for the original decl.  Turn that into
+       # DEBUG decl => NULL  */
+    value = NULL;
   else if (is_gimple_assign (def_stmt))
     {
       bool no_value = false;
Comment 12 Richard Biener 2019-04-29 10:29:35 UTC
Is

  # DEBUG INLINE_ENTRY NULL

useful at all?  And since I see

  # DEBUG D#1157 => &aOther_2(D)->mOffsetD.377222
  # DEBUG thisD.1370683 => D#1158
  # DEBUG aOtherD.1370684 => D#1157
  # DEBUG INLINE_ENTRY __ct D.183324
  MEM[(struct MaybeD.183104 *)this_5(D) + 16B].mIsSomeD.183308 = 0;
  _7 = MEM[(const struct Maybe &)aOther_2(D) + 16].mIsSomeD.183308;
  if (_7 != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  # DEBUG thisD.1370685 => D#1157
  # DEBUG INLINE_ENTRY operator*D.183425
  # DEBUG thisD.1370677 => D#1157
  # DEBUG INLINE_ENTRY NULL
  # DEBUG thisD.1370678 => D#1157
  # DEBUG INLINE_ENTRY NULL

my guess that debug temps are always in the same BB as uses is wrong.
Comment 13 Richard Biener 2019-04-29 10:47:48 UTC
(In reply to Richard Biener from comment #9)
> Created attachment 46257 [details]
> more aggressive variant
> 
> More aggressive variant using a bitmap, simply keeping the last, does for
> example
> 
> @@ -243,7 +237,6 @@
>    <bb 5> [local count: 43794188]:
>    # DEBUG BEGIN_STMT
>    i_28 = i_35 + 1;
> -  # DEBUG i => i_28
>    # DEBUG d => 1
>    # DEBUG i => i_28
> 
> note if there are debug temps those are still not DCEd even if not used.
> In theory they should always have debug temp defs and uses in the same
> BB so the backward walk could gather local uses.  But that needs more
> experiments / verification.

As noted this one likely isn't safe considering

 # DEBUG D#1 => 123;
 # DEBUG a => D#1;
 # DEBUG D#1 => 234;
 # DEBUG b => D#1;

where we'd remove the # DEBUG D#1 => 123; stmt.  It has been said the
situation can only occur with DEBUG_EXPR_DECL thus excempting those
would work.  That is, replacing D#1 with say 'c' won't ever happen
(non-DEBUG_EXPR_DECL decls on the RHS, there just appear SSA names,
invariants or DEBUG_EXPR_DECLs).

With not DCEing DEBUG_EXPR_DECL binds on the testcase we go from

> wc -l Unified_cpp_dom_events0-8.ii.046t.release_ssa
163546 Unified_cpp_dom_events0-8.ii.046t.release_ssa

to

> wc -l Unified_cpp_dom_events0-8.ii.046t.release_ssa
163626 Unified_cpp_dom_events0-8.ii.046t.release_ssa
Comment 14 Richard Biener 2019-04-29 10:50:09 UTC
Created attachment 46258 [details]
updated patch

Updated patch.  Previous aggressive variant survived bootstrap/test OK.
Comment 15 Richard Biener 2019-04-29 11:13:01 UTC
And it's indeed "caused" by the CFG-cleanup fixes.  The DCE patch gets us back
to (nearly) the same number of debug stmts as before.
Comment 16 Richard Biener 2019-04-29 11:30:49 UTC
Created attachment 46260 [details]
removed-unused-local patch

There do appear to be variables that are just appearing in # DEBUG var => NULL
stmts, not in debug binds and not referenced elsewhere.  But not many.
Mostly unused this variables it seems.  362 debug binds are dead this way
for the testcase (after the DCE patch).
Comment 17 rsandifo@gcc.gnu.org 2019-04-29 11:37:40 UTC
Created attachment 46261 [details]
Doing the removal in find_obviously_necessary_stmts

Just for the record: I'd written this over the weekend for completely unrelated reasons (was looking at debug info, and wanted to replicate the DCE that is already done in RTL).  It doesn't include the DEBUG_EXPR_DECL stuff, so obviously can't go in as-is, but it means that we never even add the redundant instructions to the worklist.
Comment 18 Richard Biener 2019-04-29 12:21:49 UTC
(In reply to rsandifo@gcc.gnu.org from comment #17)
> Created attachment 46261 [details]
> Doing the removal in find_obviously_necessary_stmts
> 
> Just for the record: I'd written this over the weekend for completely
> unrelated reasons (was looking at debug info, and wanted to replicate the
> DCE that is already done in RTL).  It doesn't include the DEBUG_EXPR_DECL
> stuff, so obviously can't go in as-is, but it means that we never even add
> the redundant instructions to the worklist.

Yeah, that looks nearly equivalent (same issue with DEBUG_EXPR_DECLs though).

I think doing it in eliminate_unnecessary_stmts is slightly better for the
case where we come from 

  # DEBUG i => NULL;
  i_3 = 1;
  # DEBUG i => i_3;

and are about to DCE i_3 (I think that's not handled by my patch either
without further tweaks).  No debug-stmt support in the GIMPLE FE right now
so not so easy to test atm.  For 

int main()
{
  int i;
  i = 1, i = 2;
  return i+1;
}

with -O -g -fdisable-tree-ccp1 -fdisable-tree-fre1 -fdisable-tree-evrp we
go from

  <bb 2> :
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  i_1 = 1;
  # DEBUG i => i_1
  i_2 = 2;
  # DEBUG i => i_2
  # DEBUG BEGIN_STMT
  _3 = 3;
  return 3;

after DSE1 to

  <bb 2> :
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG i => 1
  # DEBUG i => 2
  # DEBUG BEGIN_STMT
  return 3;

after CDDCE1 which could be improved to just # DEBUG i => 2 (note the
missing DEBUG BEGIN_STMT for compound stmts).  The following would do
that trick (doing a bit more debug DCE on the testcase in this PR)

Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c  (revision 270645)
+++ gcc/tree-ssa-dce.c  (working copy)
@@ -1282,11 +1283,15 @@ eliminate_unnecessary_stmts (void)
              if (!is_gimple_debug (stmt))
                something_changed = true;
              remove_dead_stmt (&gsi, bb, to_remove_edges);
+             continue;
            }
          else if (is_gimple_call (stmt))
            {
Comment 19 rsandifo@gcc.gnu.org 2019-04-29 13:18:18 UTC
(In reply to Richard Biener from comment #18)
> (In reply to rsandifo@gcc.gnu.org from comment #17)
> > Created attachment 46261 [details]
> > Doing the removal in find_obviously_necessary_stmts
> > 
> > Just for the record: I'd written this over the weekend for completely
> > unrelated reasons (was looking at debug info, and wanted to replicate the
> > DCE that is already done in RTL).  It doesn't include the DEBUG_EXPR_DECL
> > stuff, so obviously can't go in as-is, but it means that we never even add
> > the redundant instructions to the worklist.
> 
> Yeah, that looks nearly equivalent (same issue with DEBUG_EXPR_DECLs though).
> 
> I think doing it in eliminate_unnecessary_stmts is slightly better for the
> case where we come from 
> 
>   # DEBUG i => NULL;
>   i_3 = 1;
>   # DEBUG i => i_3;
> 
> and are about to DCE i_3

Ah, yeah.
Comment 20 Jakub Jelinek 2019-04-29 16:03:18 UTC
Another report: https://gcc.gnu.org/ml/gcc/2019-04/msg00270.html
Comment 21 Jan Hubicka 2019-04-29 16:08:11 UTC
I just built Firefox with GCC 9 branch and the updated patch. Debug enabled build is:
real    45m56.187s
user    493m8.688s
sys     22m4.512s

compared to debug disabled build with GCC 9:
real    35m5.141s
user    386m2.364s
sys     17m21.892s

GCC 8 debug disabled build is:
real    36m55.979s
user    389m54.256s
sys     17m51.964s

and llvm7 debug disabled is:
real    36m28.096s
user    384m24.056s
sys     8m53.768s

All with LTO. I will build gcc8 and llvm7 with debug info enabled.
Comment 22 Jan Hubicka 2019-04-29 22:17:30 UTC
gcc 8 with debug info

real    45m49.664s
user    500m1.776s
sys     22m39.816s

llvm 7 with debug info

real    43m43.798s
user    447m36.028s
sys     10m13.512s
Comment 23 Alexandre Oliva 2019-04-30 00:44:07 UTC
> what are the rules of which ones we can remove?  Can we always just keep the
last?  What about location differences?  What about possibly interleaving
DEBUG_BEGIN stmts?

code insns and markers are potential inspection points, so binds reaching them should ideally not be messed with.  this means one can safely remove binds that are overridden before the next inspection point, so yes, keeping only the last one that provides a binding for any given variable before each inspection point should be safe.

this should (in theory) be reasonably rare, though, because there should be sequence points between assignments to the same variable.  this doesn't imply inspection points, e.g., we don't insert markers at comma operators.  I suppose these arise from binds inserted by CFG rearrangements rather than when entering SSA.  I don't have much of a sense of these yet, but it's clear their meaning is looser, in a way.  They don't refer to a specific code location or bind present in sources, but rather they indicate our inability to keep track of bindings after certain transformations.  My intuition is that, as things are, we could drop them in the same circumstances we'd drop overriding binds, but maybe we could go looser about them, especially when they're resets.

When you ask about location differences, it's not clear whether you're asking about binding values (used in location lists) or source line locations.  It's not always viable to tell whether binding expressions are equivalent, but the last one prevails except at control flow confluences, and var-tracking deals with that much later.  As for source line locations, I don't think they're relevant in debug bind statements.


> # DEBUG INLINE_ENTRY NULL

That is a marker, but without naming the inlined function, it would appear to be useless indeed.  I wonder, however, if the logical block that named the inlined function was really lost, or if it just references some anonymous function.  This would guide the decision on whether the marker became useless for losing track of the function named by it (which might require an investigation of how this came about, instead of silently dropping further debug info), or whether we just don't have a name for the function any more (but we might still have debug info generated for it)

> my guess that debug temps are always in the same BB as uses is wrong.

they can be bound in one BB and used in another, indeed.  they may even be bound in multiple BBs and used in multiple other BBs, and then var-tracking will attempt to identify shared values in incoming exprs at confluence points.  we don't do that very much, but we could.  even without that, there are cases in which we issue temp binds in one block and reference them in another, even when we're not sure one reaches the other: var-tracking sorts that out eventually.
Comment 24 Jakub Jelinek 2019-04-30 07:34:13 UTC
(In reply to Alexandre Oliva from comment #23)
> > what are the rules of which ones we can remove?  Can we always just keep the
> last?  What about location differences?  What about possibly interleaving
> DEBUG_BEGIN stmts?

I think Richard's patch only considers debug bind stmts next to each other, any other stmt like DEBUG BEGIN_STMT, etc. reset the table (i.e. stop the elimination).

> When you ask about location differences, it's not clear whether you're
> asking about binding values (used in location lists) or source line
> locations.  It's not always viable to tell whether binding expressions are
> equivalent, but the last one prevails except at control flow confluences,
> and var-tracking deals with that much later.  As for source line locations,
> I don't think they're relevant in debug bind statements.

My understanding is that we completely ignore gimple_location from debug bind stmts.  While debug stmts and later DEBUG_INSNs do have location, the -fcompare-debug rules should make sure that other stmts/insns location is not affected by the locations of the debug stmts/insns.  And, var-tracking replaces the DEBUG_INSNs with NOTE_INSN_VAR_LOCATION notes that do not have a locus.

> > my guess that debug temps are always in the same BB as uses is wrong.
> 
> they can be bound in one BB and used in another, indeed.  they may even be
> bound in multiple BBs and used in multiple other BBs, and then var-tracking
> will attempt to identify shared values in incoming exprs at confluence
> points.  we don't do that very much, but we could.  even without that, there
> are cases in which we issue temp binds in one block and reference them in
> another, even when we're not sure one reaches the other: var-tracking sorts
> that out eventually.

Besides Richard's eliminate_unnecessary_stmts patch, which you've acked and I'm going to install soon, there are other patches (GCC 10 only presumably): #c11 and #c16, what do you think about those? The former canonicalizes {CLOBBER} values of debug stmts to NULL, the latter tries to remove debug stmts for variables which are not referenced in the function at all and have only NULL debug binds for them; I believe in such cases the debug insns don't provide anything over the optimized away vars in the body.
Another thing is if we could also remove DEBUG_EXPR_DECLs by doing analysis similar to #c16, look through all debug bind stmts, see what DEBUG_EXPR_DECLs they are refering to, and if there are DEBUG_EXPR_DECLs which are not referenced in any of them and they aren't referenced in decl_debug_args_lookup either, just 
remove debug bind stmts for those DEBUG_EXPR_DECLs.  Similarly, would it be useful to reset debug bind stmts if they refer to DEBUG_EXPR_DECLs that are never set (or do it only for some simple cases where we couldn't figure out any location from the expr)?
Comment 25 Jakub Jelinek 2019-04-30 07:40:37 UTC
Author: jakub
Date: Tue Apr 30 07:40:06 2019
New Revision: 270674

URL: https://gcc.gnu.org/viewcvs?rev=270674&root=gcc&view=rev
Log:
	PR tree-optimization/90273
	* tree-ssa-dce.c (eliminate_unnecessary_stmts): Eliminate
	useless debug stmts.

Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/tree-ssa-dce.c
Comment 26 Alexandre Oliva 2019-04-30 14:27:25 UTC
I saw the #c11 patch in gcc-patches, and it seemed to have been posted FTR and installed.  It looked good, so I didn't comment on it.

I agree about the effects of #c16, though I begin to get a feeling that it's working too hard for too little benefit.  Ditto trying to optimize debug temps: you will get some savings, sure, but how much benefit for such global analyses?

Perhaps we'd get a much bigger bang for the buck introducing vector resets, in which a single gimple bind stmt would reset several decls at once.  If that's become as common as it is being made out to be, this could save a significant amount of memory.

Though from Jan's comments on compile times, it doesn't look like we've got much slower, which makes me wonder what the new problem really is...  Could it be that debug binds have always been there, plentiful but under the radar, and that the real recent regression (assuming there really is one) lies elsewhere?  (sorry, I haven't really dug into it myself)
Comment 27 rguenther@suse.de 2019-04-30 15:21:32 UTC
On April 30, 2019 4:27:25 PM GMT+02:00, "aoliva at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90273
>
>--- Comment #26 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
>I saw the #c11 patch in gcc-patches, and it seemed to have been posted
>FTR and
>installed.  It looked good, so I didn't comment on it.
>
>I agree about the effects of #c16, though I begin to get a feeling that
>it's
>working too hard for too little benefit.  Ditto trying to optimize
>debug temps:
>you will get some savings, sure, but how much benefit for such global
>analyses?
>
>Perhaps we'd get a much bigger bang for the buck introducing vector
>resets, in
>which a single gimple bind stmt would reset several decls at once.  If
>that's
>become as common as it is being made out to be, this could save a
>significant
>amount of memory.
>
>Though from Jan's comments on compile times, it doesn't look like we've
>got
>much slower, which makes me wonder what the new problem really is... 
>Could it
>be that debug binds have always been there, plentiful but under the
>radar, and
>that the real recent regression (assuming there really is one) lies
>elsewhere? 
>(sorry, I haven't really dug into it myself)

The recent regression is we no longer throw them away plentiful during CFG cleanup and now they pile up during inlining. 

I agree full DCE with liveness will be expensive for usually little gain. Not sure if vector resets will improve things much.
Comment 28 Jan Hubicka 2019-04-30 15:56:54 UTC
> The recent regression is we no longer throw them away plentiful during CFG
> cleanup and now they pile up during inlining. 
> 
> I agree full DCE with liveness will be expensive for usually little gain. Not
> sure if vector resets will improve things much.

One thing to keep in mind that after early opts and in late opts after
the initial cleanups post ipa-inline-transforms we likely have a lot of
new debug statements brought in by inliner.  It would make sense to do
something more expensive twice in queue to get rid of them. Especially
in early opts we do not want to make too many useless debug statements
to hit the LTO stream or get duplicated by subsequent inlining.

Honza
Comment 29 Jakub Jelinek 2019-05-01 07:21:57 UTC
Fixed on gcc-9-branch, trunk commits left for Richard.
Comment 30 Richard Biener 2019-05-02 11:17:21 UTC
Fixed.
Comment 31 Richard Biener 2019-05-02 11:17:34 UTC
Author: rguenth
Date: Thu May  2 11:17:00 2019
New Revision: 270791

URL: https://gcc.gnu.org/viewcvs?rev=270791&root=gcc&view=rev
Log:
2019-05-02  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/90273
	* tree-ssa-dce.c (eliminate_unnecessary_stmts): Eliminate
	useless debug stmts.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-dce.c