This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/59811] [4.9/5/6 Regression] Huge increase in memory usage and compile time in combine
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 11 Feb 2016 10:42:56 +0000
- Subject: [Bug rtl-optimization/59811] [4.9/5/6 Regression] Huge increase in memory usage and compile time in combine
- Auto-submitted: auto-generated
- References: <bug-59811-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59811
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
So SCCVN is limited by --param sccvn-max-alias-queries-per-access, which has
a quite high default value (1000). There is nothing more elaborate than that,
such as an overall budget on top of the per-access limit.
Lowering the default value to 100 I get:
 alias stmt walking    : 12.15 (16%) usr  0.06 (12%) sys 12.23 (16%) wall       2 kB ( 0%) ggc
 ...
 tree PRE              :  9.48 (13%) usr  0.01 ( 2%) sys  9.48 (13%) wall    1333 kB ( 0%) ggc
 tree FRE              :  9.91 (13%) usr  0.03 ( 6%) sys  9.92 (13%) wall     310 kB ( 0%) ggc
 ...
 combiner              : 31.87 (43%) usr  0.24 (46%) sys 32.10 (43%) wall  528441 kB (80%) ggc
 rest of compilation   : 51.05 (69%) usr  0.31 (60%) sys 51.25 (69%) wall  620347 kB (94%) ggc
 TOTAL                 : 73.85             0.52            74.34           661274 kB
(see the disclaimer about things not adding up). Since we run FRE twice and
PRE only once (though PRE uses a cheaper walking mode that ends up giving up
earlier), a reasonable result would be FRE taking around twice the PRE time.
Note that with this testcase, which has loads of stores and loads
(non-aliased, of course), we run into the above limit all the time. And we
do seem to optimize things, since raising the limit results in a faster
overall compile time.
I think we need to limit the walking more intelligently, and/or design some
caching of queries that works better, or simply make the queries cheaper.
Most of the query cost comes from decomposing our reference trees and
repeatedly computing the overall access offset/size (get_ref_base_and_extent).
So if we could cache that, we might win.
The number of loads/stores is not that bad in the testcase: we have
1700 stores and 547 loads after fre1, and 1142 stores and 547 loads after fre2.
Before fre1 we had 1714 stores and 15334 loads. So it's clearly fre1 that
runs into this compile-time hog (it performs at most #loads * #stores alias
queries). OTOH we _do_ want to optimize those 14000 redundant loads!
It simply looks like the defs are far away (which happens, for example, if
you have an initializer at the start of the function that we can optimize from).
The 547 loads after fre1 is actually with the limit raised to 10000; with
the default limit we still have 5548 loads, and with it lowered to 100 we
get 11616 loads.
Note that compile time would be worse if there were nothing to optimize here
(then you hit the worst case of #loads * #stores alias queries).