Bug 101014 - [12 Regression] Big compile time hog with -O3 since r12-1268-g9858cd1a6827ee7a
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 12.0
Importance: P3 normal
Target Milestone: 12.0
Assignee: Andrew Macleod
URL:
Keywords: compile-time-hog, needs-reduction
Depends on:
Blocks: spec yarpgen
Reported: 2021-06-10 10:58 UTC by Martin Liška
Modified: 2021-11-03 05:20 UTC

See Also:
Host:
Target:
Build:
Known to work: 11.1.0
Known to fail: 12.0
Last reconfirmed: 2021-06-10 00:00:00


Attachments
testcase (62.16 KB, application/zstd)
2021-06-10 10:58 UTC, Martin Liška
Another test-case (116.57 KB, application/zstd)
2021-06-16 08:38 UTC, Martin Liška
One another test-case (62.44 KB, application/zstd)
2021-06-22 05:07 UTC, Martin Liška
patch to fix the issue (956 bytes, patch)
2021-06-22 13:55 UTC, Andrew Macleod

Description Martin Liška 2021-06-10 10:58:46 UTC
Created attachment 50978 [details]
testcase

Noticed in a yarpgen test-case and in WRF, for instance.
-O3 runs really slowly.
Comment 1 Martin Liška 2021-06-10 11:13:46 UTC
I'm reducing the test-case now..
Comment 2 Martin Liška 2021-06-10 12:23:58 UTC
Unfortunately, the reduction is stuck at 200KB. Please let me know if you can analyze the original test-case?
Comment 3 Andrew Macleod 2021-06-10 13:42:58 UTC
Yeah, that's fine, I'll look at the original.
Comment 4 Andrew Macleod 2021-06-14 20:28:31 UTC
When a range is being calculated for an ssa-name, the propagation process often goes along back edges. These back edges sometimes require other ssa-names which have not been processed yet. These are flagged as "poor values" and when propagation is done, we visit the list of poor values, calculate them, and see if that may result in a better range for the original ssa-name.

The problem is that calculating these poor values may also spawn another set of requests, since the block at the far end of the back edge has not been processed yet... it's highly likely that some additional unprocessed ssa-names are used in the calculation of that name, but typically they do not affect the current range in a significant way.

Thus we mostly care about the first order effect only.  It turns out to be very rare that a 2nd order effect on a back edge affects anything that we don't catch later.

This patch turns off poor-value tagging when looking up the first order values, thus avoiding the 2nd order and beyond cascading effects.

I haven't found a test case we miss yet because of this change, yet it probably resolves a number of the outstanding compilation problems in a significant way.

I think this will probably apply to gcc 11 in some form as well, so I'll look at an equivalent patch for there.
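
A minimal sketch of the first-order limit described above, written as a toy model rather than the actual ranger code. Everything here (`ToyCache`, `lookup`, `calculate`, `fill`, `toy_chain_cost`) is a hypothetical name chosen for illustration; the only thing taken from the comment is the idea that unprocessed names get flagged as poor values, and that flagging is switched off while those poor values are themselves being calculated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model, not GCC code: deps[n] lists the names used to compute name n.
struct ToyCache {
  std::vector<std::vector<int>> deps;
  std::vector<bool> processed;     // true once a name has been calculated
  std::vector<int> poor_values;    // names flagged for a later calculation
  bool new_values_enabled = true;  // may lookups flag poor values?
  int calculations = 0;            // how many names were calculated

  explicit ToyCache(std::vector<std::vector<int>> d)
      : deps(std::move(d)), processed(deps.size(), false) {}

  // Look up a dependency; flag it as a poor value if it is unprocessed
  // and flagging is currently enabled.
  void lookup(int name) {
    assert(name >= 0 && static_cast<std::size_t>(name) < processed.size());
    if (!processed[name] && new_values_enabled)
      poor_values.push_back(name);
  }

  // Calculate one name from whatever values are available right now.
  void calculate(int name) {
    if (processed[name]) return;
    processed[name] = true;
    ++calculations;
    for (int d : deps[name]) lookup(d);
  }

  // Calculate `name`, then revisit the poor values it flagged.  With
  // first_order_only set, names met while revisiting are not flagged,
  // so the work cannot cascade to 2nd order and beyond.
  int fill(int name, bool first_order_only) {
    calculate(name);
    while (!poor_values.empty()) {
      int p = poor_values.back();
      poor_values.pop_back();
      bool saved = new_values_enabled;
      if (first_order_only) new_values_enabled = false;
      calculate(p);
      new_values_enabled = saved;
    }
    return calculations;
  }
};

// Cost of filling name 0 in a dependency chain 0 -> 1 -> ... -> n-1.
int toy_chain_cost(int n, bool first_order_only) {
  std::vector<std::vector<int>> deps(n);
  for (int i = 0; i + 1 < n; ++i) deps[i].push_back(i + 1);
  ToyCache c(deps);
  return c.fill(0, first_order_only);
}
```

In this toy, `toy_chain_cost(10, false)` calculates all 10 names of the chain, while `toy_chain_cost(10, true)` calculates only 2: the requested name and its direct poor value.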
Comment 5 Martin Liška 2021-06-15 06:49:19 UTC
Should be fixed with:

commit ecc5644fa3bc7f37eada2a3e9c627cd1918922e0
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Mon Jun 14 15:33:59 2021 -0400

    Limit new value calculations to first order effects.
    
    When utilizing poor values during propagation, we mostly care about values that
    were undefined/processed directly used in calculating the SSA_NAME being
    processed.  2nd level derivations of such poor values rarely affect the
    initial calculation.  Leave them to when they are directly encountered.
    
            * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
            (ranger_cache::enable_new_values): Set to specified value and
            return the old value.
            (ranger_cache::disable_new_values): Delete.
            (ranger_cache::fill_block_cache): Disable non 1st order derived
            poor values.
            * gimple-range-cache.h (ranger_cache): Adjust prototypes.
            * gimple-range.cc (gimple_ranger::range_of_expr): Adjust.
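
The ChangeLog entry for `enable_new_values` ("Set to specified value and return the old value") describes a common save-and-restore idiom. A small sketch of that idiom follows; the class `ToyFlagOwner`, the member `m_enabled`, and the helper `toy_first_order_step` are hypothetical and are not taken from the GCC sources.

```cpp
#include <cassert>

// Toy owner of an on/off setting; not the real ranger_cache.
class ToyFlagOwner {
 public:
  // Set the flag to `state` and return the previous setting, so the
  // caller can restore it later without knowing what it was.
  bool enable_new_values(bool state) {
    bool old = m_enabled;
    m_enabled = state;
    return old;
  }
  bool new_values_enabled() const { return m_enabled; }

 private:
  bool m_enabled = true;
};

// Run one step with the flag forced off, then restore the caller's
// setting.  Returns the flag value that was in effect during the step.
bool toy_first_order_step(ToyFlagOwner &c) {
  bool saved = c.enable_new_values(false);
  bool during = c.new_values_enabled();
  assert(!during);
  c.enable_new_values(saved);
  return during;
}
```

Returning the old value lets nested callers restore exactly what they found, which a separate enable/disable pair of functions cannot do.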
Comment 6 Andrew Macleod 2021-06-15 13:01:57 UTC
I swear I put that text in and moved this to resolved... :-(  sigh. sorry.

Anyway, this does not appear to be an issue in GCC 11.. the effect appears to have been magnified by the new aggressive import/export calculation code in the GORI rework.
Comment 7 Martin Liška 2021-06-16 08:38:41 UTC
I'm sorry, but the compile-time hog is still not resolved. I can still see it in the cam4 SPEC benchmark, and I'm attaching another yarpgen test-case.
Comment 8 Martin Liška 2021-06-16 08:38:59 UTC
Created attachment 51027 [details]
Another test-case
Comment 9 GCC Commits 2021-06-18 21:44:39 UTC
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:870b674f72d4894b94efa61764fd87ecec29ffde

commit r12-1652-g870b674f72d4894b94efa61764fd87ecec29ffde
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Fri Jun 18 12:33:18 2021 -0400

    Remove poor value computations.
    
    Remove the old "poor value" approach which made callbacks into ranger
    from the cache.  Use only the best available value for all propagation.
    
            PR tree-optimization/101014
            * gimple-range-cache.cc (ranger_cache::ranger_cache): Remove poor
            value list.
            (ranger_cache::~ranger_cache): Ditto.
            (ranger_cache::enable_new_values): Delete.
            (ranger_cache::push_poor_value): Delete.
            (ranger_cache::range_of_def): Remove poor value processing.
            (ranger_cache::entry_range): Ditto.
            (ranger_cache::fill_block_cache): Ditto.
            * gimple-range-cache.h (class ranger_cache): Remove poor value members.
            * gimple-range.cc (gimple_ranger::range_of_expr): Remove call.
            * gimple-range.h (class gimple_ranger): Adjust.
Comment 10 Andrew Macleod 2021-06-18 21:46:46 UTC
Really fixed this time.
Comment 11 Hongtao.liu 2021-06-21 08:48:38 UTC
I'm not sure if it's related but compilation of 527.cam4_r still hangs with

gcc version 12.0.0 20210621 (experimental) (GCC) 

and option:

-march=cascadelake -Ofast -funroll-loops -flto -g -mfpmath=sse

It hangs on this thread for more than 2h.

liuhongt  79919  79918 99 15:52 pts/4    00:50:13 /export/users2/liuhongt/install/gnu-toolchain_master/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto1 -march=cascadelake -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mavx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -quiet -dumpbase ./cam4_r.ltrans43.ltrans -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mavx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves 
-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mtune=cascadelake -mfpmath=sse -m64 -mfpmath=sse -g -g -Ofast -Ofast -fno-openmp -fno-openacc -fno-pie -fcf-protection=none -funroll-loops -fno-associative-math -fltrans @/tmp/cctsotsZ -o /tmp/ccjbpUzj.s
Comment 12 Aldy Hernandez 2021-06-21 09:09:49 UTC
(In reply to Hongtao.liu from comment #11)
> I'm not sure if it's related but compilation of 527.cam4_r still hangs with
> 
> gcc version 12.0.0 20210621 (experimental) (GCC) 

Can you verify after which patch upstream it started hanging?  It may or may not be related to this bug.

Or perhaps, can you check where it hangs?  Is it hanging in the ranger code or elsewhere?

Thanks.
Comment 13 Hongtao.liu 2021-06-21 09:59:27 UTC
(In reply to Aldy Hernandez from comment #12)
> (In reply to Hongtao.liu from comment #11)
> > I'm not sure if it's related but compilation of 527.cam4_r still hangs with
> > 
> > gcc version 12.0.0 20210621 (experimental) (GCC) 
> 
> Can you verify after which patch upstream it started hanging?  It may or may
> not be related to this bug.
> 
> Or perhaps, can you check where it hangs?  Is it hanging in the ranger code
> or elsewhere?

After hanging for 36m, with gdb -p pid

(gdb) bt
#0  0x0000000001035810 in irange::varying_compatible_p (this=this@entry=0x7ffdd7672630)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-range.h:289
#1  0x000000000102a08b in irange::normalize_kind (this=0x7ffdd7672630)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-range.h:584
#2  irange::irange_set (this=0x7ffdd7672630, min=<optimized out>, max=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-range.cc:182
#3  0x000000000102922c in range_query::get_tree_range (this=0x2614590 <global_ranges>, r=..., expr=0x148092cd3de0, stmt=0x148092896738)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-query.cc:212
#4  0x000000000175457e in fold_using_range::range_of_range_op (this=<optimized out>, r=..., s=0x148092896738, src=...)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:642
#5  0x0000000001757606 in fold_using_range::fold_stmt (this=0x7ffdd76736cf, r=..., s=0x148092896738, src=..., name=0x1480925eae10)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:577
#6  0x000000000175795d in fold_range (r=..., s=s@entry=0x148092896738, q=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:312
#7  0x000000000175a5d3 in ranger_cache::range_of_def (this=0x7ffdd7687950, r=..., name=0x1480925eae10, bb=0x0)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:842
#8  0x000000000175a690 in ranger_cache::entry_range (this=0x7ffdd7687950, r=..., name=0x1480925eae10, bb=0x148092bffbc8)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:866
#9  0x000000000175a796 in ranger_cache::range_of_expr (this=<optimized out>, r=..., name=<optimized out>, stmt=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:914
#10 0x000000000175faaa in gori_compute::compute_operand1_range (this=0x7ffdd76879d0, r=..., stmt=0x14809245bb40, lhs=..., 
    name=0x1480932cf9d8, src=...) at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-gori.cc:877
#11 0x000000000176083a in gori_compute::compute_operand_range (src=..., name=0x1480932cf9d8, lhs=..., stmt=0x14809245bb40, r=..., 
    this=0x7ffdd76879d0) at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-gori.cc:620
#12 gori_compute::outgoing_edge_range_p (this=this@entry=0x7ffdd76879d0, r=..., e=e@entry=0x14809234a750, name=name@entry=0x1480932cf9d8, 
    q=...) at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-gori.cc:1044
#13 0x000000000175ae00 in ranger_cache::propagate_cache (this=0x7ffdd7687950, name=0x1480932cf9d8)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:1027
#14 0x000000000175b4e7 in ranger_cache::fill_block_cache (this=0x7ffdd7687950, name=0x1480932cf9d8, bb=<optimized out>, 
    def_bb=0x1480933e5ea0) at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:1238
#15 0x000000000175b980 in ranger_cache::block_range (this=0x7ffdd7687950, r=..., bb=0x148092c4e680, name=0x1480932cf9d8, 
    calc=<optimized out>) at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:971
#16 0x0000000001753a92 in gimple_ranger::range_on_entry (this=0x7ffdd7687940, r=..., bb=0x148092c4e680, name=0x1480932cf9d8)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:1203
#17 0x0000000001757cef in gimple_ranger::range_of_expr (this=<optimized out>, r=..., expr=0x1480932cf9d8, stmt=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:1186

> 
> Thanks.
Comment 14 Richard Biener 2021-06-21 10:37:16 UTC
I can confirm this and I've opened PR101148 for this.
Comment 15 Martin Liška 2021-06-22 05:07:54 UTC
Created attachment 51043 [details]
One another test-case

I have one more test-case that hangs with -O3.
Comment 16 Martin Liška 2021-06-22 05:08:11 UTC
Reopening.
Comment 17 Andrew Macleod 2021-06-22 13:55:26 UTC
Created attachment 51050 [details]
patch to fix the issue

The gift that keeps on giving eh.
OK, this should solve the infinite loop. Give it a try, I'm running it through testing now.

When I introduced the sparse on-entry cache, I limited it to 15 unique ranges for any given ssa-name; it reverts to varying for any additional values to be safe.

The cache propagation engine works by combining incoming ranges and, if the result is different from the current on-entry range, storing it and proceeding to push this new value on outgoing edges.

What was happening here is that the newly calculated value was beyond the 15 allowed. When it was stored, it was stored as VARYING.  This block was in a cycle feeding back to itself, so when it calculated the on-entry value again and compared, it thought it needed to update again.  Which of course failed again... and the endless loop of trying to propagate was born.

This patch checks that the value being stored to the cache is the same as the value when it is immediately reloaded. If that fails, we stop trying to propagate that value.

Please check that it solves both this problem and, likely, the 101148 problem.
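
The failure mode and the store-then-reload check described above can be sketched with a toy cache. This is an illustration under stated assumptions, not the GCC implementation: `ToyRange`, `ToySparseCache`, `toy_propagate` and `toy_demo` are hypothetical names, a range is reduced to a closed interval or VARYING, and a step budget stands in for "loops forever". Only the limit of 15 unique ranges and the fallback to VARYING come from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy range: either VARYING or the closed interval [lo, hi].
struct ToyRange {
  bool varying;
  int lo;
  int hi;
  bool operator==(const ToyRange &o) const {
    if (varying || o.varying) return varying == o.varying;
    return lo == o.lo && hi == o.hi;
  }
  bool operator!=(const ToyRange &o) const { return !(*this == o); }
};

ToyRange make_range(int lo, int hi) { return ToyRange{false, lo, hi}; }
ToyRange make_varying() { return ToyRange{true, 0, 0}; }

// Sparse on-entry cache for one name: it can hold at most `limit`
// distinct ranges, and anything beyond that is stored as VARYING.
class ToySparseCache {
 public:
  explicit ToySparseCache(std::size_t limit)
      : m_limit(limit), m_current(make_varying()) {}

  void set(const ToyRange &r) {
    for (const ToyRange &known : m_unique)
      if (known == r) { m_current = known; return; }
    if (m_unique.size() < m_limit) {
      m_unique.push_back(r);
      m_current = r;
    } else {
      m_current = make_varying();  // out of slots: fall back to VARYING
    }
  }
  ToyRange get() const { return m_current; }

 private:
  std::size_t m_limit;
  std::vector<ToyRange> m_unique;
  ToyRange m_current;
};

// A block in a cycle feeding itself: keep updating while the incoming
// value differs from the stored one.  Returns the number of update
// attempts, capped at max_steps.
int toy_propagate(ToySparseCache &cache, const ToyRange &incoming,
                  bool check_reload, int max_steps) {
  int steps = 0;
  while (steps < max_steps && cache.get() != incoming) {
    ++steps;
    cache.set(incoming);
    // The check: if the value does not read back as stored, the cache
    // cannot represent it, so stop trying to propagate it.
    if (check_reload && cache.get() != incoming) break;
  }
  return steps;
}

// Fill all 15 slots, then try to propagate a 16th distinct range.
int toy_demo(bool check_reload) {
  ToySparseCache cache(15);
  for (int i = 0; i < 15; ++i) cache.set(make_range(i, i));
  return toy_propagate(cache, make_range(0, 100), check_reload, 1000);
}
```

Without the check, `toy_demo(false)` exhausts the whole budget of 1000 attempts, because the stored VARYING never matches the recomputed range. With the check, `toy_demo(true)` gives up after a single attempt.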
Comment 18 Martin Liška 2021-06-22 15:08:01 UTC
Thank you for the patch. I can confirm it fixes both the attached yarpgen test-case and cam4 finishes (PR101148).
Comment 19 GCC Commits 2021-06-23 14:26:53 UTC
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:a03e944e92ee51ae583382079d4739b64bd93b35

commit r12-1750-ga03e944e92ee51ae583382079d4739b64bd93b35
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Tue Jun 22 17:46:05 2021 -0400

    Do not continue propagating values which cannot be set properly.
    
    If the on-entry cache cannot properly represent a range, do not continue
    trying to propagate it.
    
            PR tree-optimization/101148
            PR tree-optimization/101014
            * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
            (ranger_cache::~ranger_cache): Adjust.
            (ranger_cache::block_range): Check if propagation disallowed.
            (ranger_cache::propagate_cache): Disallow propagation if new value
            can't be stored properly.
            * gimple-range-cache.h (ranger_cache::m_propfail): New member.
Comment 20 Andrew Macleod 2021-06-23 14:31:03 UTC
Hopefully this closes it for good.  The final patch needed to adjust the propagation engine to avoid propagating the failed value more than once.  The original patch simply stopped propagating immediately, and that caused other issues.
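
One way to picture "avoid propagating the failed value more than once" is a worklist that remembers which blocks have already failed to store a value. The sketch below is a guess at the shape of such a scheme, not the committed code: `ToyCfg`, `propagate`, `toy_self_loop_cfg`, the integer stand-ins for ranges, and the `storable` flag are all hypothetical; only the idea of a per-block "propagation failed" record (`m_propfail` in the ChangeLog) comes from the thread.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <utility>
#include <vector>

const int TOY_VARYING = -1;  // hypothetical marker for "any value"
const int TOY_UNSET = -2;    // hypothetical marker for "nothing stored yet"

// Toy CFG with one on-entry value per block; not GCC code.
struct ToyCfg {
  std::vector<std::vector<int>> succs;  // successor blocks of each block
  std::vector<bool> storable;  // can this block's cache hold the value?
  std::vector<int> on_entry;   // value currently stored for each block
  std::set<int> propfail;      // blocks whose store has already failed
  int visits = 0;              // number of update attempts made

  ToyCfg(std::vector<std::vector<int>> s, std::vector<bool> st)
      : succs(std::move(s)), storable(std::move(st)),
        on_entry(succs.size(), TOY_UNSET) {
    assert(succs.size() == storable.size());
  }

  // Push `value` through the CFG from `start`.  Returns false if the
  // budget of update attempts runs out, i.e. propagation did not settle.
  bool propagate(int start, int value, bool use_propfail, int budget) {
    std::deque<int> work;
    work.push_back(start);
    while (!work.empty()) {
      int bb = work.front();
      work.pop_front();
      // A block that already failed once is not retried.
      if (use_propfail && propfail.count(bb)) continue;
      if (on_entry[bb] == value) continue;  // nothing to update
      if (++visits > budget) return false;
      on_entry[bb] = storable[bb] ? value : TOY_VARYING;
      if (on_entry[bb] != value) propfail.insert(bb);
      // Successors are still queued, so the value moves on one time
      // even from a block that could not store it.
      for (int s : succs[bb]) work.push_back(s);
    }
    return true;
  }
};

// Block 0 loops to itself and feeds block 1; block 1 feeds block 2.
// Block 0 is the one whose cache cannot represent the value.
ToyCfg toy_self_loop_cfg() {
  return ToyCfg({{0, 1}, {2}, {}}, {false, true, true});
}
```

In this toy, propagation without the failure record never settles on the self-looping block, while with it the run finishes after three update attempts and blocks 1 and 2 still receive the value.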
Comment 21 GCC Commits 2021-07-14 21:58:48 UTC
The releases/gcc-11 branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:85c22c517e9571d1f0f487fd708fbb01f36f172a

commit r11-8750-g85c22c517e9571d1f0f487fd708fbb01f36f172a
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Tue Jun 22 17:46:05 2021 -0400

    Do not continue propagating values which cannot be set properly.
    
    If the on-entry cache cannot properly represent a range, do not continue
    trying to propagate it.
    
            PR tree-optimization/101148
            PR tree-optimization/101014
            * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
            (ranger_cache::~ranger_cache): Adjust.
            (ranger_cache::block_range): Check if propagation disallowed.
            (ranger_cache::propagate_cache): Disallow propagation if new value
            can't be stored properly.
            * gimple-range-cache.h (ranger_cache::m_propfail): New member.