Created attachment 50978: testcase

Noticed in a yarpgen test case and in WRF, for instance: -O3 runs really slowly.
I'm reducing the test case now.
Unfortunately, the reduction is stuck at 200KB. Can you analyze the original test case?
Yeah, that's fine, I'll look at the original.
When a range is being calculated for an ssa-name, the propagation process often goes along back edges. These back edges sometimes require other ssa-names which have not been processed yet. These are flagged as "poor values", and when propagation is done, we visit the list of poor values, calculate them, and see if that may result in a better range for the original ssa-name.

The problem is that calculating these poor values may also spawn another set of requests, since the block at the far end of the back edge has not been processed yet... It's highly likely that some additional unprocessed ssa-names are used in the calculation of that name, but typically they do not affect the current range in a significant way. Thus we mostly care about the first-order effect only. It turns out to be very rare that a 2nd-order effect on a back edge affects anything that we don't catch later.

This patch turns off poor-value tagging when looking up the first-order values, thus avoiding the 2nd-order and beyond cascading effects. I haven't yet found a test case we miss because of this change, and it probably resolves a number of the outstanding compile-time problems in a significant way. I think this will probably apply to GCC 11 in some form as well, so I'll look at an equivalent patch for there.
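To make the first-order limit concrete, here is a toy sketch in Python. Everything in it is hypothetical (the class and function names are invented, and actual ranges are not modeled at all); it only counts how many evaluations a single lookup triggers when each name uses another name that has not been processed yet. With cascading tagging, a chain of N names is evaluated N times; with tagging turned off while the flagged poor values are recomputed, only the original name and its direct poor value are evaluated.

```python
class ToyRanger:
    """Hypothetical model: each name maps to the names used to compute it.
    A use of a not-yet-computed name is flagged as a "poor value"."""

    def __init__(self, deps):
        self.deps = deps              # name -> list of operand names
        self.computed = set()
        self.poor_values = []         # names flagged during lookups
        self.tag_poor_values = True
        self.evaluations = 0

    def _lookup(self, name):
        # An unknown operand is only queued if tagging is currently enabled.
        if name not in self.computed and self.tag_poor_values:
            self.poor_values.append(name)

    def _compute(self, name):
        self.evaluations += 1
        for op in self.deps.get(name, []):
            self._lookup(op)
        self.computed.add(name)

    def range_of(self, name, first_order_only):
        self._compute(name)
        # Revisit the poor values flagged while computing NAME.
        while self.poor_values:
            p = self.poor_values.pop()
            if p in self.computed:
                continue
            saved = self.tag_poor_values
            if first_order_only:
                # Recompute the direct (1st order) poor value, but do not tag
                # anything it uses, so the requests do not cascade further.
                self.tag_poor_values = False
            self._compute(p)
            self.tag_poor_values = saved


def evaluations_for_chain(length, first_order_only):
    # Name i uses name i + 1; the last name uses nothing.
    deps = {i: [i + 1] for i in range(length - 1)}
    ranger = ToyRanger(deps)
    ranger.range_of(0, first_order_only)
    return ranger.evaluations


print(evaluations_for_chain(100, first_order_only=False))  # prints 100
print(evaluations_for_chain(100, first_order_only=True))   # prints 2
```

Saving and restoring `tag_poor_values` loosely mirrors the commit message's description of `ranger_cache::enable_new_values` setting a value and returning the old one; the real code in gimple-range-cache.cc is not reproduced here.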
Should be fixed with:

commit ecc5644fa3bc7f37eada2a3e9c627cd1918922e0
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Mon Jun 14 15:33:59 2021 -0400

    Limit new value calculations to first order effects.

    When utilzing poor values during propagation, we mostly care about
    values that were undefined/processed directly used in calcualting the
    SSA_NAME being processed. 2nd level derivations of such poor values
    rarely affect the inital calculation. Leave them to when they are
    directly encountered.

            * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
            (ranger_cache::enable_new_values): Set to specified value and
            return the old value.
            (ranger_cache::disable_new_values): Delete.
            (ranger_cache::fill_block_cache): Disable non 1st order derived
            poor values.
            * gimple-range-cache.h (ranger_cache): Adjust prototypes.
            * gimple-range.cc (gimple_ranger::range_of_expr): Adjust.
I swear I put that text in and moved this to RESOLVED... :-( Sigh, sorry. Anyway, this does not appear to be an issue in GCC 11; the effect appears to have been magnified by the new, more aggressive import/export calculation code in the GORI rework.
I'm sorry, but the compile-time hog is still not resolved. I can still see it in the cam4 SPEC benchmark, and I'm attaching another yarpgen test case.
Created attachment 51027: Another test-case
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:870b674f72d4894b94efa61764fd87ecec29ffde

commit r12-1652-g870b674f72d4894b94efa61764fd87ecec29ffde
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Fri Jun 18 12:33:18 2021 -0400

    Remove poor value computations.

    Remove the old "poor value" approach which made callbacks into ranger
    from the cache. Use only the best available value for all propagation.

            PR tree-optimization/101014

            * gimple-range-cache.cc (ranger_cache::ranger_cache): Remove
            poor value list.
            (ranger_cache::~ranger_cache): Ditto.
            (ranger_cache::enable_new_values): Delete.
            (ranger_cache::push_poor_value): Delete.
            (ranger_cache::range_of_def): Remove poor value processing.
            (ranger_cache::entry_range): Ditto.
            (ranger_cache::fill_block_cache): Ditto.
            * gimple-range-cache.h (class ranger_cache): Remove poor value
            members.
            * gimple-range.cc (gimple_ranger::range_of_expr): Remove call.
            * gimple-range.h (class gimple_ranger): Adjust.
Really fixed this time.
I'm not sure if it's related, but compilation of 527.cam4_r still hangs with gcc version 12.0.0 20210621 (experimental) (GCC) and options: -march=cascadelake -Ofast -funroll-loops -flto -g -mfpmath=sse

It has been hanging in this process for more than 2h:

liuhongt 79919 79918 99 15:52 pts/4 00:50:13 /export/users2/liuhongt/install/gnu-toolchain_master/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto1 -march=cascadelake -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mavx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -quiet -dumpbase ./cam4_r.ltrans43.ltrans -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mavx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mtune=cascadelake -mfpmath=sse -m64 -mfpmath=sse -g -g -Ofast -Ofast -fno-openmp -fno-openacc -fno-pie -fcf-protection=none -funroll-loops -fno-associative-math -fltrans @/tmp/cctsotsZ -o /tmp/ccjbpUzj.s
(In reply to Hongtao.liu from comment #11)
> I'm not sure if it's related but compilation of 527.cam4_r still hangs with
>
> gcc version 12.0.0 20210621 (experimental) (GCC)

Can you verify after which patch upstream it started hanging? It may or may not be related to this bug.

Or perhaps, can you check where it hangs? Is it hanging in the ranger code or elsewhere?

Thanks.
(In reply to Aldy Hernandez from comment #12)
> (In reply to Hongtao.liu from comment #11)
> > I'm not sure if it's related but compilation of 527.cam4_r still hangs with
> >
> > gcc version 12.0.0 20210621 (experimental) (GCC)
>
> Can you verify after which patch upstream it started hanging? It may or may
> not be related to this bug.
>
> Or perhaps, can you check where it hangs? Is it hanging in the ranger code
> or elsewhere?

After hanging for 36m, with gdb -p pid:

(gdb) bt
#0  0x0000000001035810 in irange::varying_compatible_p (this=this@entry=0x7ffdd7672630)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-range.h:289
#1  0x000000000102a08b in irange::normalize_kind (this=0x7ffdd7672630)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-range.h:584
#2  irange::irange_set (this=0x7ffdd7672630, min=<optimized out>, max=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-range.cc:182
#3  0x000000000102922c in range_query::get_tree_range (this=0x2614590 <global_ranges>, r=..., expr=0x148092cd3de0, stmt=0x148092896738)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/value-query.cc:212
#4  0x000000000175457e in fold_using_range::range_of_range_op (this=<optimized out>, r=..., s=0x148092896738, src=...)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:642
#5  0x0000000001757606 in fold_using_range::fold_stmt (this=0x7ffdd76736cf, r=..., s=0x148092896738, src=..., name=0x1480925eae10)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:577
#6  0x000000000175795d in fold_range (r=..., s=s@entry=0x148092896738, q=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:312
#7  0x000000000175a5d3 in ranger_cache::range_of_def (this=0x7ffdd7687950, r=..., name=0x1480925eae10, bb=0x0)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:842
#8  0x000000000175a690 in ranger_cache::entry_range (this=0x7ffdd7687950, r=..., name=0x1480925eae10, bb=0x148092bffbc8)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:866
#9  0x000000000175a796 in ranger_cache::range_of_expr (this=<optimized out>, r=..., name=<optimized out>, stmt=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:914
#10 0x000000000175faaa in gori_compute::compute_operand1_range (this=0x7ffdd76879d0, r=..., stmt=0x14809245bb40, lhs=..., name=0x1480932cf9d8, src=...)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-gori.cc:877
#11 0x000000000176083a in gori_compute::compute_operand_range (src=..., name=0x1480932cf9d8, lhs=..., stmt=0x14809245bb40, r=..., this=0x7ffdd76879d0)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-gori.cc:620
#12 gori_compute::outgoing_edge_range_p (this=this@entry=0x7ffdd76879d0, r=..., e=e@entry=0x14809234a750, name=name@entry=0x1480932cf9d8, q=...)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-gori.cc:1044
#13 0x000000000175ae00 in ranger_cache::propagate_cache (this=0x7ffdd7687950, name=0x1480932cf9d8)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:1027
#14 0x000000000175b4e7 in ranger_cache::fill_block_cache (this=0x7ffdd7687950, name=0x1480932cf9d8, bb=<optimized out>, def_bb=0x1480933e5ea0)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:1238
#15 0x000000000175b980 in ranger_cache::block_range (this=0x7ffdd7687950, r=..., bb=0x148092c4e680, name=0x1480932cf9d8, calc=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range-cache.cc:971
#16 0x0000000001753a92 in gimple_ranger::range_on_entry (this=0x7ffdd7687940, r=..., bb=0x148092c4e680, name=0x1480932cf9d8)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:1203
#17 0x0000000001757cef in gimple_ranger::range_of_expr (this=<optimized out>, r=..., expr=0x1480932cf9d8, stmt=<optimized out>)
    at /export/users2/liuhongt/gcc/gnu-toolchain/master/gcc/gimple-range.cc:1186

>
> Thanks.
I can confirm this, and I've opened PR101148 for it.
Created attachment 51043: one more test case

I have one more test case that hangs with -O3.
Reopening.
Created attachment 51050: patch to fix the issue

The gift that keeps on giving, eh. OK, this should solve the infinite loop. Give it a try; I'm running it through testing now.

When I introduced the sparse on-entry cache, it was limited to 15 unique ranges for any given ssa-name; beyond that, it reverts to VARYING for any additional values, to be safe. The cache propagation engine works by combining the incoming ranges and, if the result is different from the current on-entry range, storing it and pushing the new value along the outgoing edges.

What was happening here is that the newly calculated value was beyond the 15 allowed, so when it was stored, it was stored as VARYING. This block was in a cycle feeding back to itself, so when it calculated the on-entry value again and compared, it thought it needed to update again. Which of course failed again... and the endless loop of trying to propagate was born.

This patch checks that the value being stored to the cache is the same as the value when it is immediately reloaded. If that fails, we stop trying to propagate that value. Please check that it solves both this problem and, likely, the PR101148 problem.
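The failure mode and the store-then-reload check can be sketched with a toy model in Python. All names here are hypothetical and ranges are plain (lo, hi) tuples; only the cap of 15 unique ranges comes from the comment above. Block `B` has an entry edge carrying [0, 0] and a back edge to itself guarded by `x <= 100`. When the cache is already full, storing the new on-entry range [0, 100] silently yields VARYING, the next comparison sees a difference again, and the unchecked loop never settles (the sketch gives up after `max_steps`). With the check, propagation stops as soon as the reloaded value differs from the value that was stored.

```python
VARYING = (float("-inf"), float("inf"))
MAX_UNIQUE = 15  # cap on distinct ranges per ssa-name, per the comment above


class SparseOnEntryCache:
    """Toy per-name on-entry cache holding at most MAX_UNIQUE distinct ranges."""

    def __init__(self):
        self.unique = []    # distinct ranges stored so far for this name
        self.by_block = {}  # block -> stored range

    def set(self, bb, r):
        if r not in self.unique:
            if len(self.unique) >= MAX_UNIQUE:
                r = VARYING  # no room left: fall back to the safe answer
            else:
                self.unique.append(r)
        self.by_block[bb] = r

    def get(self, bb):
        return self.by_block.get(bb)  # None means "not computed yet"


def union(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))


def intersect(a, b):
    return (max(a[0], b[0]), min(a[1], b[1]))


def propagate(cache, check_store, max_steps=1000):
    """Propagate the on-entry range of block B (entry edge [0, 0], self back
    edge guarded by x <= 100). Returns the step count, or None if it never settles."""
    worklist = ["B"]
    steps = 0
    while worklist:
        steps += 1
        if steps > max_steps:
            return None
        bb = worklist.pop()
        old = cache.get(bb)
        # Back-edge range: B's current on-entry range refined by x <= 100.
        back = intersect(old if old is not None else VARYING, (0, 100))
        new = union((0, 0), back)
        if new == old:
            continue  # nothing changed, so nothing to push
        cache.set(bb, new)
        if check_store and cache.get(bb) != new:
            continue  # the cache could not represent NEW: stop propagating it
        worklist.append("B")  # B's successor on the cycle is B itself
    return steps


def full_cache():
    # Pre-fill the cache with MAX_UNIQUE ranges coming from other blocks.
    cache = SparseOnEntryCache()
    for i in range(MAX_UNIQUE):
        cache.set(("other", i), (i, i + 1000))
    return cache


print(propagate(SparseOnEntryCache(), check_store=False))  # prints 2: converges
print(propagate(full_cache(), check_store=False))          # prints None: endless loop
print(propagate(full_cache(), check_store=True))           # prints 1: stops after the failed store
```

The comparison `new == old` is what keeps firing in the unchecked version: the back edge refines VARYING back down to [0, 100], so the recomputed value can never match the degraded cache entry.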
Thank you for the patch. I can confirm it fixes both the attached yarpgen test case and cam4, which now finishes (PR101148).
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:a03e944e92ee51ae583382079d4739b64bd93b35

commit r12-1750-ga03e944e92ee51ae583382079d4739b64bd93b35
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Tue Jun 22 17:46:05 2021 -0400

    Do not continue propagating values which cannot be set properly.

    If the on-entry cache cannot properly represent a range, do not continue
    trying to propagate it.

            PR tree-optimization/101148
            PR tree-optimization/101014

            * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
            (ranger_cache::~ranger_cache): Adjust.
            (ranger_cache::block_range): Check if propagation disallowed.
            (ranger_cache::propagate_cache): Disallow propagation if new value
            can't be stored properly.
            * gimple-range-cache.h (ranger_cache::m_propfail): New member.
Hopefully this closes it for good. The final patch needed to adjust the propagation engine to avoid propagating the failed value more than once. The original patch simply stopped propagating immediately, and that caused other issues.
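The `m_propfail` member in the commit suggests remembering where a store failed. The following self-contained Python toy is a hypothetical reading of "avoid propagating the failed value more than once", not the GCC code: the first time a block's store fails, its degraded value is still pushed to its successors; a second failure at the same block stops. It also assumes, which the thread does not say, that one of the "other issues" with stopping immediately is that a successor block (`C` here) never receives an updated on-entry value. The cache, the 15-range cap, and the CFG (`B` loops to itself and falls through to `C`) are toy stand-ins.

```python
VARYING = (float("-inf"), float("inf"))
MAX_UNIQUE = 15  # cap on distinct ranges per ssa-name

SUCCS = {"B": ["B", "C"], "C": []}  # B loops to itself and falls through to C


class SparseOnEntryCache:
    """Toy cache: beyond MAX_UNIQUE distinct ranges, new ranges become VARYING."""

    def __init__(self):
        self.unique, self.by_block = [], {}

    def set(self, bb, r):
        if r not in self.unique:
            if len(self.unique) >= MAX_UNIQUE:
                r = VARYING
            else:
                self.unique.append(r)
        self.by_block[bb] = r

    def get(self, bb):
        return self.by_block.get(bb)  # None means "not computed yet"


def known(r):
    return VARYING if r is None else r


def incoming_range(cache, bb):
    if bb == "B":
        # Entry edge gives [0, 0]; the back edge gives B's range refined by x <= 100.
        b = known(cache.get("B"))
        back = (max(b[0], 0), min(b[1], 100))
        return (min(0, back[0]), max(0, back[1]))
    return known(cache.get("B"))  # C is reached from B unconditionally


def propagate(cache, retry_once):
    propfail = set()  # hypothetical: blocks whose store has already failed once
    worklist = ["B"]
    steps = 0
    while worklist:
        steps += 1
        bb = worklist.pop(0)
        new = incoming_range(cache, bb)
        if new == cache.get(bb):
            continue
        cache.set(bb, new)
        if cache.get(bb) != new:  # the store degraded the value
            if not retry_once or bb in propfail:
                continue  # give up on this block
            propfail.add(bb)  # let the degraded value out once
        worklist.extend(SUCCS[bb])
    return steps


def full_cache():
    cache = SparseOnEntryCache()
    for i in range(MAX_UNIQUE):
        cache.set(("other", i), (i, i + 1000))
    return cache


c = full_cache()
print(propagate(c, retry_once=False), c.get("C"))  # prints 1 None
c = full_cache()
print(propagate(c, retry_once=True), c.get("C"))   # prints 3 (-inf, inf)
```

Both variants terminate; the difference is only whether `C` ever sees the safe VARYING value that `B` ended up with.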
The releases/gcc-11 branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:85c22c517e9571d1f0f487fd708fbb01f36f172a

commit r11-8750-g85c22c517e9571d1f0f487fd708fbb01f36f172a
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Tue Jun 22 17:46:05 2021 -0400

    Do not continue propagating values which cannot be set properly.

    If the on-entry cache cannot properly represent a range, do not continue
    trying to propagate it.

            PR tree-optimization/101148
            PR tree-optimization/101014

            * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
            (ranger_cache::~ranger_cache): Adjust.
            (ranger_cache::block_range): Check if propagation disallowed.
            (ranger_cache::propagate_cache): Disallow propagation if new value
            can't be stored properly.
            * gimple-range-cache.h (ranger_cache::m_propfail): New member.