Created attachment 47351
High stack usage due ftree-ch

The code snippet (gcc_free_ch_stack.c) shows high stack usage. With GCC 9.2.1, -fstack-usage together with -O2 reports the following stack usage (in bytes):

arch         stack usage
arm          632
aarch64      448
powerpc      912
powerpc64le  560
s390         600
s390x        632
i386         1376
x86_64       784

The same issue also shows up on the master branch.

It seems to be due to the -ftree-ch pass, which feeds -ftree-loop-im, -ftree-pre, -fmove-loop-invariants, and -fgcse. Andrew Pinski suggested it is mostly due to the lack of a good register pressure estimate for loop invariant code motion.

Andrew also suggested using -fno-tree-loop-im -fno-tree-pre -fno-gcse; however, even with these options the resulting stack usage does not get on par with the -Os option (which disables -ftree-ch).

On powerpc64le:

$ ./gcc/xgcc -v 2>&1 | grep 'gcc version'
gcc version 10.0.0 20191121 (experimental) (GCC)

$ ./gcc/xgcc -B gcc -O2 stack_usage.c -fstack-usage -c; cat stack_usage.su
stack_usage.c:157:6:mlx5e_grp_sw_update_stats	496	static

$ ./gcc/xgcc -B gcc -O2 stack_usage.c -fstack-usage -c -fno-tree-loop-im -fno-tree-pre -fno-move-loop-invariants -fno-gcse; cat stack_usage.su
stack_usage.c:157:6:mlx5e_grp_sw_update_stats	176	static

$ ./gcc/xgcc -B gcc -Os stack_usage.c -fstack-usage -c; cat stack_usage.su
stack_usage.c:157:6:mlx5e_grp_sw_update_stats	32	static
Again, this is not due to tree-ch at all. This is due to the code motion passes moving invariant loads/stores out of the loop. The tree-ch pass just allows those passes to work.

All three (gcse, tree pre and tree lim) need to be disabled to see the difference, as all three are able to do the transformation.
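To make the transformation described above concrete, here is a minimal sketch in C. It is not the attached testcase: the struct names, the four counters and both function names are hypothetical, and it assumes the totals structure never overlaps the per-channel array. The first function is the loop as a programmer would write it; the second is a hand-written approximation of what the code motion passes effectively produce, with every accumulator kept live in a scalar across the whole loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel counters; a real driver has dozens of these. */
struct channel_stats {
    uint64_t packets;
    uint64_t bytes;
    uint64_t drops;
    uint64_t errors;
};

/* Hypothetical totals structure, assumed not to overlap the channel array. */
struct totals {
    uint64_t packets;
    uint64_t bytes;
    uint64_t drops;
    uint64_t errors;
};

/* As written in the source: each iteration reads and writes total->field
 * through memory, so no accumulator has to stay live in a register. */
void sum_in_memory(struct totals *total, const struct channel_stats *ch, size_t n)
{
    assert(total != NULL && (ch != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        total->packets += ch[i].packets;
        total->bytes   += ch[i].bytes;
        total->drops   += ch[i].drops;
        total->errors  += ch[i].errors;
    }
}

/* Hand-written approximation of the loop after store motion: the loads of
 * total->field are hoisted before the loop and the stores are sunk after it.
 * One scalar per field is now live across the loop body. */
void sum_hoisted(struct totals *total, const struct channel_stats *ch, size_t n)
{
    assert(total != NULL && (ch != NULL || n == 0));
    uint64_t packets = total->packets;
    uint64_t bytes   = total->bytes;
    uint64_t drops   = total->drops;
    uint64_t errors  = total->errors;

    for (size_t i = 0; i < n; i++) {
        packets += ch[i].packets;
        bytes   += ch[i].bytes;
        drops   += ch[i].drops;
        errors  += ch[i].errors;
    }

    total->packets = packets;
    total->bytes   = bytes;
    total->drops   = drops;
    total->errors  = errors;
}
```

With only four counters both forms should fit in registers on most targets. The stack growth reported here would be expected only once the number of accumulators exceeds the available registers, at which point the hoisted scalars get spilled. Compiling such a file with -O2 -fstack-usage -c and reading the resulting .su file is one way to compare the two shapes.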
(In reply to Andrew Pinski from comment #1)
> Again, this is not due to tree-ch at all. This is due to the code motion
> passes move invariant load/stores out of the loop. Tree-ch pass just allows
> those passes to work.
> 
> All three (gcse, tree pre and tree lim) need to be disabled to see the
> difference as all three are able to do the transformation.

Sorry if I was not clear: I did not mean that tree-ch is the culprit, but rather that it enables further optimizations that increase register pressure. As I noted, disabling gcse, tree pre, and tree lim does reduce total stack usage, but it does not reach the same level as disabling tree-ch.
(In reply to Adhemerval Zanella from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Again, this is not due to tree-ch at all. This is due to the code motion
> > passes move invariant load/stores out of the loop. Tree-ch pass just allows
> > those passes to work.
> > 
> > All three (gcse, tree pre and tree lim) need to be disabled to see the
> > difference as all three are able to do the transformation.
> 
> Sorry if I was not clear that tree-ch is not the culprit, but rather that it
> enabled further optimizations to increase register pressure. But as I added
> by disabling gcse, tree pre, and tree lim does help total stack usage, but
> it does not reach on same level as disabling tree-ch.

Ok, gcse, tree pre and tree lim are just three of the flags that are increasing the stack usage. Other flags that are enabled by O2 but not by Os are increasing stack usage as well.

Maybe changing the title to "High stack usage with tree-loop-im, tree-pre, and gcse"?
From a quick look it's a classical testcase for excessive store-motion plus PRE and GCSE managing to do half of that. So in essence there are probably duplicates of this bug and what we miss is something of a register pressure estimation framework on GIMPLE (we do have multiple sketches of that spread across some passes).

The main issue here is (as can be seen here) that implementing such estimation in one pass doesn't solve the issue but merely pushes it elsewhere.

Note that for i?86 with SSE STV is also an offender:

t.c:157:6:mlx5e_grp_sw_update_stats	1376	static
t.c:157:6:mlx5e_grp_sw_update_stats	936	static

with -mno-stv
Submitted a workaround for the warning that triggered this bug report in the linux kernel: https://lore.kernel.org/lkml/20200104215156.689245-1-arnd@arndb.de/