[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Feb 24 09:20:14 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Mel Chen from comment #8)
> Sorry for using the bad example to describe the problem I am facing. Let me
> clarify my question with a more precise example.
> 
> void array_mul(int N, int *C, short *A, short *B) {
>   int i, j;
>   for (i = 0; i < N; i++) {
>     C[i] = 0; // Will be transformed to __builtin_memset
>     for (j = 0; j < N; j++) {
>       C[i] += (int)A[i * N + j] * (int)B[j];
>     }
>   }
> }
> 
> If I compile the case with -O2 -fno-tree-loop-distribute-patterns, the store
> operation 'C[i] = 0' can be eliminated by dead store elimination (dse3). But
> without -fno-tree-loop-distribute-patterns, it will be transformed to memset
> by loop distribution (ldist) because ldist executes before dse3. Finally the
> memset will not be eliminated.
> 
> Another point is if there are other operations in the same level loop as the
> store operation, is it really beneficial to do loop distribution and then
> convert to builtin function?

Sure, it shows a cost modeling issue, given that loop distribution usually
merges partitions which touch the same memory stream (but IIRC maybe only
for loads).  But more to the point, we fail to eliminate the dead store,
which should be apparent at least after PRE: LIM2 applied store motion,
but only PRE elides the resulting load of C[i].  Usually DCE and DSE come in
pairs, but after PRE we have DCE and CDDCE without an accompanying DSE; the
next DSE only happens after loop distribution.
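
To illustrate the above, here is a hand-written C approximation of the three
stages (not an actual GCC dump).  The function names array_mul_after_pre and
array_mul_after_ldist and the temporary acc are made up for this sketch, and
the real IL will differ in detail.

```c
#include <string.h>

/* Original loop nest from comment #8. */
void array_mul(int N, int *C, short *A, short *B) {
  int i, j;
  for (i = 0; i < N; i++) {
    C[i] = 0;
    for (j = 0; j < N; j++)
      C[i] += (int)A[i * N + j] * (int)B[j];
  }
}

/* Roughly what is left after LIM2 store motion and PRE (assumed shape). */
void array_mul_after_pre(int N, int *C, short *A, short *B) {
  int i, j;
  for (i = 0; i < N; i++) {
    C[i] = 0;     /* dead: overwritten below, C[i] is not read in between */
    int acc = 0;  /* PRE replaced the load of C[i] by the known value 0 */
    for (j = 0; j < N; j++)
      acc += (int)A[i * N + j] * (int)B[j];
    C[i] = acc;   /* store moved out of the inner loop by store motion */
  }
}

/* Roughly what loop distribution makes of it if the dead store survives. */
void array_mul_after_ldist(int N, int *C, short *A, short *B) {
  int i, j;
  if (N > 0)
    memset(C, 0, (size_t)N * sizeof(int));  /* every byte is overwritten below */
  for (i = 0; i < N; i++) {
    int acc = 0;
    for (j = 0; j < N; j++)
      acc += (int)A[i * N + j] * (int)B[j];
    C[i] = acc;
  }
}
```

In the second form the first store to C[i] is trivially dead, so a DSE run at
that point would remove it and loop distribution would have nothing left to
turn into a memset.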

Which means we should eventually do something like

diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..be3a9becde0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -254,6 +254,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);
+      NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* Pass group that runs when 1) enabled, 2) there are loops
         in the function.  Make sure to run pass_fix_loops before

