Bug 88405 - Missed DSE opportunity
Summary: Missed DSE opportunity
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-12-07 14:35 UTC by ktkachov
Modified: 2018-12-10 08:51 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-12-10 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ktkachov 2018-12-07 14:35:40 UTC
For the following code:

#define MATRIX_SIZE 512

static double a[MATRIX_SIZE][MATRIX_SIZE];
static double b[MATRIX_SIZE][MATRIX_SIZE];
static double c[MATRIX_SIZE][MATRIX_SIZE];

double
foo (void) {
  double s;
  int i, j, k;
  /* Section A */
  for (i = 0; i < MATRIX_SIZE; i++) {
    for (j = 0; j < MATRIX_SIZE; j++) {
      a[i][j] = (double)i * (double)j;
      b[i][j] = (double)i / (double)(j+5);
    }
  }
  /* Section B */
  for (j = 0; j < MATRIX_SIZE; j++) {
    for (i = 0; i < MATRIX_SIZE; i++) {
      s = 0;
      for (k = 0; k < MATRIX_SIZE; k++) {
        s += a[i][k] * b[k][j];
      }
      c[i][j] = s;
    }
  }
  s = 0.0; // (1)
#if 0
  /* Section C */
  for (i = 0; i < MATRIX_SIZE; i++) {
    for (j = 0; j < MATRIX_SIZE; j++) {
      s += c[i][j];
    }
  }
#endif
  return s;
}

GCC does not manage to eliminate the code up to (1) and retains the expensive Section A.

Clang manages to eliminate much more and produces:
foo:                                    // @foo
// %bb.0:                               // %entry
        orr     w8, wzr, #0x200
.LBB0_1:                                // %vector.ph
                                        // =>This Inner Loop Header: Depth=1
        subs    x8, x8, #1              // =1
        b.ne    .LBB0_1
// %bb.2:                               // %for.cond20.preheader.preheader
        fmov    d0, xzr
        ret


on aarch64. This happens at -O3 as well as -O2 as well as other targets (occurs also on x86)
Comment 1 Richard Biener 2018-12-10 08:51:40 UTC
Confirmed.

But in general a prerequesite to eliminate Section A is
fusion of both loop nests and then CSEing the loads from a and b and finally
eliminate the stores as dead because a and b are only written to.

For the testcase we eliminate Section B because 'c' is discovered as writeonly.
But we do not update the writeonly flag of a or b when removing uses (and
IMHO we cannot easily).  Note we perform this "DSE" in fixup_cfg.