88405 – Missed DSE opportunity

Bug 88405 - Missed DSE opportunity

Summary: Missed DSE opportunity

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	9.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2018-12-07 14:35 UTC by ktkachov
Modified:	2018-12-10 08:51 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2018-12-10 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description ktkachov 2018-12-07 14:35:40 UTC

For the following code:

#define MATRIX_SIZE 512

static double a[MATRIX_SIZE][MATRIX_SIZE];
static double b[MATRIX_SIZE][MATRIX_SIZE];
static double c[MATRIX_SIZE][MATRIX_SIZE];

double
foo (void) {
  double s;
  int i, j, k;
  /* Section A */
  for (i = 0; i < MATRIX_SIZE; i++) {
    for (j = 0; j < MATRIX_SIZE; j++) {
      a[i][j] = (double)i * (double)j;
      b[i][j] = (double)i / (double)(j+5);
    }
  }
  /* Section B */
  for (j = 0; j < MATRIX_SIZE; j++) {
    for (i = 0; i < MATRIX_SIZE; i++) {
      s = 0;
      for (k = 0; k < MATRIX_SIZE; k++) {
        s += a[i][k] * b[k][j];
      }
      c[i][j] = s;
    }
  }
  s = 0.0; // (1)
#if 0
  /* Section C */
  for (i = 0; i < MATRIX_SIZE; i++) {
    for (j = 0; j < MATRIX_SIZE; j++) {
      s += c[i][j];
    }
  }
#endif
  return s;
}

GCC does not manage to eliminate the code up to (1) and retains the expensive Section A.

Clang manages to eliminate much more and produces:
foo:                                    // @foo
// %bb.0:                               // %entry
        orr     w8, wzr, #0x200
.LBB0_1:                                // %vector.ph
                                        // =>This Inner Loop Header: Depth=1
        subs    x8, x8, #1              // =1
        b.ne    .LBB0_1
// %bb.2:                               // %for.cond20.preheader.preheader
        fmov    d0, xzr
        ret


on aarch64. This happens at -O3 as well as -O2 as well as other targets (occurs also on x86)

Comment 1 Richard Biener 2018-12-10 08:51:40 UTC

Confirmed.

But in general a prerequesite to eliminate Section A is
fusion of both loop nests and then CSEing the loads from a and b and finally
eliminate the stores as dead because a and b are only written to.

For the testcase we eliminate Section B because 'c' is discovered as writeonly.
But we do not update the writeonly flag of a or b when removing uses (and
IMHO we cannot easily).  Note we perform this "DSE" in fixup_cfg.