Bug 37810 - Bad store sinking job
Summary: Bad store sinking job
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.4.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2008-10-12 15:13 UTC by Carlo Wood
Modified: 2009-04-03 12:34 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-04-03 12:34:44


Description Carlo Wood 2008-10-12 15:13:31 UTC
The following code snippet:

void g();

struct A {
  int n;
  int m;

  A& operator++(void)
  {
    if (__builtin_expect(n == m, false))
      g();
    else
      ++n;
    return *this;
  }

  A() : n(0), m(0) { }

  friend bool operator!=(A const& a1, A const& a2) { return a1.n != a2.n; }
};

void testfunction(A& iter)
{
  A const end;
  while (iter != end)
    ++iter;
}

results in the following assembly code, using maximum optimization:

        movl    (%rdi), %eax
        jmp     .L6

.L4:
        cmpl    %eax, 4(%rdi)     // n == m ?
        je      .L8               // unlikely jump
        addl    $1, %eax          // ++n
        movl    %eax, (%rdi)      // *** store result to memory ***
.L6:
        testl   %eax, %eax        // iter != end ?
        jne     .L4               // continue while loop


The storing (back) of %eax to (%rdi) remains inside the inner
loop no matter what I try. It could/should be moved outside
the loop, since nothing inside the L4 loop is accessing (%rdi)
or could possibly be accessing that memory.

This loop has two exits: below the last jne .L4, and the
jump to .L8. The store could be sunk to both exits.
This grows the code, but it seems reasonable to do for
a loop with a very small body, especially if one of the
exits is marked as unlikely :p.
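
For illustration, the requested transformation can be written out by
hand. This is only a sketch under stated assumptions, not compiler
output: testfunction_sunk, g_target, g_calls and g_n_seen are made-up
names, and the g() below is a stand-in so the example runs (the real
g() is external and its behaviour is unknown).

```cpp
struct A {
  int n;
  int m;
};

// Hypothetical scaffolding: lets the stand-in g() reach the object,
// the way an unknown external g() might.
A*  g_target = nullptr;
int g_calls  = 0;
int g_n_seen = 0;

// Stand-in for the external g() of the report (assumption).
void g()
{
  ++g_calls;
  g_n_seen = g_target->n;   // value of n that g() observes in memory
  g_target->m = 1000;       // change m so the loop can make progress
}

// testfunction() with the store to iter.n sunk to both loop exits.
void testfunction_sunk(A& iter)
{
  int n = iter.n;                       // load once, keep n in a register
  while (n != 0) {                      // iter != end, where end.n == 0
    if (__builtin_expect(n == iter.m, false)) {
      iter.n = n;                       // sunk store on the unlikely exit
      g();                              // g() may read or write iter
      n = iter.n;                       // so reload afterwards
    } else {
      ++n;                              // no store inside the hot path
    }
  }
  iter.n = n;                           // sunk store on the normal exit
}
```

Unlike the original, this version writes iter.n back even when the
loop body never executes, so it is not a drop-in replacement in every
context.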
Comment 1 Richard Biener 2008-10-12 15:20:19 UTC
Store sinking doesn't do its job because it considers the store to iter->n dependent:

Memory reference 0: iter_1(D)->n
Memory reference 1: iter_1(D)->m
...
Querying dependencies of ref 0 in loop 1: dependent
Comment 2 Richard Biener 2008-10-12 15:25:41 UTC
The original testcase (from an IRC discussion) reduced to a C testcase is:

struct A {
  int n;
  int m;
};

void g();

void test (struct A* iter)
{
  struct A end = { 0, 0 };
  while (iter->n != end.n)
    {
      iter->n = iter->n + 1;
      if (iter->n == iter->m)
        g();
    }
}

Here the store to iter->n could be sunk to just before the call, and
load-store motion applied to iter->n for the rest of the loop.
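
A hand-written sketch of that result, for illustration only: test_sunk,
g_target, g_calls and g_n_seen are hypothetical names, and g() is a
stand-in so the example can run. It is written to compile as C++ and
is not what any GCC pass actually emits.

```cpp
struct A {
  int n;
  int m;
};

// Hypothetical scaffolding so the stand-in g() can see the object.
A*  g_target = nullptr;
int g_calls  = 0;
int g_n_seen = 0;

// Stand-in for the external g() (assumption).
void g()
{
  ++g_calls;
  g_n_seen = g_target->n;   // value of n that g() observes in memory
  g_target->m = 1000;       // move m out of the way for the rest of the run
}

void test_sunk(A* iter)
{
  int n = iter->n;          // load-store motion: n lives in a register
  while (n != 0) {          // end.n is the constant 0
    n = n + 1;
    if (n == iter->m) {
      iter->n = n;          // store sunk to just before the call
      g();                  // the call may read or write *iter
      n = iter->n;          // reload after the call
    }
  }
  iter->n = n;              // single store on the loop exit
}
```

The store now executes once per call to g() plus once at loop exit,
instead of once per iteration.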
Comment 3 Richard Biener 2008-10-12 15:29:17 UTC
It looks like the testcase in comment #2 should be fixed by SSUPRE?  We have

  *p = ...;
  if ()
    foo();

where foo() is an "implicit" store to *p.  Still, store sinking should be applied
to the subloop.
Comment 4 Carlo Wood 2008-10-12 15:32:56 UTC
Note that the original code was:

  A& operator++(void)
  {
    ++n;
    if (__builtin_expect(n == m, false))
      g();
    return *this;
  }

but g++ fails to optimize that by decrementing m outside
the loop (so I now decrement m myself and use the code
from the description). That code has the advantage
that the result of the addl $1,%eax can be used for the
conditional jump. However, gcc doesn't do that either: in
the above assembly it follows the add with a redundant
testl %eax,%eax.

Anyway, using the operator++ given in this comment,
the assembly code is:

        movl    (%rdi), %eax
        jmp     .L3

.L4:
        addl    $1, %eax
        cmpl    4(%rdi), %eax
        movl    %eax, (%rdi)
        je      .L8
.L3:
        testl   %eax, %eax
        jne     .L4

which is essentially the same, except now the
testl %eax,%eax is indeed "needed" ...
Comment 5 Richard Biener 2009-04-03 12:34:44 UTC
Re-confirmed.