Bug 77647

Summary: [6/7 Regression] Missed opportunity to use register value causes additional load
Product: gcc
Reporter: Nicholas Piggin <npiggin>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED DUPLICATE
Severity: enhancement
CC: jeffreyalaw, law, rguenth, segher
Priority: P3
Keywords: missed-optimization
Version: 6.2.0
Target Milestone: 6.3
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71947
Host:
Target:
Build:
Known to work: 5.4.0
Known to fail:
Last reconfirmed: 2016-09-19 00:00:00

Description Nicholas Piggin 2016-09-19 14:32:02 UTC
static inline long load(long *p)
{
        long ret;
        asm ("movq      %1,%0\n\t" : "=r" (ret) : "m" (*p));
        if (ret != *p)
                __builtin_unreachable();
        return ret;
}

long foo(long *mem)
{
        long ret;
        ret = load(mem);
        return ret - *mem;
}

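foo() compiles down to 'xorl %eax,%eax ; ret', which is great. Changing the minus to plus gives the following, which loads from (%rdi) a second time instead of reusing the value already in %rax: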

        movq    (%rdi),%rax
        addq    (%rdi),%rax
        ret
Comment 1 Andrew Pinski 2016-09-19 16:43:23 UTC
In the - case, VRP (or DOM) folds the subtraction to 0.  In the + case, it does not pick up that the expression should be a + a instead of a + b.

Confirmed.
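
To spell out the a + a versus a + b point, here are hand-written equivalents of what the optimizer is expected to derive once it knows both operands hold the same value. The function names are illustrative only and do not come from the report.

```c
/* With ret == *mem known, ret - *mem is a - a, which folds to 0. */
long minus_expected(long *mem)
{
        (void) mem;             /* no load is needed at all */
        return 0;
}

/* With ret == *mem known, ret + *mem is a + a: one load, then an add
   of the register to itself. */
long plus_expected(long *mem)
{
        long a = *mem;
        return a + a;
}
```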
Comment 2 Richard Biener 2016-09-20 08:14:04 UTC
DOM doesn't propagate the equivalence on the edge that does not lead to
__builtin_unreachable (), that is, it doesn't copy-propagate, because of
PR71947:

  if (ret_3 != _5)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  __builtin_unreachable ();

  <bb 4>:
  _4 = ret_3 + _5;

DOM does the following after Jeff's patch:

Optimizing block #4

1>>> STMT 1 = ret_3 le_expr _5
1>>> STMT 1 = ret_3 ge_expr _5
1>>> STMT 1 = ret_3 eq_expr _5
1>>> STMT 0 = ret_3 ne_expr _5
0>>> COPY ret_3 = _5
0>>> COPY _5 = ret_3
Optimizing statement _4 = ret_3 + _5;
  Replaced 'ret_3' with variable '_5'
  Replaced '_5' with variable 'ret_3'
  Folded to: _4 = ret_3 + _5;

which is of course quite a stupid replacement: the two operands just trade
places and the statement folds back to what it was.  I still believe the
change should be reverted ... Jeff, are you still investigating this?

Works on the GCC 5 branch:

Optimizing block #4

0>>> COPY _5 = ret_3
1>>> STMT 1 = ret_3 le_expr _5
1>>> STMT 1 = ret_3 ge_expr _5
1>>> STMT 1 = ret_3 eq_expr _5
1>>> STMT 0 = ret_3 ne_expr _5
Optimizing statement _4 = ret_3 + _5;
  Replaced '_5' with variable 'ret_3'
LKUP STMT _4 = ret_3 plus_expr ret_3
2>>> STMT _4 = ret_3 plus_expr ret_3

with assembly

foo:
.LFB1:
        .cfi_startproc
#APP
# 4 "t.c" 1
        movq      (%rdi),%rax

# 0 "" 2
#NO_APP
        addq    %rax, %rax
        ret
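
The difference between the two dumps can be illustrated with a toy model. This is not GCC code: struct copy, apply_copies and operands_unified are invented for the sketch, and operands are modelled as plain strings. It only shows that recording the copy in both directions swaps the operands, while recording it in one direction makes them identical.

```c
#include <string.h>

/* Toy copy record: "replace the operand named `from` with `to`". */
struct copy { const char *from, *to; };

/* Return the replacement for `op` from the first matching record,
   or `op` itself if no record matches. */
static const char *apply_copies(const char *op, const struct copy *c, int n)
{
        for (int i = 0; i < n; i++)
                if (strcmp(op, c[i].from) == 0)
                        return c[i].to;
        return op;
}

/* Apply the records to both operands of "ret_3 + _5" and report whether
   they end up naming the same variable (1) or not (0). */
int operands_unified(const struct copy *c, int n)
{
        const char *lhs = apply_copies("ret_3", c, n);
        const char *rhs = apply_copies("_5", c, n);
        return strcmp(lhs, rhs) == 0;
}
```

With both records { "ret_3", "_5" } and { "_5", "ret_3" }, as in the first dump, the operands become _5 and ret_3 and are still distinct. With only { "_5", "ret_3" }, as in the GCC 5 dump, both operands become ret_3, which is what allows the second load to go away.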
Comment 3 Jeffrey A. Law 2016-10-05 19:47:47 UTC
Fundamentally the same issue as 71947.

*** This bug has been marked as a duplicate of bug 71947 ***