Bug 36115

Summary: [4.2 Regression] wrong code generated with optimization on x86-64
Product: gcc Reporter: Brett Polivka <brett.polivka>
Component: tree-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: fang, gcc-bugs
Priority: P3 Keywords: alias, wrong-code
Version: 4.2.3   
Target Milestone: 4.3.0   
Host: x86_64-linux-gnu Target: x86_64-linux-gnu
Build: x86_64-linux-gnu Known to work: 4.1.3 4.3.0
Known to fail: 4.2.3 4.2.5 Last reconfirmed: 2008-05-03 09:32:16
Attachments: preprocessed output of test program
preprocessed output of test program

Description Brett Polivka 2008-05-02 22:29:53 UTC
This small program:

// built using g++ -o test -O2 main.cpp

#include <algorithm>   // std::max
#include <cstdlib>     // atoi
#include <exception>   // std::exception
#include <iostream>

struct stuff
{
    int x;
};

class MyException : public std::exception
{
  public:

    MyException() { }
};

// make this global so conditional below doesn't get eliminated
bool should_throw = false;

void calc_x(stuff& s, int n)
{
    // set s.x to max(s.x, n, 2)
    s.x = std::max(n, s.x);
    s.x = std::max(2, s.x);

    // bogus throw needed to generate error
    if(should_throw)
    {
        // throw MyException() won't trigger bug - must be separate lines
        // also, something like std::runtime_error won't trigger either
        MyException ex;
        throw ex;
    }
}

int main(int argc, char* argv[])
{
    stuff s = { 0 };

    int n = atoi(argv[1]);

    calc_x(s, n);

    std::cout << s.x << "\n";
    std::cout << (s.x == n ? "SUCCESS" : "FAILURE") << "\n";
}

will fail when passed any value greater than 2.

calc_x should set s.x to the maximum of s.x, n and 2, but for values of n > 2 it always leaves s.x at its original value.

Output:
-------------
% ./test 0
2
% ./test 1
2
% ./test 2
2
% ./test 3
0
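
For comparison, here is a minimal sketch of the values a correct build should print. The helper name expected_x is hypothetical and not part of the test program; it only restates the "max(s.x, n, 2)" intent from the comment in calc_x.

```cpp
#include <algorithm>

// Hypothetical helper (not in the test program): the value s.x should hold
// after calc_x(s, n) when s.x starts out as `initial`.
int expected_x(int initial, int n)
{
    // Same two-step computation as calc_x, done on plain values.
    return std::max(2, std::max(n, initial));
}
```

With initial = 0, as in main, this gives 2 for n = 0, 1 and 2, and 3 for n = 3, so only the last line of the output above is wrong.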


I've attempted to distill it to a smaller example than this, but eliminating almost anything causes it to start functioning again.

Looking at the generated assembly, gcc emits two conditional moves, one for each std::max call. In the bad code, the second conditional move loads the address of s.x into a register, which is then dereferenced and stored back into s.x. However, the intermediate result of the first comparison was never stored to s.x, only to a scratch temporary on the stack. As a result, s.x is read and assigned to itself, so it keeps its original value.

        movl    %esi, 12(%rsp)   <--- tmp1 = n
        cmpl    (%rdi), %esi     <--- compare s.x and n
        leaq    12(%rsp), %rax   <--- rax = &tmp1
        cmovl   %rdi, %rax       <--- rax = &s.x if n < s.x
        movl    (%rax), %edx     <--- edx = *rax
        leaq    28(%rsp), %rax   <--- rax = &tmp2
        movl    $2, 28(%rsp)     <--- tmp2 = 2
        cmpl    $2, %edx
        cmovg   %rdi, %rax       <--- rax = &s.x (!!!) if edx > 2
        cmpb    $0, should_throw(%rip)
        movl    (%rax), %eax     <--- eax = *rax
        movl    %eax, (%rdi)     <--- s.x = eax
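
As a rough illustration, the annotated sequence above can be modelled in C++. This is only a sketch of what the listing appears to do, not compiler output: the function name calc_x_as_compiled and the locals p, q and first are made up for this example, while tmp1 and tmp2 follow the annotations.

```cpp
struct stuff
{
    int x;
};

// Hypothetical C++ model of the miscompiled instruction sequence.
void calc_x_as_compiled(stuff& s, int n)
{
    int tmp1 = n;                               // movl %esi, 12(%rsp)
    const int* p = (n < s.x) ? &s.x : &tmp1;    // leaq 12(%rsp) / cmovl
    int first = *p;                             // movl (%rax), %edx : max(n, s.x)

    // The source requires "s.x = first;" at this point.
    // That store is absent from the generated code.

    int tmp2 = 2;                               // movl $2, 28(%rsp)
    const int* q = (first > 2) ? &s.x : &tmp2;  // leaq 28(%rsp) / cmovg
    s.x = *q;                                   // movl (%rax), %eax ; movl %eax, (%rdi)
}
```

Starting from s.x == 0, calling this model with n = 3 selects &s.x in the second step and copies s.x onto itself, leaving 0. With n = 1 it selects &tmp2 and stores 2. Both match the output shown earlier.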

This is using gcc 4.2.3 as distributed with Ubuntu 8.04; however, I've also verified the same results using an unpatched gcc 4.2.3, as well as the latest gcc-4_2-branch from subversion.

Thanks,
Brett Polivka

% g++ -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
Comment 1 Brett Polivka 2008-05-02 22:31:07 UTC
Created attachment 15563
preprocessed output of test program
Comment 2 Brett Polivka 2008-05-02 22:37:33 UTC
Created attachment 15564
preprocessed output of test program

The previous attachment was generated from the wrong code.
Comment 3 Richard Biener 2008-05-03 09:32:13 UTC
Confirmed.  On the tree level we get aliasing wrong again:

<bb 2>:
  #   VUSE <n_50>;
  D.30804_7 = n;
  #   VUSE <SMT.294_51>;
  D.30805_8 = s_3->x;
  if (D.30804_7 < D.30805_8) goto <L3>; else goto <L5>;

<L3>:;
  __b_9 = &s_3->x;

  # __b_2 = PHI <__b_9(3), &n(2)>;
<L5>:;
  #   VUSE <n_50>;
  #   VUSE <D.30570_14>;
  D.30813_12 = *__b_2;
  #   D.30570_15 = V_MUST_DEF <D.30570_14>;
  D.30570 = 2;
  if (D.30813_12 > 2) goto <L6>; else goto <L8>;

<L6>:;
  __b_20 = &s_3->x;

  # __b_1 = PHI <__b_20(5), &D.30570(4)>;
<L8>:;
  #   VUSE <n_50>;
  #   VUSE <D.30570_15>;
  D.30636_23 = *__b_1;

DSE (dead store elimination) deleted the intermediate store to s_3->x:

   #   VUSE <n_50>;
   #   VUSE <D.30570_14>;
   D.30813_12 = *__b_2;
-  #   SMT.294_52 = V_MAY_DEF <SMT.294_51>;
-  s_3->x = D.30813_12;
   #   D.30570_15 = V_MUST_DEF <D.30570_14>;
   D.30570 = 2;

Thus, compiling with -fno-tree-dse is a workaround.
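
Where changing compiler flags is not an option, a source-level rewrite might also sidestep the pattern. The sketch below is an assumption, not something verified against GCC 4.2.x: calc_x_by_value is a hypothetical name, and the idea is simply to compute both maxima on plain int values so that no pointer to s.x or to a temporary is formed between the two steps. The conditional throw from the test program is left out for brevity.

```cpp
struct stuff
{
    int x;
};

// Hypothetical by-value variant of calc_x (untested on affected compilers).
void calc_x_by_value(stuff& s, int n)
{
    int m = (n > s.x) ? n : s.x;  // max(n, s.x), held in a local value
    s.x = (m > 2) ? m : 2;        // max(2, m), single store to s.x
}
```

Because s.x is written only once, there is no intermediate store for DSE to consider. Whether this avoids the aliasing problem in practice would still need to be checked on an affected compiler.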
Comment 4 Joseph S. Myers 2008-05-19 20:25:21 UTC
4.2.4 is being released, changing milestones to 4.2.5.
Comment 5 Joseph S. Myers 2009-03-31 15:37:54 UTC
Closing 4.2 branch, fixed in 4.3.