Bug 49651 - [4.4 Regression] nested lambdas and -O3 produced incorrect integer variable increments
Summary: [4.4 Regression] nested lambdas and -O3 produced incorrect integer variable i...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.1
: P2 normal
Target Milestone: 4.5.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2011-07-05 20:14 UTC by Pedro Larroy
Modified: 2012-03-13 13:30 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.3.6, 4.5.4, 4.6.2, 4.7.0
Known to fail: 4.4.6, 4.5.3, 4.6.1
Last reconfirmed: 2011-07-05 23:53:37


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Pedro Larroy 2011-07-05 20:14:58 UTC
Hi. I'm trying to reproduce this outside 'production' code but haven't been successful yet. But the observation is the following:

I have 3 level nested lambdas iterating some vectors. Inside, printing and incrementing this 'sentinel' variable, the output with -O3 shows the wrong output:

0. sentinel: deadbee5
2. sentinel: deadbee6
1. sentinel: deadbee7
1. sentinel: deadbee8
3. sentinel: deadbee7
4. sentinel: deadbee8
1. sentinel: deadbee9
5. sentinel: deadbee9
6. sentinel: deadbeea
2. sentinel: deadbeeb

With -O0 the values are consecutive. The code is similar to:

u32 sentinel=0xdeadbee0;
auto run = [&](Class& c) {
   for_each(v3.begin(), v3.end(), [&](ClassC& cc) {
       print(sentinel++);
   });
   print(sentinel++);
};
for_each(v.begin(), v.end(), [&](ClassB& b) {

   for_each(v2.begin(), v2.end(), run);
   print(sentinel++);
});


I have observed that the sentinel variable for example, is captured inside the lambda and the address is not the same as the outermost sentinel...   could that be the problem that leads to the wrong optimizations?
Comment 1 Pedro Larroy 2011-07-05 20:30:13 UTC
I think I'm able to reproduce with the following:

the output spotted is:


run.for_each.a: 20633
run.a: 20634
1 for_each.a: 20631
1 for_each.a: 20632

Compiled with:
g++ -Wall -std=c++0x -O3 -o test test.cc

Code follows:

#include <iostream>
#include <string>
#include <cstdlib>
#include <cassert>
#include <vector>
#include <stdexcept>
#include <stdint.h>
#include <algorithm>
#include <stdint.h>

typedef uint32_t u32;
using namespace std;
vector<u32> g;
void f(u32 a, u32 b) 
{   
    g.push_back(b);
}

int main(int argc, char *argv[])
{
    u32 a = 0;
    vector<int> vi;
    vi.push_back(0);
    vi.push_back(1);
    vi.push_back(2);
    vi.push_back(3);
    vi.push_back(4);


    vector<int> vo;
    vo.push_back(5);
    vo.push_back(5);

    vector<int> ve;
    ve.push_back(5);


    vector<int> v3;
    vector<int> v4;
    vector<int> v5;
    vector<int> v6;
    vector<int> v7;

    f(32, a++);
    cout << "bmain.a: " << a << endl;

    f(32, a++);
    cout << "bmain.a: " << a << endl;


    auto run = [&](int& i) {
        // inside this lambda &a is not the same as the &a in the first line
        v3.push_back(i);
        v4.push_back(i);
        v5.push_back(i);
        v6.push_back(i);
        v7.push_back(i);
        f(32, a++);
        cout << "run.a: " << a << endl;

        v3.push_back(i);
        v4.push_back(i);
        v5.push_back(i);
        v6.push_back(i);
        v7.push_back(i);
        f(32, a++);
        cout << "run.a: " << a << endl;



        for_each(ve.begin(), ve.end(), [&](int& xi) {
            v7.push_back(xi);
            f(32, a++);
            cout << "run.for_each.a: " << a << endl;

        });

        f(32, a++);
        cout << "run.a: " << a << endl;
    };

    f(32, a++);
    cout << "main.a: " << a << endl;

    while(true) {
    for_each(vi.begin(), vi.end(), [&](int& xi) {
        f(32, a++);
        cout << "1 for_each.a: " << a << endl;


        for_each(vo.begin(), vo.end(), run);

        f(32, a++);
        cout << "1 for_each.a: " << a << endl;


        for_each(ve.begin(), ve.end(), run);

        f(32, a++);
        cout << "1 for_each.a: " << a << endl;


    });
    }
    cout << "main.a: " << a << endl;
}
Comment 2 Jonathan Wakely 2011-07-05 21:54:39 UTC
Can you produce a testcase that aborts/fails if the problem occurs? Otherwise I seem to need to inspect thousands of lines to look for non-consecutive values!

Could this be the same underlying issue as PR 49598 and so (very recently) fixed?
Comment 3 Pedro Larroy 2011-07-05 22:24:26 UTC
(In reply to comment #2)
> Can you produce a testcase that aborts/fails if the problem occurs? Otherwise I
> seem to need to inspect thousands of lines to look for non-consecutive values!
> 
> Could this be the same underlying issue as PR 49598 and so (very recently)
> fixed?

It doesn't look the same to me since the variable doesn't have garbage, but I don't have enough insight to judge. Can you point to which particular source file to check to investigate this?

Here is the program that asserts on error. Works with -O0 and fails with -O3 quite soon (12 iterations):


g++-4.6 -Wall -std=c++0x -O3 -o test test.cc




#include <iostream>
#include <string>
#include <cstdlib>
#include <cassert>
#include <vector>
#include <stdexcept>
#include <stdint.h>
#include <algorithm>
#include <stdint.h>

typedef uint32_t u32;
using namespace std;
vector<u32> g;
void f(u32 a, u32 b) 
{   
    g.push_back(b);
    for(size_t i=1; i<g.size() && ! g.empty(); ++i) {
        if ( g[i-1]+1 != g[i]) {
            cerr << g[i-1]+1 << " != " << g[i] << endl;
            cerr << "g: " << endl;
            for(size_t i=0; i<g.size() && ! g.empty(); ++i)
                cerr << g[i] << endl;
            assert(0);
        }
    }
}

int main(int argc, char *argv[])
{
    u32 a = 0;
    vector<int> vi;
    vi.push_back(0);
    vi.push_back(1);
    vi.push_back(2);
    vi.push_back(3);
    vi.push_back(4);


    vector<int> vo;
    vo.push_back(5);
    vo.push_back(5);

    vector<int> ve;
    ve.push_back(5);


    vector<int> v3;
    vector<int> v4;
    vector<int> v5;
    vector<int> v6;
    vector<int> v7;

    f(32, a++);
    cout << "bmain.a: " << a << endl;

    f(32, a++);
    cout << "bmain.a: " << a << endl;


    auto run = [&](int& i) {
        // inside this lambda &a is not the same as the &a in the first line
        v3.push_back(i);
        v4.push_back(i);
        v5.push_back(i);
        v6.push_back(i);
        v7.push_back(i);
        f(32, a++);
        cout << "run.a: " << a << endl;

        v3.push_back(i);
        v4.push_back(i);
        v5.push_back(i);
        v6.push_back(i);
        v7.push_back(i);
        f(32, a++);
        cout << "run.a: " << a << endl;



        for_each(ve.begin(), ve.end(), [&](int& xi) {
            v7.push_back(xi);
            f(32, a++);
            cout << "run.for_each.a: " << a << endl;

        });

        f(32, a++);
        cout << "run.a: " << a << endl;
    };

    f(32, a++);
    cout << "main.a: " << a << endl;

    while(true) {
    for_each(vi.begin(), vi.end(), [&](int& xi) {
        f(32, a++);
        cout << "1 for_each.a: " << a << endl;


        for_each(vo.begin(), vo.end(), run);

        f(32, a++);
        cout << "1 for_each.a: " << a << endl;


        for_each(ve.begin(), ve.end(), run);

        f(32, a++);
        cout << "1 for_each.a: " << a << endl;


    });
    }
    cout << "main.a: " << a << endl;
}
Comment 4 Jonathan Wakely 2011-07-05 23:53:37 UTC
thanks, that doesn't assert for me using the latest 4.7.0 but does for 4.6.2, so it's not PR 49598
Comment 5 Jason Merrill 2011-07-12 14:38:53 UTC
With current pre-4.6.2 I find that it fails with

-O2 -fno-thread-jumps -fno-align-functions -fno-align-jumps -fno-align-loops  -fno-align-labels          -fno-caller-saves          -fno-crossjumping           -fno-cse-follow-jumps  -fno-cse-skip-blocks          -fno-delete-null-pointer-checks          -fno-expensive-optimizations          -fno-gcse  -fno-gcse-lm          -fno-inline-small-functions          -fno-indirect-inlining          -fno-ipa-sra          -fno-optimize-sibling-calls          -fno-peephole2          -fno-regmove          -fno-reorder-blocks  -fno-reorder-functions          -fno-rerun-cse-after-loop          -fno-sched-interblock  -fno-sched-spec          -fno-schedule-insns  -fno-schedule-insns2          -fno-strict-aliasing -fno-strict-overflow          -fno-tree-switch-conversion          -fno-tree-pre          -fno-tree-vrp  -fno-devirtualize -fno-ipa-cp -fno-tree-builtin-call-dce  -fno-auto-inc-dec -fno-cprop-registers -fno-dce -fno-defer-pop -fno-delayed-branch -fno-dse -fno-guess-branch-probability -fno-if-conversion -fno-if-conversion2 -fno-ipa-pure-const -fno-ipa-reference -fno-merge-constants -fno-split-wide-types -fno-tree-builtin-call-dce -fno-tree-ccp -fno-tree-ch -fno-tree-copyrename -fno-tree-dce -fno-tree-forwprop -fno-tree-phiprop -fno-tree-sra -fno-tree-ter

But adding -O1, -fno-tree-pta or -fno-tree-fre makes it work.

There only seem to be a few places that depend on optimize being 1 vs 2.

Changing category to tree-optimization.
Comment 6 Richard Biener 2011-07-12 15:19:48 UTC
Shorter testcase that also terminates when successful:

#include <vector>
#include <algorithm>

extern "C" void abort (void);

typedef unsigned int u32;
using namespace std;
vector<u32> g;
void f(u32 a, u32 b)
{
  g.push_back(b);
  for(int i=1; i<g.size() && ! g.empty(); ++i)
    if ( g[i-1]+1 != g[i])
      abort ();
}

int main(int argc, char *argv[])
{
  int cnt = 256;
  u32 a = 0;
  vector<int> vi;
  vi.push_back(0);
  vi.push_back(1);
  vi.push_back(2);
  vi.push_back(3);
  vi.push_back(4);
  vector<int> vo;
  vo.push_back(5);
  vo.push_back(5);
  vector<int> ve;
  ve.push_back(5);
  vector<int> v3;
  vector<int> v4;
  vector<int> v5;
  vector<int> v6;
  vector<int> v7;

  f(32, a++);
  f(32, a++);

  auto run = [&](int& i) {
      // inside this lambda &a is not the same as the &a in the first line
      v3.push_back(i);
      v4.push_back(i);
      v5.push_back(i);
      v6.push_back(i);
      v7.push_back(i);
      f(32, a++);

      v3.push_back(i);
      v4.push_back(i);
      v5.push_back(i);
      v6.push_back(i);
      v7.push_back(i);
      f(32, a++);

      for_each(ve.begin(), ve.end(), [&](int& xi) {
               v7.push_back(xi);
               f(32, a++);
               });

      f(32, a++);
  };

  f(32, a++);

  while(--cnt) {
      for_each(vi.begin(), vi.end(), [&](int& xi) {
               f(32, a++);
               for_each(vo.begin(), vo.end(), run);
               f(32, a++);
               for_each(ve.begin(), ve.end(), run);
               f(32, a++);
               });
  }
}
Comment 7 Richard Biener 2011-07-13 12:31:13 UTC
Further reduced (-O2 vs. -O1):

#include <vector>
#include <algorithm>

extern "C" void abort (void);

typedef unsigned int u32;
using namespace std;
vector<u32> g;
void f(u32 b)
{
  g.push_back(b);
  for(int i=1; i<g.size() && ! g.empty(); ++i)
    if ( g[i-1]+1 != g[i])
      abort ();
}

int main(int argc, char *argv[])
{
  u32 a = 0;
  vector<int> vi;
  vi.push_back(0);
  vector<int> ve;
  ve.push_back(5);
  vector<int> v7;

  auto run = [&](int& i) {
      v7.push_back(i);
      f(a++);
  };

  for_each(vi.begin(), vi.end(), [&](int& xi) {
           f(a++);
           for_each(ve.begin(), ve.end(), run);
           f(a++);
           for_each(ve.begin(), ve.end(), run);
           });
}
Comment 8 Richard Biener 2011-07-14 12:26:08 UTC
In the optimized dump I can see

<bb 4>:
...
  run.__v7 = &v7;
  run.__a = &a;
...

<bb 6>:
  # ivtmp.207_17 = PHI <ivtmp.207_68(10), ivtmp.207_69(5)>
  D.47092_58 = a;
  D.47093_59 = D.47092_58 + 1;
  a = D.47093_59;
  f (D.47092_58);

<bb 7>:
  D.47084_60 = MEM[(int * const &)&ve + 8];
  D.47090._M_current = D.47084_60;
  D.47083_61 = MEM[(int * const &)&ve];
  D.47089._M_current = D.47083_61;
  D.47091 = std::for_each<__gnu_cxx::__normal_iterator<int*, std::vector<int> >, main(int, char**)::<lambda(int&)> > (D.47089, D.47090, run);

<bb 8>:
  D.47093_63 = D.47092_58 + 2;
  a = D.47093_63;
  f (D.47093_59);

<bb 9>:
  D.47082_64 = MEM[(int * const &)&ve + 8];
  D.47086._M_current = D.47082_64;
  D.47081_65 = MEM[(int * const &)&ve];
  D.47085._M_current = D.47081_65;
  D.47087 = std::for_each<__gnu_cxx::__normal_iterator<int*, std::vector<int> >, main(int, char**)::<lambda(int&)> > (D.47085, D.47086, run);

<bb 10>:
  ivtmp.207_68 = ivtmp.207_17 + 4;
  __first$_M_current_66 = (int *) ivtmp.207_68;
  if (__last_27 != __first$_M_current_66)
    goto <bb 6>;
  else
...

so we do CSE a over the std::for_each invocation which gets run as argument.
That's wrong of course.  Alias-info tells us even that a is clobbered by
that call:

  # USE = nonlocal null { aD.40068 viD.40071 D.40875 veD.40876 D.40877 v7D.40878 }
  # CLB = nonlocal null { aD.40068 viD.40071 D.40875 veD.40876 D.40877 v7D.40878 }
  D.47091 = std::for_each<__gnu_cxx::__normal_iterator<int*, std::vector<int> >, main(int, char**)::<lambda(int&)> > (D.47089, D.47090, runD.40880);

but it doesn't do so at the time when the first FRE is run:

  # USE = nonlocal null { viD.40071 D.40875 veD.40876 D.40877 v7D.40878 }
  # CLB = nonlocal null { viD.40071 D.40875 veD.40876 D.40877 v7D.40878 }
  D.47091 = std::for_each<__gnu_cxx::__normal_iterator<int*, std::vector<int> >, main(int, char**)::<lambda(int&)> > (D.47089, D.47090, *__f$__runD.47077_57);

the setup there is still not very optimized though.
Comment 9 Richard Biener 2011-07-14 12:36:23 UTC
C testcase that fails at -O2:

extern void abort (void);

struct X {
    int *p;
    int *q;
};

void __attribute__((noinline, noclone))
foo (struct X x) { *x.q = 0; }

volatile int what;
struct X y;

int main()
{
  int i, j;
  struct X x, *p;
  x.p = &i;
  x.q = &j;
  if (what)
    p = &y;
  else
    p = &x;
  j = 1;
  foo (*p);
  if (j != 0)
    abort ();
  return 0;
}

workaround: --param max-fields-for-field-sensitive=1
Comment 10 Richard Biener 2011-07-14 13:02:11 UTC
I have a patch.
Comment 11 Richard Biener 2011-07-14 14:53:33 UTC
Author: rguenth
Date: Thu Jul 14 14:53:30 2011
New Revision: 176274

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176274
Log:
2011-07-14  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/49651
	* tree-ssa-structalias.c (get_constraint_for_1): Properly
	handle dereferences with subvariables.

	* gcc.dg/torture/pr49651.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr49651.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-structalias.c
Comment 12 Richard Biener 2011-07-14 14:56:30 UTC
Author: rguenth
Date: Thu Jul 14 14:56:27 2011
New Revision: 176275

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176275
Log:
2011-07-14  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/49651
	* tree-ssa-structalias.c (get_constraint_for_1): Properly
	handle dereferences with subvariables.

	* gcc.dg/torture/pr49651.c: New testcase.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/torture/pr49651.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/tree-ssa-structalias.c
Comment 13 Richard Biener 2011-07-14 15:00:45 UTC
Fixed for 4.6.2 and trunk sofar.
Comment 14 Pedro Larroy 2011-08-01 15:59:23 UTC
(In reply to comment #13)
> Fixed for 4.6.2 and trunk sofar.

great!  thanks.
Comment 15 Richard Biener 2012-01-04 09:47:18 UTC
Author: rguenth
Date: Wed Jan  4 09:47:12 2012
New Revision: 182865

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182865
Log:
2012-01-04  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/49651
	* tree-ssa-structalias.c (type_can_have_subvars): New function.
	(var_can_have_subvars): Use it.
	(get_constraint_for_1): Only consider subfields if there
	can be any.

	* gcc.dg/tree-ssa/pta-ptrarith-1.c: Adjust.
	* gcc.dg/tree-ssa/pta-ptrarith-2.c: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pta-ptrarith-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pta-ptrarith-2.c
    trunk/gcc/tree-ssa-structalias.c
Comment 16 Richard Biener 2012-01-04 09:50:18 UTC
Author: rguenth
Date: Wed Jan  4 09:50:13 2012
New Revision: 182866

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182866
Log:
2012-01-04  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/49651
	* tree-ssa-structalias.c (type_can_have_subvars): New function.
	(var_can_have_subvars): Use it.
	(get_constraint_for_1): Only consider subfields if there
	can be any.

	* gcc.dg/tree-ssa/pta-ptrarith-1.c: Adjust.
	* gcc.dg/tree-ssa/pta-ptrarith-2.c: Likewise.

Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/tree-ssa/pta-ptrarith-1.c
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/tree-ssa/pta-ptrarith-2.c
    branches/gcc-4_6-branch/gcc/tree-ssa-structalias.c
Comment 17 Richard Biener 2012-01-04 11:54:25 UTC
Author: rguenth
Date: Wed Jan  4 11:54:18 2012
New Revision: 182870

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182870
Log:
2012-01-04  Richard Guenther  <rguenther@suse.de>

	Backport from mainline
	2012-01-04  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/49651
	* tree-ssa-structalias.c (type_can_have_subvars): New function.
	(var_can_have_subvars): Use it.
	(get_constraint_for_1): Only consider subfields if there
	can be any.

	2011-07-14  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/49651
	* tree-ssa-structalias.c (get_constraint_for_1): Properly
	handle dereferences with subvariables.

	* gcc.dg/torture/pr49651.c: New testcase.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/torture/pr49651.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/gcc/tree-ssa-structalias.c
Comment 18 Richard Biener 2012-01-04 11:56:05 UTC
Fixed for 4.5 as well.  I am not considering 4.4 at this moment.
Comment 19 Jakub Jelinek 2012-03-13 13:30:23 UTC
Fixed in 4.5+, 4.4 is no longer supported.