Bug 68823 - [7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
Summary: [7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 6.0
: P2 normal
Target Milestone: 8.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks: graphite
  Show dependency treegraph
 
Reported: 2015-12-09 19:22 UTC by Markus Trippelsdorf
Modified: 2019-11-14 09:12 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 8.0
Known to fail: 7.2.0, 7.5.0
Last reconfirmed: 2017-02-02 00:00:00


Attachments
Isolated graphite dump for miscompiled function (5.14 KB, text/plain)
2017-02-03 11:19 UTC, Martin Liška
Details
patch (829 bytes, patch)
2017-09-14 14:30 UTC, Richard Biener
Details | Diff
attachment-83685-0.dat (141 bytes, message/delivery-status)
2019-11-14 09:12 UTC, postmaster
Details
attachment-83685-1.eml (1.25 KB, message/rfc822)
2019-11-14 09:12 UTC, postmaster
Details
attachment-85110-0.dat (141 bytes, message/delivery-status)
2019-11-14 09:12 UTC, postmaster
Details
attachment-85110-1.eml (1.25 KB, message/rfc822)
2019-11-14 09:12 UTC, postmaster
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Trippelsdorf 2015-12-09 19:22:45 UTC
(with isl-0.15)

  % g++ -floop-nest-optimize -Ofast -w tramp3d-v4.cpp
  % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
Using
  using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
  solving eeq
  time increments from [0, 1.79769e+308], cfl 0.5
  starting at t = 0, i = 1
  cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  face  physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  periodic boundaries in X Y Z
[1]    3243 segmentation fault  ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
Comment 1 Martin Liška 2017-02-02 14:28:02 UTC
Confirmed, started with r227567. I've got ISL 0.16.1.
Comment 2 Richard Biener 2017-02-02 14:29:37 UTC
wrong-code shouldn't be P4 even if it is graphite.
Comment 3 Martin Liška 2017-02-02 16:25:10 UTC
Ok, fails with:

g++ -floop-nest-optimize -Ofast -w tramp3d-v4.cpp -fdump-tree-graphite-details -$ fdbg-cnt=graphite_scop:4

Problem are following side-by-side loops:

[scheduler] original ast:
{
  for (int c0 = 0; c0 <= 2; c0 += 1)
    S_3(c0);
  for (int c0 = 0; c0 <= 2; c0 += 1)
    S_6(c0);
}

[scheduler] AST generated by isl:
{
  for (int c0 = 0; c0 <= 2; c0 += 1)
    S_6(c0);
  for (int c0 = 0; c0 <= 2; c0 += 1)
    S_3(c0);
}

which are interchanged. But due to fact that there's a data dependency:

...
  <bb 3> [75.00%]:
  # i_38 = PHI <i_11(4), 0(2)>
  _34 = (sizetype) i_38;
  _12 = _34 * 8;
  _7 = &retval + _12;
  _8 = s1_4(D) + _12;
  _9 = MEM[(int *)_8];
  MEM[(Element_t[2] &)_7][0] = _9; <----- store to retval
  _10 = MEM[(int *)_8 + 4B];
  MEM[(Element_t[2] &)_7][1] = _10;
  i_11 = i_38 + 1;
  if (i_11 == 3)
    goto <bb 5>; [33.33%]
  else
    goto <bb 4>; [66.67%]

  <bb 4> [50.00%]:
  goto <bb 3>; [100.00%]

  <bb 5> [25.00%]:

  <bb 10> [25.00%]:

  <bb 6> [75.00%]:
  # i_39 = PHI <i_17(7), 0(10)>
  _37 = (sizetype) i_39;
  _35 = _37 * 8;
  _13 = _3(D) + _35;
  _14 = &retval + _35;
  _15 = MEM[(int *)_14];
  MEM[(Element_t[2] &)_13][0] = _15; <------ load from retval, stored to _3+x
  _16 = MEM[(int *)_14 + 4B];
  MEM[(Element_t[2] &)_13][1] = _16;
  i_17 = i_39 + 1;
  if (i_17 == 3)
    goto <bb 8>; [33.33%]
  else
    goto <bb 7>; [66.67%]

  <bb 7> [50.00%]:
  goto <bb 6>; [100.00%]
...

return _3(D);

I can work on that, but any hint where such dependency should be caught?
Comment 4 Sebastian Pop 2017-02-03 06:03:10 UTC
The data dependence relations are dumped in the output of -fdump-tree-graphite-all.
graphite-dependences.c contains the code for the data dependence computations.
Looking at the gimple code it seems like a trivial write after write dependence.

Do we have a reduced testcase for this problem?
Comment 5 Martin Liška 2017-02-03 11:19:01 UTC
Created attachment 40662 [details]
Isolated graphite dump for miscompiled function

As shown in the dump file, there are dependencies for the problematic stmts:

Adding must write to depedence graph: pdr_121 (write 
in gimple stmt: MEM[(Element_t[2] &)_7][0] = _9;
data accesses: { S_3[i2] -> [2, o1, 0] : 8*floor((o1)/8) = o1 and 18446744073709551616*floor((8i2 - o1)/18446744073709551616) = 8i2 - o1 and 0 <= o1 <= 18446744073709551608 }

Adding read to depedence graph: pdr_124 (read 
in gimple stmt: _15 = MEM[(int *)_14];
data accesses: { S_6[i1] -> [2, o1] : 18446744073709551616*floor((-8i1 + o1)/18446744073709551616) = -8i1 + o1 and 0 <= o1 <= 18446744073709551608 }

If I understand the notation correctly, both have equal alias set (2). Do you see Sebastian why the dependence is not caught?

Thanks
Comment 6 Martin Liška 2017-02-03 11:32:05 UTC
And I'm also trying to reduce a self-contained test-case.
Comment 7 Sebastian Pop 2017-02-03 14:20:38 UTC
(In reply to Martin Liška from comment #5)
> Created attachment 40662 [details]
> Isolated graphite dump for miscompiled function
> 
> As shown in the dump file, there are dependencies for the problematic stmts:
> 
> Adding must write to depedence graph: pdr_121 (write 
> in gimple stmt: MEM[(Element_t[2] &)_7][0] = _9;
> data accesses: { S_3[i2] -> [2, o1, 0] : 8*floor((o1)/8) = o1 and
> 18446744073709551616*floor((8i2 - o1)/18446744073709551616) = 8i2 - o1 and 0
> <= o1 <= 18446744073709551608 }
> 
> Adding read to depedence graph: pdr_124 (read 
> in gimple stmt: _15 = MEM[(int *)_14];
> data accesses: { S_6[i1] -> [2, o1] : 18446744073709551616*floor((-8i1 +
> o1)/18446744073709551616) = -8i1 + o1 and 0 <= o1 <= 18446744073709551608 }
> 
> If I understand the notation correctly, both have equal alias set (2). Do
> you see Sebastian why the dependence is not caught?
> 

S_3[i2] -> [2, o1, 0]
S_6[i1] -> [2, o1]

we do not detect the dependence because the two arrays do not have the same number of subscripts: also on the gimple representation we have

MEM[(Element_t[2] &)_7][0] = _9;
vs.
_15 = MEM[(int *)_14];
Comment 8 Sebastian Pop 2017-02-03 14:32:07 UTC
The code in fault is called from pdr_add_memory_accesses()
Maybe the problem is in parsing the gimple MEM[] into a data reference.
Comment 9 Sebastian Pop 2017-02-03 14:38:20 UTC
/* Determines the base object and the list of indices of memory reference
   DR, analyzed in LOOP and instantiated in loop nest NEST.  */

static void
dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)

This function initializes the subscripts with their access functions:
  DR_ACCESS_FNS (dr) = access_fns;

The number of subscripts (or "dimensions") is then the length of that array:
#define DR_NUM_DIMENSIONS(DR)      DR_ACCESS_FNS (DR).length ()
Comment 10 Richard Biener 2017-02-08 14:57:33 UTC
(In reply to Sebastian Pop from comment #7)
> (In reply to Martin Liška from comment #5)
> > Created attachment 40662 [details]
> > Isolated graphite dump for miscompiled function
> > 
> > As shown in the dump file, there are dependencies for the problematic stmts:
> > 
> > Adding must write to depedence graph: pdr_121 (write 
> > in gimple stmt: MEM[(Element_t[2] &)_7][0] = _9;
> > data accesses: { S_3[i2] -> [2, o1, 0] : 8*floor((o1)/8) = o1 and
> > 18446744073709551616*floor((8i2 - o1)/18446744073709551616) = 8i2 - o1 and 0
> > <= o1 <= 18446744073709551608 }
> > 
> > Adding read to depedence graph: pdr_124 (read 
> > in gimple stmt: _15 = MEM[(int *)_14];
> > data accesses: { S_6[i1] -> [2, o1] : 18446744073709551616*floor((-8i1 +
> > o1)/18446744073709551616) = -8i1 + o1 and 0 <= o1 <= 18446744073709551608 }
> > 
> > If I understand the notation correctly, both have equal alias set (2). Do
> > you see Sebastian why the dependence is not caught?
> > 
> 
> S_3[i2] -> [2, o1, 0]
> S_6[i1] -> [2, o1]
> 
> we do not detect the dependence because the two arrays do not have the same
> number of subscripts: also on the gimple representation we have
> 
> MEM[(Element_t[2] &)_7][0] = _9;
> vs.
> _15 = MEM[(int *)_14];

But then with different number of subscripts (and also likely different
DR_BASE_OBJECT) you can't do anything with them and have to assume
dependence.  See initialize_data_dependence_relation:

  /* If the references do not access the same object, we do not know
     whether they alias or not.  We do not care about TBAA or alignment
     info so we can use OEP_ADDRESS_OF to avoid false negatives.
     But the accesses have to use compatible types as otherwise the
     built indices would not match.  */
  if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF)
      || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
                              TREE_TYPE (DR_BASE_OBJECT (b))))
    {
      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
      return res;

not sure how you communicate that to ISL of course...  is it what you
use "alias-sets" for?  To create extra dependence egdes?
Comment 11 Sebastian Pop 2017-02-08 17:35:40 UTC
(In reply to Richard Biener from comment #10)
> But then with different number of subscripts (and also likely different
> DR_BASE_OBJECT) you can't do anything with them and have to assume
> dependence.  See initialize_data_dependence_relation:
> 
>   /* If the references do not access the same object, we do not know
>      whether they alias or not.  We do not care about TBAA or alignment
>      info so we can use OEP_ADDRESS_OF to avoid false negatives.
>      But the accesses have to use compatible types as otherwise the
>      built indices would not match.  */
>   if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b),
> OEP_ADDRESS_OF)
>       || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
>                               TREE_TYPE (DR_BASE_OBJECT (b))))
>     {
>       DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>       return res;
> 
> not sure how you communicate that to ISL of course...  is it what you
> use "alias-sets" for?  To create extra dependence egdes?

alias-sets differ for two arrays with bases that have been proven to be different.
If they may point to the same thing, they will have the same number.
Comment 12 Richard Biener 2017-07-04 08:46:43 UTC
GCC 6.4 is being released, adjusting target milestone.
Comment 13 Richard Biener 2017-09-14 14:16:43 UTC
We apply blocking on

int a[256][256];

void foo (void)
{
  int *p = &a[4][7];
  for (int i = 0; i < 256; ++i)
    for (int j = 0; j < 256; ++j)
      a[j][i] = a[j][i] * (*(int(*)[1])p)[0];
}

but not when we use a plain indirection, then we get

[scop-detection-fail] Graphite cannot handle data-refs in stmt:
# VUSE <.MEM_20>
_2 = MEM[(int *)&a];

for some reason (stmt_has_simple_data_refs_p fails).  Zero subscripts I guess.
That's a weird thing to look at, even __real MEM[(int *)&a] would make it
one subscript (see dr_analyze_indices).

Means graphite doesn't handle non-indexed memory reads either.

So the above might be a different bug (we're supposed to reject the data-ref?).

We're also blocking

int a[256][256];

void foo (void)
{
  int *p = &a[0][0];
  for (int i = 0; i < 256; ++i)
    for (int j = 0; j < 256; ++j)
      a[j][i] = a[j][i] * p[i];
}

can't wrap my head around whether that's valid or not.

In the end it looks like graphite doesn't handle mixed pointer / non-pointer
accesses because it relies on DR_ACCESS_FNs "matching" for all possible
may-aliases.  I suppose

  /* The access polyhedron contains the polyhedral space this data
     reference will access.

     The polyhedron contains these dimensions:

     - The alias set (a):
     Every memory access is classified in at least one alias set.

     - The subscripts (s_0, ..., s_n):
     The memory is accessed using zero or more subscript dimensions.

     - The iteration domain (variables and parameters)

means that when things are in the same alias set they may still differ
in the number of subscripts?  But in that case they always appear as
not aliasing?

That is, should we, in

static void
build_alias_set (scop_p scop)
{
  int num_vertices = scop->drs.length ();
  struct graph *g = new_graph (num_vertices);
  dr_info *dr1, *dr2;
  int i, j;
  int *all_vertices;

  FOR_EACH_VEC_ELT (scop->drs, i, dr1)
    for (j = i+1; scop->drs.iterate (j, &dr2); j++)
      if (dr_may_alias_p (dr1->dr, dr2->dr, true))
        {
          add_edge (g, i, j);
          add_edge (g, j, i);

when DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr) better "FAIL"?
Comment 14 Richard Biener 2017-09-14 14:30:15 UTC
Created attachment 42170 [details]
patch

Like this.
Comment 15 Sebastian Pop 2017-09-14 19:00:15 UTC
> when DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr) better "FAIL"?

Yes.
The patch looks good to me.
Comment 16 Richard Biener 2017-09-15 07:03:28 UTC
Fixed on trunk sofar.
Comment 17 Richard Biener 2017-09-15 07:03:34 UTC
Author: rguenth
Date: Fri Sep 15 07:03:02 2017
New Revision: 252780

URL: https://gcc.gnu.org/viewcvs?rev=252780&root=gcc&view=rev
Log:
2017-09-15  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/68823
	* graphite-scop-detection.c (build_alias_set): If we have a
	possible dependence check whether we can handle them by just
	looking at the DRs DR_ACCESS_FNs.
	(build_scops): If build_alias_set fails, fail the SCOP.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/graphite-scop-detection.c
Comment 18 Martin Liška 2017-09-20 08:02:17 UTC
Good job, tramp3d works now ;)
Comment 19 Jakub Jelinek 2018-10-26 10:09:15 UTC
GCC 6 branch is being closed
Comment 20 Richard Biener 2019-11-14 09:11:50 UTC
Fixed in GCC 8.
Comment 21 postmaster 2019-11-14 09:12:29 UTC Comment hidden (spam)
Comment 22 postmaster 2019-11-14 09:12:30 UTC Comment hidden (spam)
Comment 23 postmaster 2019-11-14 09:12:45 UTC Comment hidden (spam)
Comment 24 postmaster 2019-11-14 09:12:46 UTC Comment hidden (spam)