Bug 36008 - [4.3/4.4 Regression] Function produces wrong results when inlined.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.3.1
Importance: P3 normal
Target Milestone: 4.3.1
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2008-04-22 13:50 UTC by Xavier Andrade
Modified: 2008-04-24 16:32 UTC

See Also:
Host:
Target: x86_64-unknown-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-04-23 11:02:01


Attachments
the code (1.39 KB, text/plain)
2008-04-22 13:53 UTC, Xavier Andrade
Test case, 1st file (no includes) (525 bytes, text/plain)
2008-04-22 22:49 UTC, Xavier Andrade
Test case 2nd file (225 bytes, text/plain)
2008-04-22 22:49 UTC, Xavier Andrade
simplified bravais.c (530 bytes, text/plain)
2008-04-23 10:03 UTC, Richard Biener
gcc43-pr36008.patch (647 bytes, patch)
2008-04-24 11:29 UTC, Jakub Jelinek

Description Xavier Andrade 2008-04-22 13:50:49 UTC
When the attached source file is compiled with 'gcc -O3 -c', the code that uses it produces wrong results. The problem disappears if the file is compiled with 'gcc -O3 -fno-inline -c' or if the variables inside 'generate_point_symmetry' are declared as 'static'.

This is the output of gcc-4.3 -v :

Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.3/configure --prefix=/opt/gcc/ --program-suffix=-4.3 --enable-languages=c,fortran,c++ --with-arch=core2 --enable-libgomp
Thread model: posix
gcc version 4.3.1 20080419 (prerelease) (GCC) 

This is the system:

Linux corvo 2.6.24.2-corvo-001 #3 SMP PREEMPT x86_64 GNU/Linux
distribution is Debian GNU/Linux 4.0
Comment 1 Xavier Andrade 2008-04-22 13:53:05 UTC
Created attachment 15510
the code
Comment 2 Bernhard Reutner-Fischer 2008-04-22 14:13:44 UTC
Please provide a self-contained testcase (see http://gcc.gnu.org/bugs.html ) that ideally abort()s on a wrong result.
Comment 3 Xavier Andrade 2008-04-22 17:16:49 UTC
The code comes from spglib, a library to calculate symmetry groups of crystals, so it is quite complex. The problem is that I didn't write it and I don't understand it well enough to produce a small self-contained test case. I will try to do it, but this may take me some time.

Comment 4 Xavier Andrade 2008-04-22 22:49:02 UTC
Created attachment 15513
Test case, 1st file (no includes)
Comment 5 Xavier Andrade 2008-04-22 22:49:29 UTC
Created attachment 15514
Test case 2nd file
Comment 6 Xavier Andrade 2008-04-22 22:51:56 UTC
I have managed to create a test case:

Correct case:

xavier@corvo:~$ gcc-4.3 bravais.c mathfunc.c -O3 -fno-inline
bravais.c: In function ‘main’:
bravais.c:83: warning: incompatible implicit declaration of built-in function ‘printf’
xavier@corvo:~$ ./a.out 
0 

Wrong case:

xavier@corvo:~$ gcc-4.3 bravais.c mathfunc.c -O3
bravais.c: In function ‘main’:
bravais.c:83: warning: incompatible implicit declaration of built-in function ‘printf’
xavier@corvo:~$ ./a.out 
-1 

Sorry for using two files, but the problem disappears if all functions are in a single file.
Comment 7 Richard Biener 2008-04-23 10:03:27 UTC
Created attachment 15515
simplified bravais.c

gcc -c mathfunc.c
gcc -o t.ok bravais.c mathfunc.o -O
gcc -o t.fail bravais.c mathfunc.o -O -funroll-loops

./t.ok
./t.fail
Aborted
Comment 8 Richard Biener 2008-04-23 11:02:01 UTC
This goes wrong somewhere during RTL optimization.
Comment 9 Jakub Jelinek 2008-04-23 16:35:32 UTC
Even more simplified testcase, with just one CU.  Works at -O0/-O/-O2, fails at
-O{,2} -funroll-loops or -O3.
extern void abort (void);

void __attribute__ ((noinline))
bar (int m[3][3], int a[3][3], int b[3][3])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 3; j++)
      m[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j] + a[i][2] * b[2][j];
}

static inline void __attribute__ ((always_inline))
foo (int x[][3][3], int g[3][3], int y, int z)
{
  int i, j, k;
  for (i = 0; i < y; i++)
    for (j = 0; j < z - 1; j++)
      {
        k = i * (z - 1) + j + y;
        bar (x[k], g, x[k - y]);
      }
}

int g1[48][3][3] = { { {1, 0, 0}, {0, 1, 0}, {0, 0, 1} } };
int g2[3][3] = { {-1, 0, 0}, {0, -1, 0}, {0, 0, -1} };
int g3[3][3] = { {0, 1, 0}, {1, 0, 0}, {0, 0, 1} };
int g4[3][3] = { {-1, 0, 0}, {0, 1, 0}, {0, 0, -1} };
int g5[3][3] = { {-1, 0, 0}, {0, -1, 0}, {0, 0, 1} };

int
main ()
{
  foo (g1, g2, 1, 2);
  foo (g1, g4, 2, 2);
  foo (g1, g5, 4, 2);
  foo (g1, g3, 8, 2);
  if (g1[1][1][0] != 0)
    abort ();

  return 0;
}
Comment 10 Jakub Jelinek 2008-04-23 16:53:59 UTC
And one with just one inlined fn:
extern void abort (void);

void __attribute__ ((noinline))
bar (int m[3][3], int a[3][3], int b[3][3])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 3; j++)
      m[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j] + a[i][2] * b[2][j];
}

static inline void __attribute__ ((always_inline))
foo (int x[][3][3], int g[3][3], int y, int z)
{
  int i, j, k;
  for (i = 0; i < y; i++)
    for (j = 0; j < z - 1; j++)
      {
        k = i * (z - 1) + j + y;
        bar (x[k], g, x[k - y]);
      }
}

int g[48][3][3] = {
{ {1, 0, 0}, {0, 1, 0}, {0, 0, 1} }, { {-1, 0, 0}, {0, -1, 0}, {0, 0, -1} },
{ {-1, 0, 0}, {0, 1, 0}, {0, 0, -1} }, { {1, 0, 0},  {0, -1, 0}, {0, 0, 1} },
{ {-1, 0, 0}, {0, -1, 0}, {0, 0, 1} }, { {1, 0, 0}, {0, 1, 0}, {0, 0, -1} },
{ {1, 0, 0}, {0, -1, 0}, { 0, 0, -1} }, { {-1, 0, 0}, {0, 1, 0}, {0, 0, 1} } };
int h[3][3] = { {0, 1, 0}, {1, 0, 0}, {0, 0, 1} };

int
main ()
{
  foo (g, h, 8, 2);
  if (g[1][1][0] != 0)
    abort ();
  return 0;
}
Comment 11 Jakub Jelinek 2008-04-24 09:33:40 UTC
extern void abort (void);

int g[48][3][3];

void __attribute__ ((noinline))
bar (int x[3][3], int y[3][3])
{
  static int i;
  if (x != g[i + 8] || y != g[i++])
    abort ();
}

static inline void __attribute__ ((always_inline))
foo (int x[][3][3])
{
  int i;
  for (i = 0; i < 8; i++)
#ifdef GOOD
    bar (x[i + 8], x[i]);
#else
    {
      int k = i + 8;
      bar (x[k], x[k - 8]);
    }
#endif
}

int
main ()
{
  foo (g);
  return 0;
}

With -DGOOD this doesn't fail at any optimization level; without it, it fails again with -O2 -funroll-loops, -O3 etc.
Comment 12 Jakub Jelinek 2008-04-24 09:59:45 UTC
This is actually a tree optimization issue.  In optimized dump without -DGOOD we have:
  bar (&g[0][0] + 288, &g[0][0]);
  bar (&g[0][0] + 324, &g[1][0]);
  bar (&g[0][0] + 360, &g[2][0]);
  bar (&g[0][0] + 396, &g[3][0]);
  bar (&g[1][0], &g[4][0]);
  bar (&g[0][0] + 468, &g[5][0]);
  bar (&g[0][0] + 504, &g[6][0]);
  bar (&g[0][0] + 540, &g[7][0]);
Note the bogus first argument of the 5th bar call; it should have been
&g[0][0] + 432, aka &g[12][0].
In *.reassoc2 we have for the 4th and 5th bar calls:
  i_73 = i_56 + 1;
  k_80 = i_73 + 8;
  D.1588_81 = (long unsigned int) k_80;
  D.1589_82 = D.1588_81 * 36;
  D.1590_83 = D.1589_82 + -288;
  D.1591_84 = &g + D.1590_83;
  D.1592_85 = &(*D.1591_84)[0];
  D.1594_86 = &g[0][0] + D.1589_82;
  bar (D.1594_86, D.1592_85);
  i_90 = i_73 + 1;
  k_97 = i_90 + 8;
  D.1588_98 = (long unsigned int) k_97;
  D.1589_99 = D.1588_98 * 36;
  D.1590_100 = D.1589_99 + -288;
  D.1591_101 = &g + D.1590_100;
  D.1592_102 = &(*D.1591_101)[0];
  D.1594_103 = &g[0][0] + D.1589_99;
  bar (D.1594_103, D.1592_102);
which looks correct, but in *.vrp2:
  i_73 = 3;
  k_80 = 11;
  D.1588_81 = 11;
  D.1589_82 = 396;
  D.1590_83 = 108;
  D.1591_84 = &g[3];
  D.1592_85 = &(*D.1591_84)[0];
  D.1594_86 = &g[0][0] + 396;
  bar (D.1594_86, D.1592_85);
  i_90 = 4;
  k_97 = 12;
  D.1588_98 = 12;
  D.1589_99 = 432;
  D.1590_100 = 144;
  D.1591_101 = &g[4];
  D.1592_102 = &(*D.1591_101)[0];
  D.1594_103 = &g[1][0];
  bar (&g[1][0], D.1592_102);
which is wrong.  So to me this looks like a VRP bug.
Comment 13 Jakub Jelinek 2008-04-24 11:29:07 UTC
Created attachment 15524
gcc43-pr36008.patch

Fix I'm bootstrapping/regtesting ATM.
Comment 14 Jakub Jelinek 2008-04-24 16:08:56 UTC
Subject: Bug 36008

Author: jakub
Date: Thu Apr 24 16:08:11 2008
New Revision: 134634

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=134634
Log:
	PR tree-optimization/36008
	* fold-const.c (try_move_mult_to_index): If s == NULL, divide
	the original op1, rather than delta by step.

	* gcc.c-torture/execute/20080424-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/20080424-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog

Comment 15 Jakub Jelinek 2008-04-24 16:20:08 UTC
Subject: Bug 36008

Author: jakub
Date: Thu Apr 24 16:19:22 2008
New Revision: 134636

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=134636
Log:
	PR tree-optimization/36008
	* fold-const.c (try_move_mult_to_index): If s == NULL, divide
	the original op1, rather than delta by step.

	* gcc.c-torture/execute/20080424-1.c: New test.

Added:
    branches/gcc-4_3-branch/gcc/testsuite/gcc.c-torture/execute/20080424-1.c
Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/fold-const.c
    branches/gcc-4_3-branch/gcc/testsuite/ChangeLog

Comment 16 Jakub Jelinek 2008-04-24 16:32:50 UTC
Fixed.