Bug 37573 - [4.4 Regression] gcc-4.4 regression: incorrect code generation with -O1 -ftree-vectorize
Summary: [4.4 Regression] gcc-4.4 regression: incorrect code generation with -O1 -ftre...
Status: VERIFIED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.4.0
: P1 normal
Target Milestone: 4.4.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2008-09-18 17:30 UTC by Török Edwin
Modified: 2008-11-03 17:50 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.1.3 4.3.2
Known to fail: 4.1.2
Last reconfirmed: 2008-11-03 09:55:10


Attachments
config.log for gcc 4.4 (6.00 KB, text/plain)
2008-09-18 17:31 UTC, Török Edwin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Török Edwin 2008-09-18 17:30:18 UTC
Compiling the testcase below with -O2 works, and fails with -O3.
gcc-4.3 works with both -O2 and -O3.

$ x86_64-unknown-linux-gnu-gcc-4.4.0 -O2 testcase-min.i
$ ./a.out
OK
$ x86_64-unknown-linux-gnu-gcc-4.4.0 -O3 testcase-min.i
$ ./a.out
Failed: /.G����7��G��?�?�;
$ ./a.out
Failed: �G����7 h�G��3�?�

$ gcc-4.3 -O3 testcase-min.i
$ ./a.out
OK

Actually it is -ftree-vectorize that causes the failure:

$ x86_64-unknown-linux-gnu-gcc-4.4.0 -O1 -ftree-vectorize testcase-min.i
$ ./a.out
Failed:"�@���@ �?��

The output is random garbage with -O3, but should be '>AUTOIT UNICODE SCRIPT<'.

$ x86_64-unknown-linux-gnu-gcc-4.4.0 --version
x86_64-unknown-linux-gnu-gcc-4.4.0 (GCC) 4.4.0 20080918 (experimental)
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc-4.4 is a build from svn:
$ svn info
Path: .
URL: svn://gcc.gnu.org/svn/gcc/trunk
Repository Root: svn://gcc.gnu.org/svn/gcc
Repository UUID: 138bc75d-0d04-0410-961f-82ee72b054a4
Revision: 140456
Node Kind: directory
Schedule: normal
Last Changed Author: amacleod
Last Changed Rev: 140456
Last Changed Date: 2008-09-18 17:07:35 +0300 (Thu, 18 Sep 2008)

$ gcc-4.3 --version
gcc-4.3 (Debian 4.3.2-1) 4.3.2
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Comment 1 Török Edwin 2008-09-18 17:30:42 UTC
Testcase:
extern int printf (__const char *__restrict __format, ...);
extern int strcmp (__const char *__s1, __const char *__s2);
extern int puts (__const char *__s);
typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
struct MT
{
  uint32_t *next;
  uint32_t items;
  uint32_t mt[624];
};
static uint8_t
MT_getnext (struct MT *MT)
{
  uint32_t r;
  if (!--MT->items)
    {
      uint32_t *mt = MT->mt;
      unsigned int i;
      MT->next = mt;
      for (i = 0; i < 227; i++)
        mt[i] =
         ((((mt[i] ^ mt[i + 1]) & 0x7ffffffe) ^ mt[i]) >> 1) ^
           ((0 - (mt[i + 1] & 1)) & 0x9908b0df) ^ mt[i + 397];
    }
  r = *(MT->next++);
  r ^= (r >> 11);
  r ^= ((r & 0xff3a58ad) << 7);
  r ^= ((r & 0xffffdf8c) << 15);
  r ^= (r >> 18);
  return (uint8_t) (r >> 1);
}

static void
MT_decrypt (uint8_t * buf, unsigned int size, uint32_t seed)
{
  struct MT MT;
  unsigned int i;
  uint32_t *mt = MT.mt;
  *mt = seed;
  for (i = 1; i < 624; i++)
    mt[i] = i + 0x6c078965 * ((mt[i - 1] >> 30) ^ mt[i - 1]);
  MT.items = 1;
  while (size--)
    *buf++ ^= MT_getnext (&MT);
};

static uint8_t buf[23] = {
  0xc0, 0x49, 0x17, 0x32, 0x62, 0x1e, 0x2e, 0xd5, 0x4c, 0x19, 0x28, 0x49,
    0x91, 0xe4, 0x72, 0x83, 0x91, 0x3d, 0x93, 0x83, 0xb3, 0x61, 0x38
};

int
main (void)
{
  uint32_t s;
  s = 23;
  MT_decrypt (buf, s, s + 0xa25e);
  if (!strcmp (">AUTOIT UNICODE SCRIPT<", buf))
    {
      puts ("OK");
      return 0;
    }
  else
    {
      printf ("Failed:%s\n", buf);
      return 1;
    }
}
Comment 2 Török Edwin 2008-09-18 17:31:50 UTC
Created attachment 16357 [details]
config.log for gcc 4.4

Some more system info:
$ uname -a
Linux debian 2.6.26-1-amd64 #1 SMP Wed Sep 10 15:31:12 UTC 2008 x86_64 GNU/Linux
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz
stepping        : 7
cpu MHz         : 2000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5669.97
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz
stepping        : 7
cpu MHz         : 2000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5666.05
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz
stepping        : 7
cpu MHz         : 2000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5666.03
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz
stepping        : 7
cpu MHz         : 2000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5666.03
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
Comment 3 Andrew Pinski 2008-09-18 19:14:32 UTC
Also fails on powerpc-linux-gnu.
Comment 4 Jakub Jelinek 2008-09-19 17:02:19 UTC
Self-contained testcase, which will work even for non-ascii compatible exec-charset:

/* PR tree-optimization/37573 */

struct S
{
  unsigned int *a;
  unsigned int b;
  unsigned int c[624];
};

static unsigned char __attribute__((noinline))
foo (struct S *s)
{
  unsigned int r;
  if (!--s->b)
    {
      unsigned int *c = s->c;
      unsigned int i;
      s->a = c;
      for (i = 0; i < 227; i++)
c[i]
  = ((((c[i] ^ c[i + 1]) & 0x7ffffffe) ^ c[i]) >> 1)
    ^ ((0 - (c[i + 1] & 1)) & 0x9908b0df) ^ c[i + 397];
    }
  r = *(s->a++);
  r ^= (r >> 11);
  r ^= ((r & 0xff3a58ad) << 7);
  r ^= ((r & 0xffffdf8c) << 15);
  r ^= (r >> 18);
  return (unsigned char) (r >> 1);
}

static void __attribute__((noinline))
bar (unsigned char *p, unsigned int q, unsigned int r)
{
  struct S s;
  unsigned int i;
  unsigned int *c = s.c;
  *c = r;
  for (i = 1; i < 624; i++)
    c[i] = i + 0x6c078965 * ((c[i - 1] >> 30) ^ c[i - 1]);
  s.b = 1;
  while (q--)
    *p++ ^= foo (&s);
};

static unsigned char p[23] = {
  0xc0, 0x49, 0x17, 0x32, 0x62, 0x1e, 0x2e, 0xd5, 0x4c, 0x19, 0x28, 0x49,
  0x91, 0xe4, 0x72, 0x83, 0x91, 0x3d, 0x93, 0x83, 0xb3, 0x61, 0x38
};

static unsigned char q[23] = {
  0x3e, 0x41, 0x55, 0x54, 0x4f, 0x49, 0x54, 0x20, 0x55, 0x4e, 0x49, 0x43,
  0x4f, 0x44, 0x45, 0x20, 0x53, 0x43, 0x52, 0x49, 0x50, 0x54, 0x3c
};

int
main (void)
{
  unsigned int s;
  s = 23;
  bar (p, s, s + 0xa25e);
  if (__builtin_memcmp (p, q, s) != 0)
    __builtin_abort ();
  return 0;
}

The bug is in the bar function, as can be proven by compiling it with -O3 and adding optimize (0) attribute to bar.
Comment 5 Jakub Jelinek 2008-09-19 17:52:31 UTC
The data dependence on the previous loop is clearly not considered, the loop is vectorized as if c on the rhs and c on the lhs were different non-overlapping arrays.
Comment 6 Ira Rosen 2008-09-21 07:54:39 UTC
(In reply to comment #5)
> The data dependence on the previous loop is clearly not considered, the loop is
> vectorized as if c on the rhs and c on the lhs were different non-overlapping
> arrays.
>

The data dependence analysis returns chrec_known for that ddr (probably getting confused by *&s.c[0] and *&s.c[1] base objects), so the vectorizer thinks they are independent and vectorizes the loop.

*D.1659_13 
        base_address: &s.c[0]
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *&s.c[0]
        symbol tag: SMT.62

*D.1655_9
        base_address: &s.c[1]
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *&s.c[1]
        symbol tag: SMT.62
(compute_affine_dependence
  (stmt_a =
D.1660_14 = *D.1659_13;
)
  (stmt_b =
*D.1655_9 = D.1664_23;
)
)
Comment 7 Sebastian Pop 2008-10-15 21:28:48 UTC
split_constant_offset does not handle correctly the offset of
&s.c[1] as this is an ADDR_EXPR whose op0 was set by 
extract_ops_from_tree to itself, an ADDR_EXPR.  Now this code
is not handled in split_constant_offset_1 as this calls

	if (!handled_component_p (op0))
	  return false;

and returns false.  This is an early return that stops the extraction
of the base address of the array: s.c[0] and the offset 4.

This is probably due to the fact that extract_ops_from_tree does
return the expression itself for ADDR_EXPRs in this case:

  else if (grhs_class == GIMPLE_SINGLE_RHS)
    {
      *op1_p = expr;
      *op2_p = NULL_TREE;
    }

I wonder if the fix could be to handle ADDR_EXPRs as GIMPLE_UNARY_RHS
instead of GIMPLE_SINGLE_RHS:

  else if (grhs_class == GIMPLE_UNARY_RHS)
    {
      *op1_p = TREE_OPERAND (expr, 0);
      *op2_p = NULL_TREE;
    }

in which case that would solve the base and offset computations.
Comment 8 Richard Biener 2008-10-15 21:45:06 UTC
No, ADDR_EXPRs are single because they can have an arbitrary number of operands
(think of &a_1->b[i_2][j_3] which has three operands, a_1, i_2 and j_3).  In
your case it is a is_gimple_min_invariant, which may add to the confusion?
Comment 9 Richard Biener 2008-10-15 21:47:45 UTC
IMHO the fix for the tuplification bug(!) is to strip the ADDR_EXPR that is
always present on op0 in split_constant_offset_1 so:

    case ADDR_EXPR:
      {
        tree base, poffset;
        HOST_WIDE_INT pbitsize, pbitpos;
        enum machine_mode pmode;
        int punsignedp, pvolatilep;

        op0 = TREE_OPERAND (op0, 0);
        if (!handled_component_p (op0))
          return false;
Comment 10 Sebastian Pop 2008-10-16 00:02:07 UTC
Subject: Re:  [4.4 Regression] gcc-4.4 regression: incorrect code generation with -O1 -ftree-vectorize

On Wed, Oct 15, 2008 at 4:47 PM, rguenth at gcc dot gnu dot org
> IMHO the fix for the tuplification bug(!) is to strip the ADDR_EXPR that is
> always present on op0 in split_constant_offset_1 so:
>
>    case ADDR_EXPR:
>      {
>        tree base, poffset;
>        HOST_WIDE_INT pbitsize, pbitpos;
>        enum machine_mode pmode;
>        int punsignedp, pvolatilep;
>
>        op0 = TREE_OPERAND (op0, 0);
>        if (!handled_component_p (op0))
>          return false;

This is exactly what I tried within gdb and it did worked, although
not as I expected: the base address ends to be &s and not &s.c[0] as
I expected before.  This then fixes the bug as we end on the same
base address of the structure.  Then the alias analysis answers that
the two accesses can overlap.
Comment 11 rguenther@suse.de 2008-10-16 08:14:50 UTC
Subject: Re:  [4.4 Regression] gcc-4.4 regression:
 incorrect code generation with -O1 -ftree-vectorize

On Thu, 16 Oct 2008, spop at gcc dot gnu dot org wrote:

> ------- Comment #10 from spop at gcc dot gnu dot org  2008-10-16 00:02 -------
> Subject: Re:  [4.4 Regression] gcc-4.4 regression: incorrect code generation
> with -O1 -ftree-vectorize
> 
> On Wed, Oct 15, 2008 at 4:47 PM, rguenth at gcc dot gnu dot org
> > IMHO the fix for the tuplification bug(!) is to strip the ADDR_EXPR that is
> > always present on op0 in split_constant_offset_1 so:
> >
> >    case ADDR_EXPR:
> >      {
> >        tree base, poffset;
> >        HOST_WIDE_INT pbitsize, pbitpos;
> >        enum machine_mode pmode;
> >        int punsignedp, pvolatilep;
> >
> >        op0 = TREE_OPERAND (op0, 0);
> >        if (!handled_component_p (op0))
> >          return false;
> 
> This is exactly what I tried within gdb and it did worked, although
> not as I expected: the base address ends to be &s and not &s.c[0] as
> I expected before.  This then fixes the bug as we end on the same
> base address of the structure.  Then the alias analysis answers that
> the two accesses can overlap.

Yes, but that is another thing.  It also isn't that simple - for any
two accesses you want to disambiguate you have to find a suitable
common base.  Consider &s.c[1] and &s + i, obviously the accesses can
overlap - would you still say so if the base address of the first one
would be &s.c[0]?  (really the base address of a non-variable access
is the access itself, right?  &s.c[1] in this case)

Richard.
Comment 12 sebpop@gmail.com 2008-10-22 16:10:27 UTC
Subject: Re:  [4.4 Regression] gcc-4.4 regression: incorrect code generation with -O1 -ftree-vectorize

> common base.  Consider &s.c[1] and &s + i, obviously the accesses can
> overlap - would you still say so if the base address of the first one
> would be &s.c[0]?

Yes, in the case &s.c[1] versus &s.c[0], we still have to consider the
arrays to potentially overlap.

> (really the base address of a non-variable access is the access
> itself, right?  &s.c[1] in this case)

No, it cannot be &s.c[1] here.  The base object for arrays in structs
should be the struct itself.

The base address tells you what memory object is accessed with an
offset.  For structs, you are allowed to access any of their contents
using arithmetic.  For instance in:

struct s {
  int a[2];
  int c[20];
}

you could access s.c[10] from the address of struct s with: &s.a + 12.
Comment 13 Török Edwin 2008-10-29 18:48:56 UTC
I just noticed that this testcase also fails with -O3 on gcc version 4.1.2 20070626 (Red Hat 4.1.2-14), but works on gcc version 4.1.3 20080623 (prerelease) (Debian 4.1.2-23)
Comment 14 Richard Biener 2008-11-03 09:55:09 UTC
Mine.
Comment 15 Richard Biener 2008-11-03 12:33:02 UTC
Fixed.
Comment 16 Richard Biener 2008-11-03 12:34:20 UTC
Subject: Bug 37573

Author: rguenth
Date: Mon Nov  3 12:32:52 2008
New Revision: 141547

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141547
Log:
2008-11-03  Richard Guenther  <rguenther@suse.de>

	PR middle-end/37573
	* tree-data-ref.c (split_constant_offset_1): Fix tuplification.

	* gcc.c-torture/execute/pr37573.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr37573.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-data-ref.c

Comment 17 Török Edwin 2008-11-03 17:50:49 UTC
Thanks.