Bug 35653 - [4.3/4.4 Regression]: gcc-4.3 -O3/-ftree-vectorize regression: incorrect code generation
Summary: [4.3/4.4 Regression]: gcc-4.3 -O3/-ftree-vectorize regression: incorrect code...
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: ---
Assignee: victork
URL:
Keywords: wrong-code
Depends on:
Blocks: 32893
  Show dependency treegraph
 
Reported: 2008-03-21 12:41 UTC by Török Edwin
Modified: 2008-03-26 22:56 UTC (History)
5 users (show)

See Also:
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build: x86_64-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2008-03-23 13:33:08


Attachments
reduced testcase (277 bytes, text/plain)
2008-03-21 12:42 UTC, Török Edwin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Török Edwin 2008-03-21 12:41:37 UTC
On x86-64 when clamav is compiled with gcc-4.3 -O3, it crashes in mew.c.

Compiling a reduced testcase shows -ftree-vectorize introduces the bug.

This happens only on gcc-4.3, gcc-4.2 and gcc-4.1 work ok with -O3/-ftree-vectorize.

$ gcc-4.3 -O3 test.i -o fails && ./fails
Segmentation fault
$ gcc-4.3 -O1 -ftree-vectorize test.i -o fails2 && ./fails2
Segmentation fault
$ gcc-4.3 -O2 test.i -o works && ./works
$ gcc-4.2 -O3 test.i -o works2 && ./works2
$ gcc-4.2 -O3 -ftree-vectorize test.i -o works3 && ./works3

To reproduce on 32-bit x86 use:
$ gcc-4.3 -O1 -ftree-vectorize -msse -msse2 test.i -o fails3 && ./fails3
Segmentation fault

$ gcc-4.3 -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure linux gnu
Thread model: posix
gcc version 4.3.1 20080309 (prerelease) (Debian 4.3.0-1)
$ uname -a
Linux lightspeed2 2.6.25-rc4-00134-g84c6f60 #4 Sun Mar 9 19:40:34 EET 2008 x86_64 GNU/Linux
Comment 1 Török Edwin 2008-03-21 12:42:53 UTC
Created attachment 15353 [details]
reduced testcase

$ gcc-4.3 -O1 -g -ftree-vectorize test.i -o  fails
$ gdb ./fails
GNU gdb 6.7.1-debian
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r
Starting program: /home/edwin/clam/svn3/builds/gcc43opt/fortify-asserts-exp/tmpd/fails

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400532 in main () at test.i:14
14                              (*(uint32_t *)(dest)=(uint32_t)(0x4000400));
(gdb) p dest
$1 = 0x7fffaae00672 ""
Comment 2 Richard Biener 2008-03-21 14:56:42 UTC
You probably fool the vectorizers alignment analysis.
Comment 3 H.J. Lu 2008-03-21 17:47:58 UTC
It is caused by revision 129764:

http://gcc.gnu.org/ml/gcc-cvs/2007-10/msg00870.html

Comment 4 H.J. Lu 2008-03-21 17:53:03 UTC
I think revision 129764 exposed a latent vectorizer bug. Before revision
129764, the loop wasn't vectorized.
Comment 5 H.J. Lu 2008-03-21 18:56:14 UTC
I don't think peeling in vect_enhance_data_refs_alignment checks
if there is only one data reference.
Comment 6 victork 2008-03-23 13:33:08 UTC
Here is AN even more reduced example which demonstrates the problem:
int main()
{
  char buf[256];
  char *dest;
  int i;

  dest = &buf[2];
  for (i = 0; i < 32; i++)
  {
    *(unsigned *)dest = 0;
    dest += 4;
  }

  return buf[2];
}

gcc -O3 t.c  && ./a.out
Segmentation fault


The problem is that vectorizer is assuming that the access was aligned to the the size of the element before vectorization. A loop peeling only handles misalignment in multiples of element size. 
In this case, vectorizer could check the actual alignment in compile time and prevent vectorization.  (Though the assignment of a constant still can be vectorized if value of constant is adjusted accordingly).
In general case, when alignment of the non-vectorized access is unknown, a fix of this bug would require an addition of a run-time test.
Comment 7 pinskia@gmail.com 2008-03-23 16:50:47 UTC
Subject: Re:  [4.3/4.4 Regression]: gcc-4.3 -O3/-ftree-vectorize regression: incorrect code generation

This code violates c/c++ aliasing rules.

Sent from my iPhone

On Mar 23, 2008, at 6:33, "victork at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #6 from victork at gcc dot gnu dot org  2008-03-23  
> 13:33 -------
> Here is AN even more reduced example which demonstrates the problem:
> int main()
> {
>  char buf[256];
>  char *dest;
>  int i;
>
>  dest = &buf[2];
>  for (i = 0; i < 32; i++)
>  {
>    *(unsigned *)dest = 0;
>    dest += 4;
>  }
>
>  return buf[2];
> }
>
> gcc -O3 t.c  && ./a.out
> Segmentation fault
>
>
> The problem is that vectorizer is assuming that the access was  
> aligned to the
> the size of the element before vectorization. A loop peeling only  
> handles
> misalignment in multiples of element size.
> In this case, vectorizer could check the actual alignment in compile  
> time and
> prevent vectorization.  (Though the assignment of a constant still  
> can be
> vectorized if value of constant is adjusted accordingly).
> In general case, when alignment of the non-vectorized access is  
> unknown, a fix
> of this bug would require an addition of a run-time test.
>
>
> -- 
>
> victork at gcc dot gnu dot org changed:
>
>           What    |Removed                     |Added
> --- 
> --- 
> ----------------------------------------------------------------------
>         AssignedTo|unassigned at gcc dot gnu   |victork at gcc dot  
> gnu dot
>                   |dot org                     |org
>             Status|NEW                         |ASSIGNED
>   Last reconfirmed|2008-03-21 17:47:58         |2008-03-23 13:33:08
>               date|                            |
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35653
>
Comment 8 Török Edwin 2008-03-23 16:59:55 UTC
(In reply to comment #7)
> Subject: Re:  [4.3/4.4 Regression]: gcc-4.3 -O3/-ftree-vectorize regression:
> incorrect code generation
> 
> This code violates c/c++ aliasing rules.
> 

-Wstrict-aliasing doesn't give a warning about it.

Accessing the memory as uint32_t is done for performance reasons, the alternative would be to use byte-sized accesses. When reading, it would involve additional shift/or instructions.

If the pointer is not aligned at 4 bytes, then uint32_t can't be done?

If that is the case, a warning would be *very* useful.
Comment 9 victork 2008-03-23 17:21:16 UTC
> Accessing the memory as uint32_t is done for performance reasons,

In this particular case you write a 4-byte constant 0x04000400. Changing this to
a loop running double number of iterations of 2-byte (unsigned short) writes of 0x0400 will enable to vectorizer to handle this loop effectively.

> If the pointer is not aligned at 4 bytes, then uint32_t can't be done?

It can be done on x86, but it is less effective than access to an aligned pointer.
Comment 10 victork 2008-03-23 17:26:14 UTC
> This code violates c/c++ aliasing rules.
Probably you are right, but,
1. -fno-strict-aliasing doesn't help
2. -Wstrict-aliasing -Wstrict-aliasing=2 doesn't give any warning.
3. I don't think that re-writing this code using unions will help.
Comment 11 Andrew Pinski 2008-03-24 16:34:15 UTC
I think the code is violating alignment requirements of the C/C++ standard.
Comment 12 Richard Biener 2008-03-24 16:48:07 UTC
I tend to agree with Andrew here.  If you go through a packed union the failure
vanishes:

union U  {
  unsigned u;
}__attribute__((packed));
int main()
{
  char buf[256];
  char *dest;
  int i;

  dest = &buf[2];
  for (i = 0; i < 32; i++)
  {
    ((union U *)dest)->u = 0;
    dest += 4;
  }

  return buf[2];
}

So, IMHO this bug is invalid.
Comment 13 Török Edwin 2008-03-24 17:00:45 UTC
(In reply to comment #11)
> I think the code is violating alignment requirements of the C/C++ standard.
> 

First a real bug here is that -Wstrict-aliasing doesn't warn of this situation. Do you agree?

Unaligned accesses is undefined in general, I agree.

But on x86 it has always been possible to do unaligned accesses, and it is possible even with vector instructions (movdqu). Of course these are slower than aligned accesses, but there should be some way to express them in C [*]
The most natural way to do that is to typecast, and this has worked, knowing that x86 doesn't have instructions that can SIGBUS/SIGSEGV on unaligned accesses (not true anymore with vector instructions), or knowing that compilers won't generate vectorized code for unaligned accesses (true until gcc-4.3 AFAIK).

If you decide to handle unaligned accesses as undefined even for x86, the Linux kernel developers should be notified:

The kernel has a macro, that is defined as such on x86:
#define get_unaligned(ptr) (*(ptr))

And defined using these generically for non-x86 architectures
struct __una_u32 { __u32 x __attribute__((packed)); };
static inline __u32 __uldl(const __u32 *addr)
{
	const struct __una_u32 *ptr = (const struct __una_u32 *) addr;
	return ptr->x;
}
[*]: this can be used to express unaligned accesses safely

And it is being used in a loop: http://lxr.linux.no/linux/net/bluetooth/bnep/core.c#L153

BTW, the original code for this bugreport does unaligned accesses only on little-endian architectures, and this has worked on all compilers until now:

#if WORDS_BIGENDIAN == 0
..
#define cli_readint32(buff) (*(const int32_t *)(buff))
...
#else
...
static inline int32_t cli_readint32(const char *buff)
{
	int32_t ret;
    ret = buff[0] & 0xff;
    ret |= (buff[1] & 0xff) << 8;
    ret |= (buff[2] & 0xff) << 16;
    ret |= (buff[3] & 0xff) << 24;
    return ret;
}
...
#endif
Comment 14 victork 2008-03-24 19:14:01 UTC
> I tend to agree with Andrew here.  If you go through a packed union the failure
> vanishes:
> So, IMHO this bug is invalid.

The fact that failure vanishes is not a good argument here. The failure vanishes simply because vectorized failed to vectorize "dest->u" statement.
Here is an modified example
struct U
{
  unsigned u;
};

int main()
{
  struct U buf[256];
  struct U *dest;
  int i;

  dest = buf;
  for (i = 0; i < 32; i++)
  {
    dest->u = 0;
    dest += 1;
  }

  return buf[2].u;
}
And it is not vectorized without any good reason.
Comment 15 victork 2008-03-25 10:47:52 UTC
Just to make clear my previous comment.  I think that it *is* possible that original code violates c/c++ aliasing rules, but the example given by Richard Guenther is not-vectorized since currently we do not vectorize structures with a single member. We may want to open a missed-optimization bug report, but this problem is not related to the possible bug in the subject.
  Actually, the possible problem rises only when we try enforce alignment using loop peeling. If we enforce alignment using loop versioning, a run-time check will discover misalignment and a non-vectorized version of loop will be executed.
Comment 16 Richard Biener 2008-03-25 12:28:24 UTC
The C/C++ standards say that if you access memory through int, the data has
to be aligned suitably according to what the ABI specifies for alignment of int.
If you mis-align the access on purpose you have to tell the compiler, one
means of doing so is by using the aligned attribute or by using a packed
structure.  The vectorizer will then see the misalignment if it (hopefully)
uses get_pointer_alignment (or TYPE_ALIGN or whatever is suitable).

It is not expected that the vectorizer can deal with the situation in the
original report.

For this reason I think this bugreport is invalid.

  char c[4];
  *(int *)&c[0]

the access invokes undefined behavior at runtime if you do not make sure
c is properly aligned.  On strict alignment targets this will fault
regardless of vectorization or not.
Comment 17 davids 2008-03-26 22:56:11 UTC
I wonder why -Wcast-align doesn't generate a warning in this case. It would seem that would be what it's for. I get a segfault in the reduced test case and no warning from -Wcast-align. The documentation seems to specifically suggest that this is the case where a warning is generated.