Bug 58039 - -ftree-vectorizer makes a loop crash on a non-aligned memory
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.2
Importance: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2013-08-01 04:52 UTC by Alexander Barkov
Modified: 2016-08-14 17:18 UTC

Attachments
The program that reproduces the crash (452 bytes, text/x-csrc)
2013-08-01 04:52 UTC, Alexander Barkov

Description Alexander Barkov 2013-08-01 04:52:51 UTC
Created attachment 30578
The program that reproduces the crash

If I compile the attached program using:

gcc -Wall -O2 -fno-inline -ftree-vectorize -ftree-vectorizer-verbose=2 a.c

it crashes with "segmentation fault".


$ gcc --version
gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)

Processor: Intel® Core™ i7-3520M CPU @ 2.90GHz × 4


The program is a minimal extract from the MariaDB-10.0 sources
that reproduces the crash.

The GCC flags that are actually used in the debug build of MariaDB are:
gcc -Wall -O3 -fno-inline a.c

but after tracking it down we found that the actual cause is
-ftree-vectorize.
Comment 1 Alexander Barkov 2013-08-01 05:18:30 UTC
The bug is known to reproduce on the following operating systems:

- Fedora 17
- Ubuntu 13.04
- OpenSUSE 11.1
Comment 2 Alexander Barkov 2013-08-06 09:55:14 UTC
Any updates? Thanks.
Comment 3 Mikael Pettersson 2013-08-07 09:14:31 UTC
Your code performs mis-aligned uint16_t stores, which x86 allows.  The vectorizer turns those into larger and still mis-aligned `movdqa' stores, which x86 does not allow, hence the SEGV.

Replace the non-portable mis-aligned stores with portable code like

#define int2store_little_endian(s,A) memcpy((s), &(A), 2)

or gcc-specific code like

struct __attribute__((__packed__)) packed_uint16 {
    uint16_t u16;
};
#define int2store_little_endian(s,A) ((struct packed_uint16*)(s))->u16 = (A)

and then the vectorizer generates large `movdqu' stores, which is pretty much the best you can hope for unless you rewrite the code to avoid mis-aligned stores.
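
To make the two alternatives above concrete, here is a minimal sketch written as functions rather than macros. The names store_u16_memcpy, store_u16_packed, load_u16 and fill_u16 are hypothetical and do not come from the attached program. The may_alias attribute on the packed struct is an extra precaution that is not part of the suggestion above. Both stores write the value in host byte order, which matches the intended little-endian layout only on a little-endian target such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable variant: memcpy makes no assumption about the alignment of dst. */
static void store_u16_memcpy(unsigned char *dst, uint16_t v)
{
    memcpy(dst, &v, sizeof v);
}

/* GCC-specific variant: packed lowers the struct's alignment to 1.
 * may_alias is an assumption added here (not in the suggestion above)
 * so that accessing a byte buffer through this type is permitted. */
struct __attribute__((__packed__, __may_alias__)) packed_uint16 {
    uint16_t u16;
};

static void store_u16_packed(unsigned char *dst, uint16_t v)
{
    ((struct packed_uint16 *)dst)->u16 = v;
}

/* Alignment-agnostic load, used to read a stored value back. */
static uint16_t load_u16(const unsigned char *src)
{
    uint16_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Hypothetical loop of the kind a vectorizer may transform:
 * writes n 16-bit values starting at dst, which may be an odd address. */
static void fill_u16(unsigned char *dst, size_t n, uint16_t v)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        store_u16_memcpy(dst + 2 * i, v);
}
```

Calling fill_u16(buf + 1, 4, v) on a byte buffer starts at an odd offset, and neither store variant promises the compiler any particular alignment for the destination.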
Comment 4 Alexander Barkov 2013-08-12 10:32:16 UTC
Mikael, thanks for your comment on this.

(In reply to Mikael Pettersson from comment #3)
> Your code performs mis-aligned uint16_t stores, which x86 allows.
 
Right, this is done for performance purposes.


> The
> vectorizer turns those into larger and still mis-aligned `movdqa' stores,
> which x86 does not allow, hence the SEGV.

Can you please clarify: is it a bug in the recent gcc versions?

Note, we've used such performance improvement tricks for years.
It worked perfectly fine until now.
Has anything changed in how the gcc vectorizer works recently?


> 
> Replace the non-portable mis-aligned stores with portable code like
> 
> #define int2store_little_endian(s,A) memcpy((s), &(A), 2)
> 
> or gcc-specific code like
> 
> struct __attribute__((__packed__)) packed_uint16 {
>     uint16_t u16;
> };
> #define int2store_little_endian(s,A) ((struct packed_uint16*)(s))->u16 = (A)
> 
> and then the vectorizer generates large `movdqu' stores, which is pretty
> much the best you can hope for unless you rewrite the code to avoid
> mis-aligned stores.


Unfortunately it's not possible to avoid mis-aligned stores due to the
project architecture.


I've read somewhere that gcc vectorizer generates two code branches,
for aligned memory and for non-aligned memory (but can't find
the reference now). Can you please confirm this?

Thanks.
Comment 5 Mikael Pettersson 2013-08-12 12:18:47 UTC
(In reply to Alexander Barkov from comment #4)
> > The
> > vectorizer turns those into larger and still mis-aligned `movdqa' stores,
> > which x86 does not allow, hence the SEGV.
> 
> Can you please clarify: is it a bug in the recent gcc versions?
> 
> Note, we've used such performance improvement tricks for years.
> It worked perfectly fine until now.
> Has anything changed in how the gcc vectorizer works recently?

I know next to nothing about the vectorizer, so I cannot comment on this.

> Unfortunately it's not possible to avoid mis-aligned stores due to the
> project architecture.

Mis-aligned accesses are OK, as long as they are expressed using the proper mechanisms (memcpy, attribute packed, or #pragma pack).

> I've read somewhere that gcc vectorizer generates two code branches,
> for aligned memory and for non-aligned memory (but can't find
> the reference now). Can you please confirm this?

I don't know, see above.
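
The question about aligned and non-aligned code paths is left unanswered in this thread. As a rough illustration only, the sketch below assumes (this is not confirmed anywhere in the thread) that a vectorized loop first runs a few scalar iterations until the address reaches a 16-byte boundary. The function peel_count_for_alignment and the constant VEC_ALIGN are hypothetical names. The arithmetic shows why such a scheme can only work when the pointer is already aligned for its element type: stepping by 2 bytes from an odd address never lands on a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_ALIGN 16u  /* alignment an aligned 16-byte vector store requires */

/* Returns how many scalar steps of elem_size bytes are needed before addr
 * becomes VEC_ALIGN-aligned, or -1 if no number of steps gets there. */
static int peel_count_for_alignment(uintptr_t addr, unsigned elem_size)
{
    assert(elem_size > 0);
    /* The residue modulo VEC_ALIGN repeats after at most VEC_ALIGN steps. */
    for (unsigned k = 0; k < VEC_ALIGN; k++) {
        if ((addr + (uintptr_t)k * elem_size) % VEC_ALIGN == 0)
            return (int)k;
    }
    return -1;
}
```

With 2-byte elements, an address such as 0x1002 needs 7 scalar steps, while an odd address such as 0x1001 yields -1, so an aligned vector store issued after such a prologue would still hit a misaligned address.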
Comment 6 Andrew Pinski 2016-08-14 17:18:06 UTC
Not a bug. Use -fsanitize=undefined to find such misaligned accesses in a recent version of GCC.
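
As a hedged illustration of what the undefined-behavior sanitizer (spelled -fsanitize=undefined in GCC) is meant to catch, the sketch below shows the kind of cast-based store discussed in this thread next to an explicit alignment check. The names store_u16_cast, is_aligned_for_u16 and store_u16_checked are hypothetical and the code is not the attached reproducer. The file name in the build comment is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Build with a sanitizer-capable GCC, for example:
 *     gcc -O2 -fsanitize=undefined demo.c
 * (demo.c is a hypothetical file name.) */

/* The problematic pattern: the cast tells the compiler that dst is
 * suitably aligned for uint16_t. The behavior is undefined if it is not. */
static void store_u16_cast(unsigned char *dst, uint16_t v)
{
    *(uint16_t *)dst = v;
}

/* Manual check of the same property the sanitizer tests at run time. */
static int is_aligned_for_u16(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint16_t)) == 0;
}

/* Cast-based store guarded by an assertion instead of the sanitizer. */
static void store_u16_checked(unsigned char *dst, uint16_t v)
{
    assert(is_aligned_for_u16(dst));
    store_u16_cast(dst, v);
}
```

Calling store_u16_cast with an odd address in an instrumented build should produce a run-time diagnostic at that store, whereas store_u16_checked aborts on the assertion in any build that keeps assertions enabled.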