[Bug tree-optimization/58039] -ftree-vectorizer makes a loop crash on a non-aligned memory

Mon Aug 12 10:32:00 GMT 2013

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58039

--- Comment #4 from Alexander Barkov <bar at mariadb dot org> ---
Mikael, thanks for your comment on this.

(In reply to Mikael Pettersson from comment #3)
> Your code performs mis-aligned uint16_t stores, which x86 allows.

Right, this is done for performance purposes.

> The
> vectorizer turns those into larger and still mis-aligned `movdqa' stores,
> which x86 does not allow, hence the SEGV.

Can you please clarify: is this a bug in recent gcc versions?

Note that we've used such performance tricks for years, and they
worked perfectly fine until now.
Has anything changed recently in how the gcc vectorizer works?

> 
> Replace the non-portable mis-aligned stores with portable code like
> 
> #define int2store_little_endian(s,A) memcpy((s), &(A), 2)
> 
> or gcc-specific code like
> 
> struct __attribute__((__packed__)) packed_uint16 {
>     uint16_t u16;
> };
> #define int2store_little_endian(s,A) ((struct packed_uint16*)(s))->u16 = (A)
> 
> and then the vectorizer generates large `movdqu' stores, which is pretty
> much the best you can hope for unless you rewrite the code to avoid
> mis-aligned stores.

Unfortunately, it's not possible to avoid mis-aligned stores because of
the project architecture.
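
For reference, here is a minimal sketch of the two replacements suggested
in comment #3, wrapped in functions so they can be exercised at an odd
address. The names store16_memcpy, store16_packed, fill16 and
misaligned_store_demo are hypothetical and only for illustration; they
are not from our code base. The sketch only checks that the stored
values round-trip. Whether the vectorizer then emits `movdqu' has to be
checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable variant: memcpy makes no alignment assumption about s. */
static void store16_memcpy(unsigned char *s, uint16_t a)
{
    memcpy(s, &a, sizeof a);
}

/* gcc-specific variant: a packed struct has an alignment of 1. */
struct __attribute__((__packed__)) packed_uint16 {
    uint16_t u16;
};

static void store16_packed(unsigned char *s, uint16_t a)
{
    ((struct packed_uint16 *)s)->u16 = a;
}

/* Hypothetical loop: store n 16-bit values at dst, which may be odd. */
static void fill16(unsigned char *dst, const uint16_t *src, size_t n,
                   int use_packed)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        if (use_packed)
            store16_packed(dst + 2 * i, src[i]);
        else
            store16_memcpy(dst + 2 * i, src[i]);
    }
}

/* Returns 0 if both variants round-trip through an odd start offset. */
static int misaligned_store_demo(void)
{
    enum { N = 64 };
    static unsigned char buf[2 * N + 1];
    uint16_t src[N];
    uint16_t back;

    for (size_t i = 0; i < N; i++)
        src[i] = (uint16_t)(i * 257u + 1u);

    for (int use_packed = 0; use_packed < 2; use_packed++) {
        memset(buf, 0, sizeof buf);
        fill16(buf + 1, src, N, use_packed);   /* buf + 1 is mis-aligned */
        for (size_t i = 0; i < N; i++) {
            memcpy(&back, buf + 1 + 2 * i, sizeof back);
            if (back != src[i])
                return 1;
        }
    }
    return 0;
}
```

Both variants store in host byte order, so like the original macro they
are little-endian only on a little-endian target such as x86.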

I've read somewhere that the gcc vectorizer generates two code branches,
one for aligned memory and one for non-aligned memory (but I can't find
the reference now). Can you please confirm this?

Thanks.