[Bug tree-optimization/65709] [5 Regression] Bad code for LZ4 decompression with -O3 on x86_64

noloader at gmail dot com gcc-bugzilla@gcc.gnu.org
Wed Aug 12 18:37:00 GMT 2015


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709

--- Comment #19 from Jeffrey Walton <noloader at gmail dot com> ---
(In reply to Yann Collet from comment #18)
> This issue makes me wonder : how to efficiently access unaligned memory ?
> 
> 
> The case in point is ARM cpu.
> They don't support SSE/AVX, so they seem unaffected by this specific issue,
> but this issue force writing the source code in a certain way, to remain
> compatible with vectorizer assumtion.

Just one comment here (sorry for speaking out of turn)....

Modern ARM has __ARM_FEATURE_UNALIGNED, which means the processor tolerates
unaligned access. However, I believe it runs afoul of the C/C++ standard and
GCC aliasing rules.

> Therefore, for portable code, it becomes an issue :
> how to write a code which is both portable and efficient on both targets ?

I've been relying on intrinsics to side step C/C++ requirements. In the absence
of intrinsics, I use inline assembler to avoid the C/C++ language rules.

Now, I could be totally wrong, but I don't feel I'm violating the C/C++
language rules until a write a C/C++ statement that performs the violation.
Hence the reason I use intrinsics or drop into assembler.



More information about the Gcc-bugs mailing list