Is UBSan supposed to produce a finding for _mm_load_sd and _mm_store_sd

Fri Dec 8 17:29:00 GMT 2017

On Fri, 8 Dec 2017, Jeffrey Walton wrote:

> I have some code that loads a 64-bit integer into a XMM register. It
> loads the integer from a byte array:
>
>    byte v[8] = ...
>    __m128i t = _mm_castpd_si128(
>        _mm_load_sd((const double *)(v)));
>
> It is producing a finding for an unaligned load. I get similar
> findings for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd.
>
> According to the Intel Intrinsics Guide (e.g., _mm_load_sd):
>
>    Load a double-precision (64-bit) floating-point element from memory
>    into the lower of dst, and zero the upper element. mem_addr does
>    not need to be aligned on any particular boundary.
>
> Should GCC be producing a finding in this case? Is there a way to work
> around it without an extra memcpy?

The way _mm_load_sd is currently implemented in GCC, yes, sanitizers are 
right to complain. Intel could have named the thing _mm_loadu_sd if that's 
what they meant. It would be simple to change if we decide to do so; 
please file a PR in Bugzilla.

Workaround: define a typedef for double with 
__attribute__((__aligned__(1))), and use _mm_set_sd(*(newtype*)p). That is 
how it will likely be done if we change emmintrin.h.
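
A minimal sketch of that workaround, assuming GCC or a compatible compiler 
with SSE2 enabled. The names unaligned_double, load_u64_lo and store_u64_lo 
are hypothetical. The __may_alias__ attribute is an extra assumption, not 
part of the suggestion above, added because the data lives in a byte array. 
The store helper is extrapolated by analogy with the load.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical name: a double that may sit at any byte address.
   __aligned__(1) is the attribute from the workaround; __may_alias__ is
   an added assumption so that reading a byte buffer through this type
   is not treated as an aliasing violation. */
typedef double unaligned_double
    __attribute__((__aligned__(1), __may_alias__));

/* Load 8 bytes from p into the low half of an XMM register.
   _mm_set_sd zeroes the upper half. */
static inline __m128i load_u64_lo(const unsigned char *p)
{
    return _mm_castpd_si128(_mm_set_sd(*(const unaligned_double *)p));
}

/* Store the low 8 bytes of v to p (extrapolated by analogy). */
static inline void store_u64_lo(unsigned char *p, __m128i v)
{
    *(unaligned_double *)p = _mm_cvtsd_f64(_mm_castsi128_pd(v));
}
```

With this, the load from the question becomes t = load_u64_lo(v). The 
dereference goes through a type whose declared alignment is 1, so the 
alignment check should have nothing to report.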

-- 
Marc Glisse