This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug c/81389] _mm_cmpestri segfault on -O0
- From: "glisse at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 17 Aug 2017 10:32:55 +0000
- Auto-submitted: auto-generated
- References: <bug-81389-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81389
--- Comment #13 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to rockeet from comment #7)
> @Marc @Jakub @Martin
> Intel CPU document says: operand of _mm_cmpestri can be memory or mm
> register, when the operand is memory, it does not require alignment.
That's the doc for the CPU instruction. The intrinsic, as a C function, always
takes an object of type __m128i, not a register or a memory operand. The only
question is what the alignment of the type __m128i is. In gcc, it is 16 bytes.
What does alignof (or _Alignof or whatever variant you can get working) return
with Intel's compiler?
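
One way to check the alignment of the type is a sketch like the following. It assumes an x86 target with SSE2 and a C11 compiler that provides _Alignof. The helper names m128i_alignment and is_aligned_for_m128i are made up for illustration and are not part of any header.

```c
#include <emmintrin.h>  /* SSE2 header, defines the type __m128i */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: alignment the compiler assigns to the type __m128i. */
static size_t m128i_alignment(void)
{
    return _Alignof(__m128i);
}

/* Hypothetical helper: nonzero if p is suitably aligned to be
   dereferenced as a __m128i object. */
static int is_aligned_for_m128i(const void *p)
{
    return ((uintptr_t)p % _Alignof(__m128i)) == 0;
}
```

With gcc, m128i_alignment() should return 16, matching the statement above. A pointer for which is_aligned_for_m128i() returns 0 should not be dereferenced as a __m128i *, whatever the underlying instruction would tolerate.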
> The issue is: GCC does not know this knowledge(memory operand need not
> memory align), and there is no way to enforce gcc to generate a _mm_cmpestri
> which always use memory operand, not mm register.
Use inline asm? Intrinsics are not quite as low level as you seem to expect.
> If I manually load the unaligned memory into an aligned `__m128i`, it has
> performance penalty on optimizing compilation.
Uh? With -O1, the compiler merges the unaligned load with pcmpestri (it knows
that the insn can read unaligned memory). Did you mean to talk about the
performance of code generated with -O0? We explicitly do not care about that.
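
For reference, the pattern discussed above (an explicit unaligned load followed by the intrinsic) could look like the sketch below. It assumes gcc or a compatible compiler on an x86 CPU with SSE4.2. The function name first_match_index and the choice of comparison mode are only for illustration. The target attribute is a GNU extension used here so the sketch compiles without -msse4.2; passing that flag works as well.

```c
#include <assert.h>
#include <nmmintrin.h>  /* SSE4.2 header, declares _mm_cmpestri */

/* Hypothetical helper: index of the first byte in text that equals any
   byte in set, or 16 if there is none. Both pointers may be unaligned,
   but each must point to at least 16 readable bytes. */
__attribute__((target("sse4.2")))
static int first_match_index(const char *set, int set_len,
                             const char *text, int text_len)
{
    assert(set_len >= 0 && set_len <= 16);
    assert(text_len >= 0 && text_len <= 16);

    /* Unaligned loads: the __m128i objects a and b are themselves
       properly aligned, only the source memory need not be. */
    __m128i a = _mm_loadu_si128((const __m128i *)set);
    __m128i b = _mm_loadu_si128((const __m128i *)text);

    return _mm_cmpestri(a, set_len, b, text_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}
```

As stated above, with -O1 gcc is expected to fold the load of b into the memory operand of pcmpestri, so the explicit _mm_loadu_si128 should not cost an extra instruction in optimized builds. This is not verified here and can be checked by inspecting the output of gcc -O1 -S.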