[PATCH] X86: Add an option -muse-unaligned-vector-move

Richard Biener richard.guenther@gmail.com
Wed Oct 20 16:58:16 GMT 2021


On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
><richard.guenther@gmail.com> wrote:
>>
>> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong <dianhong7@gmail.com> wrote:
>> >
>> > Many thanks for your explanation. I got the meaning of the operands.
>> > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" to be aligned;
>> > otherwise it will raise an exception ("Real-Address Mode Exceptions").
>> > I hadn't considered before the situation where "b(%rip)" has an address
>> > dependence on "a(%rip)". I think this situation could be resolved on the
>> > assembler side, except for dummy code like "movapd 0x200b37(%rip),%xmm1, ... addpd  0x200b37(%rip),%xmm0 ".
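A minimal sketch of where such aligned forms come from, using GCC's
vector_size extension (hypothetical names; the asm in the comment is
only the typical -msse2 output):

    typedef double v2df __attribute__ ((vector_size (16)));

    /* Vector-sized globals get 16-byte alignment, so GCC may use the
       aligned movapd/addpd memory forms; an addpd memory operand that
       is not 16-byte aligned raises a fault (#GP in protected mode).  */
    v2df a, b;

    void
    add (void)
    {
      a += b;   /* typically: movapd a(%rip), %xmm0
                              addpd  b(%rip), %xmm0
                              movapd %xmm0, a(%rip)  */
    }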
>>
>> Of course the compiler will only emit instructions which have the
>> constraint of aligned memory when the memory is known to be aligned.
>> That's why I wonder why you would need such an option.
>> "Real-Address Mode Exceptions" may point to the issue, but I wonder
>> what's different in real mode vs. protected mode - even with
>> segmentation the alignment of objects should prevail, unless you
>> play linker "tricks" that make global objects have different
>> alignment - but then it's better to adjust the respective hooks to
>> not falsely claim such alignment.  Consider for example
>>
>>    if ((uintptr_t)&a & 0x7)
>>      foo();
>>    else
>>      bar();
>>
>> GCC will optimize the branch statically to always call bar if 'a'
>> appears to be aligned, even if you later try to "override" this with
>> an option.  Alignment is not only about moves; it's also about
>> knowledge of the low bits in addresses, and about alias analysis,
>> where alignment constrains how two objects can overlap.
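A short sketch of that static folding (hypothetical names):

    #include <stdint.h>

    double a[2] __attribute__ ((aligned (16)));

    int
    aligned_p (void)
    {
      /* GCC knows &a is 16-byte aligned, so the low address bits are
         known to be zero and this folds to "return 1" at compile
         time; no option added later brings the runtime test back.  */
      return ((uintptr_t) &a & 0x7) == 0;
    }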
>>
>> So - do not lie to the compiler!  A late "workaround" avoiding aligned
>> SSE moves isn't a proper fix.
>>
>
>The motivations are
>
>1. AVX non-load/store ops work on unaligned memory.  Unaligned
>load/store on aligned memory is as fast as aligned load/store on Intel
>AVX machines.  The new switch makes load/store consistent with other
>AVX ops (see the sketch after this list).
>2. We don't properly align the stack for AVX on Windows.  This can be
>used as a workaround for -mavx on Windows.
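A minimal sketch of point 1 with AVX intrinsics (hypothetical
function): _mm256_loadu_pd/_mm256_storeu_pd map to the unaligned
vmovupd, which is architecturally valid for any address, while the
aligned vmovapd faults on a misaligned one.

    #include <immintrin.h>

    void
    scale (double *p, double s)
    {
      /* Unaligned load/store: no alignment fault, and on recent Intel
         cores as fast as the aligned forms when p happens to be
         32-byte aligned.  */
      __m256d v = _mm256_loadu_pd (p);
      v = _mm256_mul_pd (v, _mm256_set1_pd (s));
      _mm256_storeu_pd (p, v);
      /* Non-load/store VEX-encoded ops with a memory operand, e.g.
         vaddpd (%rdi), %ymm0, %ymm0, do not require alignment.  */
    }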

But this, with lying that the stack is aligned, causes all of the
above-mentioned issues, and thus needs to be fixed either by properly
aligning the stack or by not lying to the compiler that we do.
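A sketch of the "properly aligning the stack" alternative, using GCC's
existing machinery rather than this patch (-mstackrealign has the same
effect for every function):

    #include <immintrin.h>

    /* force_align_arg_pointer makes GCC realign the stack in the
       prologue, so a 32-byte-aligned spill slot for an __m256d
       temporary is valid even if the caller guarantees less
       alignment, as on Windows.  */
    __attribute__ ((force_align_arg_pointer))
    __m256d
    sum2x (const double *p)
    {
      __m256d v = _mm256_loadu_pd (p);
      return _mm256_add_pd (v, v);
    }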

>
>We can change TARGET_USE_UNALIGNED_VECTOR_MOVE
>to require AVX.

But such a workaround does not make sense, since it does not fix the
fundamental underlying problem.

Richard. 

>


