This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: code-gen options for disabling multi-operand AArch64 and ARM instructions
- From: Ard Biesheuvel <ard dot biesheuvel at linaro dot org>
- To: Laszlo Ersek <lersek at redhat dot com>
- Cc: gcc at gcc dot gnu dot org, Kevin Fenzi <kevin at scrye dot com>, Peter Robinson <perobins at redhat dot com>, Florian Weimer <fweimer at redhat dot com>, Fabiano Fidencio <ffidenci at redhat dot com>, "Daniel P. Berrange" <berrange at redhat dot com>, Drew Jones <drjones at redhat dot com>, Jeremy Linton <jeremy dot linton at arm dot com>, Paul Whalen <pwhalen at redhat dot com>, Jared Smith <jsmith dot fedora at gmail dot com>, Samuel Sieb <samuel-rhbugs at sieb dot net>, "Dominik 'Rathann' Mierzejewski" <dominik at greysector dot net>, Peter Robinson <pbrobinson at gmail dot com>, Jonathan Masters <jcm at redhat dot com>, Marc Zyngier <marc dot zyngier at arm dot com>, Christoffer Dall <christoffer dot dall at arm dot com>
- Date: Tue, 5 Jun 2018 10:18:24 +0200
- Subject: Re: code-gen options for disabling multi-operand AArch64 and ARM instructions
- References: <5102cfdf-1dc7-e257-77f3-eb8335eae4b8@redhat.com> <CAKv+Gu_k+BZ74MQ8cU55zO4hs6810OFiw17v5CGR2AEZdXETsA@mail.gmail.com> <7ea27d37-00da-1a85-e69a-d8955dd33487@redhat.com>
On 5 June 2018 at 10:16, Laszlo Ersek <lersek@redhat.com> wrote:
> On 06/05/18 08:04, Ard Biesheuvel wrote:
>> On 4 June 2018 at 20:10, Laszlo Ersek <lersek@redhat.com> wrote:
>>> Hi!
>>>
>>> Apologies if this isn't the right place for asking. For the problem
>>> statement, I'll simply steal Ard's writeup [1]:
>>>
>>>> KVM on ARM refuses to decode load/store instructions used to perform
>>>> I/O to emulated devices, and instead relies on the exception syndrome
>>>> information to describe the operand register, access size, etc. This
>>>> is only possible for instructions that have a single input/output
>>>> register (as opposed to ones that increment the offset register, or
>>>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
>>>> following error
>>>>
>>>> error: kvm run failed Function not implemented
>>>> [...]
>>>> QEMU: Terminated
>>>>
>>>> and KVM produces a warning such as the following in the kernel log
>>>>
>>>> kvm [17646]: load/store instruction decoding not implemented
>>>>
>>>> GCC with LTO enabled will emit such instructions for Mmio[Read|Write]
>>>> invocations performed in a loop, so we need to disable LTO [...]
>>>
>>> We have a Red Hat Bugzilla about the (very likely) same issue [2].
>>>
>>> Earlier, we had to work around the same on AArch64 too [3].
>>>
>>> Would it be possible to introduce a dedicated -mXXX option, for ARM and
>>> AArch64, that disabled the generation of such multi-operand
>>> instructions?
>>>
>>> I note there are several similar options (for other architectures):
>>> * -mno-multiple (ppc)
>>> * -mno-fused-madd (ia64)
>>> * -mno-mmx and a lot of friends (x86)
>>>
>>> Obviously, if the feature request is deemed justified, we should provide
>>> the exact family of instructions to disable. I'll leave that to others
>>> on the CC list with more ARM/AArch64 expertise; I just wanted to get
>>> this thread started. (Sorry if the option is already being requested
>>> elsewhere; I admit I didn't search the GCC bugzilla.)
>>>
>>
>> I am not convinced that tweaking GCC code generation is the correct
>> approach here, to be honest.
>>
>> The issue only occurs when load/store instructions trap into KVM,
>> which (correct me if I am wrong) mostly occurs when emulating
>> MMIO. The case I have been looking into (UEFI) uses MMIO accessors
>> correctly, but due to the way they are implemented (in C), LTO code
>> generation may result in load/store instructions with multiple
>> outputs being used.
>>
>> So first of all, I would like to understand the magnitude of the
>> problem. If all cases we can identify involve performing MMIO using C
>> memory references, I think we should fix the code rather than the
>> compiler.
>
> To my understanding, Daniel has the opposite preference; namely, the
> above approach doesn't scale to a large and moving target like the
> kernel. Because the instructions in question work on the bare metal
> (IOW, the guest code is not "broken" in any sense of the word), people
> will continue writing kernel MMIO code in C that "lures" gcc into
> generating ARM/AArch64 assembly that contains those instructions.
>
> The RHBZ I linked earlier remains elusive; the issue is not easy to
> trigger, and when it does trigger, one has to investigate the symptoms
> (the guest code at the trap address) every time, trace it back to
> C-language source code, and either tweak that C code, or else tweak the
> compiler flags specifically for that code / module. AIUI Daniel prefers
> to work around the KVM issue without having to analyze every guest site,
> as they pop up over time. The expression "all cases we can identify" is
> the core of the problem; it's not a well-defined set.
>
> Your edk2 ArmVirtQemu patch adds a heavy-weight flag (-fno-lto) to a
> pin-point location; another possibility (that might scale better to
> humans) is a new, lighter-weight flag, such as "-mno-multiple", that is
> applied universally to a codebase.
>
That will affect *all* memory references, which will undoubtedly hurt
performance.
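
To make the "fix the code rather than the compiler" option concrete, here
is a minimal sketch of MMIO accessors that pin the access to a single-register
load/store with plain base-register addressing, so the compiler cannot
merge or rewrite it into a pair or writeback form. The names mmio_read32
and mmio_write32 are hypothetical and are not taken from edk2 or the kernel.
The non-ARM fallback is an assumption added only so the sketch builds
elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical accessors (illustration only). On ARM/AArch64 the access is
 * emitted via inline asm as exactly one single-register LDR/STR with no
 * writeback, which is the form KVM can describe through the exception
 * syndrome. On other targets, fall back to a plain volatile access.
 */
static inline uint32_t mmio_read32(const volatile void *addr)
{
#if defined(__aarch64__)
    uint32_t val;
    /* One 32-bit load into a W register, base-register addressing only. */
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    return val;
#elif defined(__arm__)
    uint32_t val;
    /* One 32-bit load, base-register addressing only. */
    __asm__ volatile("ldr %0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    return val;
#else
    /* Assumption: generic fallback for non-ARM builds. */
    return *(const volatile uint32_t *)addr;
#endif
}

static inline void mmio_write32(volatile void *addr, uint32_t val)
{
#if defined(__aarch64__)
    /* One 32-bit store from a W register, base-register addressing only. */
    __asm__ volatile("str %w0, [%1]" : : "r"(val), "r"(addr) : "memory");
#elif defined(__arm__)
    /* One 32-bit store, base-register addressing only. */
    __asm__ volatile("str %0, [%1]" : : "r"(val), "r"(addr) : "memory");
#else
    /* Assumption: generic fallback for non-ARM builds. */
    *(volatile uint32_t *)addr = val;
#endif
}
```

Because the instruction is spelled out in the asm template, a loop over
such accessors cannot be turned into post-indexed or load/store pair forms,
even with LTO. Only the MMIO paths are constrained this way; ordinary
memory references are left to the compiler.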