[PATCH 0/3] Add __builtin_load_no_speculate

Richard Earnshaw (lists) Richard.Earnshaw@arm.com
Fri Jan 5 10:59:00 GMT 2018


On 05/01/18 10:47, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> Recently, Google Project Zero disclosed several classes of attack
>>>> against speculative execution. One of these, known as variant-1
>>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>>> speculation, providing an arbitrary read gadget. Further details can
>>>> be found on the GPZ blog [1] and the documentation that is included
>>>> with the first patch.
>>>>
>>>> This patch set adds a new builtin function for GCC to provide a
>>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>>> memory access.  I've tried to design this in such a way that it can be
>>>> used for any target where this might be necessary.  The patch set
>>>> provides a generic implementation of the builtin and then
>>>> target-specific support for Arm and AArch64.  Other architectures can
>>>> utilize the internal infrastructure as needed.
>>>>
>>>> Most of the details of the builtin and the hooks that need to be
>>>> implemented to support it are described in the updates to the manual,
>>>> but a short summary is given below.
>>>>
>>>> TYP __builtin_load_no_speculate
>>>>         (const volatile TYP *ptr,
>>>>          const volatile void *lower,
>>>>          const volatile void *upper,
>>>>          TYP failval,
>>>>          const volatile void *cmpptr)
>>>>
>>>> Where TYP can be any integral type (signed or unsigned char, int,
>>>> short, long, etc) or any pointer type.
>>>>
>>>> The builtin implements the following logical behaviour:
>>>>
>>>> inline TYP __builtin_load_no_speculate
>>>>          (const volatile TYP *ptr,
>>>>           const volatile void *lower,
>>>>           const volatile void *upper,
>>>>           TYP failval,
>>>>           const volatile void *cmpptr)
>>>> {
>>>>   TYP result;
>>>>
>>>>   if (cmpptr >= lower && cmpptr < upper)
>>>>     result = *ptr;
>>>>   else
>>>>     result = failval;
>>>>   return result;
>>>> }
>>>>
>>>> In addition, the specification of the builtin guarantees that
>>>> subsequent speculation using *ptr may continue only if cmpptr lies
>>>> within the specified bounds.
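>>>>
>>>> As an illustration only (the array, its size and the function below
>>>> are hypothetical names, not part of the patch), a classic variant-1
>>>> gadget of the form "if (idx < table_size) return table[idx];" could
>>>> be rewritten roughly as follows:
>>>>
>>>> unsigned char table[256];
>>>> unsigned long table_size = 256;
>>>>
>>>> unsigned char
>>>> read_table (unsigned long idx)
>>>> {
>>>>   /* The builtin performs the same bounds check as the plain code
>>>>      above, but additionally ensures that subsequent speculation
>>>>      using the loaded value may only continue if &table[idx] lies
>>>>      within [table, table + table_size).  Out of bounds, the failval
>>>>      of 0 is returned instead.  */
>>>>   return __builtin_load_no_speculate (&table[idx], table,
>>>>                                       table + table_size, 0,
>>>>                                       &table[idx]);
>>>> }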
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 

That's why the intrinsic takes explicit bounds rather than a simple bool
expressing the condition: it prevents the optimizers from folding away
the condition through value range propagation.  That does restrict the
flexibility of the builtin somewhat, but it allows it to work correctly
at all optimization levels.
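
To illustrate the point (the bool-taking variant below is purely
hypothetical and is not something the patch provides):

/* Hypothetical single-condition form, NOT provided by the patch.  */
extern int hypothetical_load_if (const int *ptr, int cond, int failval);

int
lookup (const int *array, unsigned int idx, unsigned int len)
{
  if (idx < len)
    /* Inside this block value range propagation can prove that
       idx < len is true, so a bool argument would be folded to the
       constant 1 and the conditional select would lose its condition.
       Passing the bounds themselves (array and array + len) keeps a
       real comparison available for the conditional-select expansion
       at all optimization levels.  */
    return hypothetical_load_if (&array[idx], idx < len, 0);
  return 0;
}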

>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
> 
> Does ARM have a barrier like instruction suitable for use in the
> kernel patches that are floating around?
> 
> Richard.
> 
>> R.
>>
>>>
>>> Richard.
>>>
>>>> Some simplifications are permitted to make the builtin easier to use.
>>>> The final two arguments can both be omitted (C++ style): failval
>>>> defaults to 0 in that case, and if cmpptr is omitted, ptr is used
>>>> when expanding the range check.  In addition, either lower or upper
>>>> (but not both) may be a literal NULL, in which case the expansion
>>>> ignores that boundary condition.
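>>>>
>>>> For example (illustrative only; the function, 'buf', 'len' and 'i'
>>>> below are hypothetical names, not part of the patch):
>>>>
>>>> int
>>>> get_element (const int *buf, unsigned long len, unsigned long i)
>>>> {
>>>>   /* Full form: all five arguments given explicitly; -1 is returned
>>>>      if &buf[i] is out of range.  */
>>>>   int full = __builtin_load_no_speculate (&buf[i], buf, buf + len,
>>>>                                           -1, &buf[i]);
>>>>
>>>>   /* failval and cmpptr omitted: failval defaults to 0 and the
>>>>      range check is expanded using &buf[i] itself.  */
>>>>   int defaulted = __builtin_load_no_speculate (&buf[i], buf,
>>>>                                                buf + len);
>>>>
>>>>   /* upper is a literal NULL, so only the lower-bound check is
>>>>      expanded.  */
>>>>   int lower_only = __builtin_load_no_speculate (&buf[i], buf, NULL);
>>>>
>>>>   return full + defaulted + lower_only;
>>>> }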
>>>>
>>>> The patch set is constructed as follows:
>>>> 1 - generic modifications to GCC providing the builtin function for all
>>>>     architectures and expanding to an implementation that gives only
>>>>     the logical behaviour of the builtin.  If this expansion path is
>>>>     used, a warning is issued noting that the code will execute
>>>>     correctly but without protection against speculative use.
>>>> 2 - AArch64 support
>>>> 3 - AArch32 support (arm) for the A32 and Thumb-2 states.
>>>>
>>>> These patches can be used with the header file that Arm recently
>>>> published here: https://github.com/ARM-software/speculation-barrier.
>>>>
>>>> Kernel patches are also being developed, eg:
>>>> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
>>>> code like this will be able to use support directly from the compiler
>>>> in a portable manner.
>>>>
>>>> Similar patches are also being developed for LLVM and will be posted
>>>> to their development lists shortly.
>>>>
>>>> [1] More information on the topic can be found here:
>>>> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
>>>> Arm specific information can be found here: https://www.arm.com/security-update
>>>>
>>>>
>>>>
>>>> Richard Earnshaw (3):
>>>>   [builtins] Generic support for __builtin_load_no_speculate()
>>>>   [aarch64] Implement support for __builtin_load_no_speculate.
>>>>   [arm] Implement support for the de-speculation intrinsic
>>>>
>>>>  gcc/builtin-types.def         |  16 +++++
>>>>  gcc/builtins.c                |  99 +++++++++++++++++++++++++
>>>>  gcc/builtins.def              |  22 ++++++
>>>>  gcc/c-family/c-common.c       | 164 ++++++++++++++++++++++++++++++++++++++++++
>>>>  gcc/c-family/c-cppbuiltin.c   |   5 +-
>>>>  gcc/config/aarch64/aarch64.c  |  92 ++++++++++++++++++++++++
>>>>  gcc/config/aarch64/aarch64.md |  28 ++++++++
>>>>  gcc/config/arm/arm.c          | 107 +++++++++++++++++++++++++++
>>>>  gcc/config/arm/arm.md         |  40 ++++++++++-
>>>>  gcc/config/arm/unspecs.md     |   1 +
>>>>  gcc/doc/cpp.texi              |   4 ++
>>>>  gcc/doc/extend.texi           |  53 ++++++++++++++
>>>>  gcc/doc/tm.texi               |   6 ++
>>>>  gcc/doc/tm.texi.in            |   2 +
>>>>  gcc/target.def                |  20 ++++++
>>>>  gcc/targhooks.c               |  69 ++++++++++++++++++
>>>>  gcc/targhooks.h               |   3 +
>>>>  17 files changed, 729 insertions(+), 2 deletions(-)
>>>>
>>>>
>>>>
>>


