This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Spectre V1 diagnostic / mitigation


On Tue, 18 Dec 2018, Jeff Law wrote:

> On 12/18/18 8:36 AM, Richard Biener wrote:
> > 
> > Hi,
> > 
> > in the past weeks I've been looking into prototyping both spectre V1 
> > (speculative array bound bypass) diagnostics and mitigation in an
> > architecture independent manner to assess feasibility and some kind
> > of upper bound on the performance impact one can expect.
> > https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> > an interesting read in this context as well.
> > 
> > For simplicity I have implemented mitigation on GIMPLE right before
> > RTL expansion and have chosen TLS to do mitigation across function
> > boundaries.  Diagnostics sit in the same place but both are not in
> > any way dependent on each other.
> > 
> > The mitigation strategy chosen is that of tracking speculation
> > state via a mask that can be used to zero parts of the addresses
> > that leak the actual data.  That's similar to what aarch64 does
> > with -mtrack-speculation (but oddly there's no mitigation there).
> > 
> > I've optimized things to the point that is reasonable when working
> > target independent on GIMPLE but I've only looked at x86 assembly
> > and performance.  I expect any "final" mitigation if we choose to
> > implement and integrate such would be after RTL expansion since
> > RTL expansion can end up introducing quite some control flow whose
> > speculation state is not properly tracked by the prototype.
> > 
> > I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> > were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> > mitigation and =3 does global mitigation, passing the state
> > via TLS memory.
> > 
> > The following was measured on a Haswell desktop CPU:
> [ ... ]
> Interesting.  So we'd been kicking this issue around a bit internally.
> 
> The number of packages where we'd want to turn this on was very small
> and thus it was difficult to justify burning resources in this space.
> LLVM might be an option for those limited packages, but LLVM is missing
> other security things we don't want to lose (such as stack clash
> mitigation).
> 
> In the end we punted for the immediate future.  We'll almost certainly
> revisit at some point and your prototype would obviously factor into the
> calculus around future decisions.
> 
> [ ... ]
> 
> 
> > 
> > 
> > The patch relies heavily on RTL optimizations for DCE purposes.  At the
> > same time we rely on RTL not statically computing the mask (RTL has no
> > conditional constant propagation).  Full instrumentation of the classic
> > Spectre V1 testcase
> Right. But it does do constant propagation into arms of conditionals as
> well as jump threading.  I'd fear they might compromise things.

Jump threading shouldn't be an issue since that elides the conditional.
I didn't see constant propagation into arms of conditionals happening.
We don't do that on GIMPLE either ;)  I guess I have avoided this
by making the condition data dependent on the mask.  That is, I
transform

  if (a > b)

to

  mask = a > b ? -1 : 0;
  if (mask)
    ...

so one needs to replace the condition with the mask computation's
conditional first.
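
To make the transform above concrete, here is a hand-written C sketch of
what the instrumented form of the classic bounds-check pattern might look
like. It is not output of the prototype: the names victim, array1 and
array2 and the stride of 64 are assumptions for illustration only. Also
note that a compiler is free to fold the mask to a constant inside the
taken branch at source level, so this shows the shape of the
instrumentation done late on GIMPLE, not a source-level mitigation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data; names and sizes are illustrative only. */
static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static uint8_t array2[256 * 64];

uint8_t victim(size_t idx, size_t len)
{
    /* The original condition "idx < len" becomes a mask computation:
       all ones when the condition holds, zero otherwise.  */
    uintptr_t mask = idx < len ? (uintptr_t)-1 : 0;

    /* The branch now tests the mask, so it is data dependent on it.  */
    if (mask) {
        /* Addresses are ANDed with the mask.  If this path were entered
           speculatively while the mask is really zero, both indexes
           would collapse to 0 instead of following idx.  */
        uint8_t v = array1[idx & mask];
        return array2[(size_t)(v & mask) * 64];
    }
    return 0;
}
```

With an in-bounds index the mask is all ones and the function behaves as
the uninstrumented version would; with an out-of-bounds index it returns 0.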

But yes, for a "final" solution that also gives more control to
targets I thought of allowing (with fallback doing something like above)
the targets to supply a set-mask-and-jump pattern combining
conditional, mask generation and jump.  I guess those would look
similar to the -fwrapv plusv patterns we have in i386.md.

> Obviously we'd need to look further into those issues.  But even if they
> do, something like what you've done may mitigate enough vulnerable
> sequences that it's worth doing, even if there's some gaps due to "over"
> optimization in the RTL space.

Yeah.  Note I was just lazy and thus didn't elide useless loads/stores
of the TLS var for adjacent calls or avoid instrumenting cases
where there will be no uses of the mask, etc.  With some simple
(even non-LCM) insertion optimization the dependence on dce/dse
can be avoided.
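
As a rough illustration of the redundant TLS traffic mentioned above, the
following C sketch models the mask hand-off around two adjacent calls. It
is only a model under assumptions: spec_mask, store_mask, load_mask and
callee are hypothetical names, and the counters exist solely to make the
difference observable; the prototype's actual instrumentation may differ.

```c
#include <stdint.h>

/* Hypothetical TLS slot carrying the speculation mask across calls. */
static _Thread_local uintptr_t spec_mask = (uintptr_t)-1;

/* Counters used only to observe how often the TLS slot is touched. */
static int tls_stores;
static int tls_loads;

static void store_mask(uintptr_t m) { spec_mask = m; tls_stores++; }
static uintptr_t load_mask(void) { tls_loads++; return spec_mask; }

/* An instrumented callee: picks up the mask on entry and hands it back
   on exit.  */
static void callee(void)
{
    uintptr_t m = load_mask();
    store_mask(m);
}

/* Naive instrumentation: every call is bracketed by a store and a load,
   even though nothing uses the mask between the two calls.  */
uintptr_t naive(uintptr_t mask)
{
    store_mask(mask); callee(); mask = load_mask();
    store_mask(mask); callee(); mask = load_mask();
    return mask;
}

/* With the load/store pair between the adjacent calls elided, the TLS
   slot still holds whatever the first callee left there.  */
uintptr_t elided(uintptr_t mask)
{
    store_mask(mask); callee(); callee(); mask = load_mask();
    return mask;
}
```

In this model the elided variant saves one store and one load in the
caller per pair of adjacent calls while returning the same mask.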

Richard.

> [  ... ]
> 
> > 
> > so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> > 
> > Patch below for reference (and your own testing in case you are curious).
> > I do not plan to pursue this further at this point.
> Understood.  Thanks for posting it.  We're not currently working in this
> space, but again, we may re-evaluate that stance in the future.

