PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Fri Sep 4 14:18:36 GMT 2020

> On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor <victor.rodriguez.bahena@intel.com> wrote:
> 
> 
> 
> -----Original Message-----
> From: Qing Zhao <QING.ZHAO@oracle.com>
> Date: Thursday, September 3, 2020 at 12:55 PM
> To: Kees Cook <keescook@chromium.org>
> Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> 
> 
> 
>> On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>> 
>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overhead: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks. 
>>> If all registers are zeroed, the runtime overhead is larger: all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% on average for integer benchmarks. 
>>> It looks like the overhead of zeroing vector registers is much bigger. 
>>> 
>>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the runtime overhead with this is very small.
>> 
>> That looks great; thanks for doing those tests!
>> 
>> (And it seems like these benchmarks are kind of a "worst case" scenario
>> with regard to performance, yes? As in it's mostly tight call loops?)
> 
>    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
>    All of them are C++ benchmarks. 
>    I guess that the most important reason is the generally smaller routine size (especially on hot execution paths or in loops).
>    As a result, the overhead of the additional zeroing instructions in each routine is relatively higher.  
> 
>    Qing
> 
> I think that overhead is expected in benchmarks like 541.leela_r. According to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html, it is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs introduces overhead each time a function is called, and in areas like game tree search that overhead is high. 
> 
> Qing, thanks a lot for the measurements. I am not sure whether this is the limit of overhead the community is willing to accept in exchange for the extra security (as a GCC user, I would be willing to accept it). 

From the performance data, we can see that the runtime overhead of clearing only the used registers is very reasonable, even for 541.leela_r, 531.deepsjeng_r, and 511.povray_r. If we try to clear all registers, whether or not they are used in the current routine, the overhead increases dramatically. 
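To make the trade-off concrete, here is a minimal, hypothetical C sketch (not taken from the patch or from SPEC). It assumes GCC 11 or later, where this option was eventually merged together with a matching `zero_call_used_regs` function attribute. A tiny routine called from a tight loop is the worst case described above: the zeroing sequence runs on every return, so its fixed cost is large relative to the routine's own work. The names `add_clamped` and `time_calls` are illustrative only.

```c
#include <time.h>

/* Hypothetical hot leaf routine standing in for the small functions on the
   hot paths of benchmarks such as 541.leela_r. With GCC 11+, the attribute
   asks the compiler to zero, on return, only the general-purpose registers
   this function actually used. Replacing "used-gpr" with "all" zeroes every
   call-used register instead, not only general-purpose ones.
   noinline keeps the call, and therefore the zeroing sequence, in the loop. */
__attribute__((noinline, zero_call_used_regs("used-gpr")))
int add_clamped(int a, int b)
{
    int s = a + b;
    return s > 1000 ? 1000 : s;
}

/* Call add_clamped in a tight loop and return the elapsed CPU time in
   seconds. The running total goes to *out so the loop is not optimized away. */
double time_calls(long iterations, int *out)
{
    int acc = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        acc = add_clamped(acc, (int)(i & 7)) % 997;
    clock_t end = clock();
    *out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building the file once as written and once with "all" in place of "used-gpr", then calling `time_calls` with a large iteration count, is one way to observe the difference on a given machine; no particular timing is implied here. To cover a whole translation unit instead of a single function, pass -fzero-call-used-regs=used-gpr on the command line.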

So, my question is:

From a security point of view, does clearing ALL registers provide more benefit than clearing only the USED registers?  
From my understanding, clearing registers that are not used in the current routine does NOT provide any additional benefit; please correct me if I am wrong.

Thanks.

Qing

> 
> Regards
> 
> Victor 
> 
> 
>> 
>> -- 
>> Kees Cook