PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Rodriguez Bahena, Victor victor.rodriguez.bahena@intel.com
Fri Sep 4 01:23:14 GMT 2020



-----Original Message-----
From: Qing Zhao <QING.ZHAO@oracle.com>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>, Jakub Jelinek <jakub@redhat.com>, Uros Bizjak <ubizjak@gmail.com>, "Rodriguez Bahena, Victor" <victor.rodriguez.bahena@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]



    > On Sep 3, 2020, at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
    > 
    > On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
    >> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine will be zeroed) have very low runtime overhead: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
    >> If all the registers are zeroed, the runtime overhead is bigger: on average for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%.
    >> It looks like the overhead of zeroing vector registers is much bigger.
    >> 
    >> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough; its runtime overhead is very small.
    > 
    > That looks great; thanks for doing those tests!
    > 
    > (And it seems like these benchmarks are kind of a "worst case" scenario
    > with regard to performance, yes? As in it's mostly tight call loops?)

    The top 3 benchmarks that have the most overhead from this option are: 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
    All of them are C++ benchmarks. 
    I guess the most important reason is that their routines are smaller in general (especially on the hot execution paths or in loops).
    As a result, the overhead of these additional zeroing instructions in each routine will be relatively higher.  

    Qing

I think that overhead is expected in benchmarks like 541.leela_r. According to https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html, it is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree search & pattern recognition). Adding -fzero-call-used-regs incurs an overhead each time a function is called, and in call-heavy areas like game tree search that overhead is high.
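
As a rough back-of-the-envelope illustration of why a fixed per-call zeroing cost weighs more on small routines, here is a minimal sketch in Python. The function name relative_overhead and the cycle counts are hypothetical, chosen only for illustration; they are not measurements from the SPEC runs.

```python
def relative_overhead(body_cycles, zeroing_cycles):
    """Extra time added by zeroing registers at return, as a fraction of
    the routine's own cost. Both arguments are made-up cycle counts."""
    return zeroing_cycles / body_cycles


# Hypothetical numbers: the same fixed zeroing cost applied to a tiny
# routine and to a large one.
small = relative_overhead(body_cycles=20, zeroing_cycles=2)
large = relative_overhead(body_cycles=2000, zeroing_cycles=2)

print(f"small routine: {small:.1%}")
print(f"large routine: {large:.1%}")
```

Under these assumed numbers the same zeroing cost is a far larger share of the small routine's time, which matches the explanation that benchmarks dominated by short, frequently called functions see the most overhead.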

Qing, thanks a lot for the measurements. I am not sure whether this level of overhead is something the community is willing to accept in exchange for the extra security (as a GCC user, I would be willing to accept it).

Regards

Victor 


    > 
    > -- 
    > Kees Cook



