PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Thu Sep 3 16:19:14 GMT 2020

It looks like the PDF attachments do not work with this alias either.
H.J. Lu helped me upload the performance data and code size data to the following wiki page:

https://gitlab.com/x86-gcc/gcc/-/wikis/Zero-call-used-registers-data

Please refer to this link for the data.

thanks.

Qing

> On Sep 3, 2020, at 10:08 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> Hi,
> 
> Looks like both attached .csv files were deleted during the email delivery procedure. Not sure what’s the reason for this.
> 
> So I have copied the text here for your reference:
> 
> ****benchmarks:
> C               500.perlbench_r
> C               502.gcc_r
> C               505.mcf_r
> C++             520.omnetpp_r
> C++             523.xalancbmk_r
> C               525.x264_r
> C++             531.deepsjeng_r
> C++             541.leela_r
> C               557.xz_r
> 
> C++/C/Fortran   507.cactuBSSN_r
> C++             508.namd_r
> C++             510.parest_r
> C++/C           511.povray_r
> C               519.lbm_r
> Fortran/C       521.wrf_r
> C++/C           526.blender_r
> Fortran/C       527.cam4_r
> C               538.imagick_r
> C               544.nab_r
> 
> ***runtime overhead data and code size overhead data: I converted them to PDF files; hopefully this time I can attach them to the email:
> 
> thanks.
> 
> Qing
> 
> 
>> On Sep 3, 2020, at 9:29 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> Hi,
>> 
>> Per request, I collected runtime performance data and code size data with CPU2017 on an x86 platform.
>> 
>> *** Machine info:
>> model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>> $ lscpu | grep NUMA
>> NUMA node(s):          2
>> NUMA node0 CPU(s):     0-21,44-65
>> NUMA node1 CPU(s):     22-43,66-87
>> 
>> ***CPU2017 benchmarks: 
>> All the benchmarks that contain C/C++ code: 9 integer benchmarks and 10 FP benchmarks.
>> 
>> ***Configuration:
>> intrate and fprate, 22 copies.
>> 
>> ***Compiler options:
>> no:            -g -O2 -march=native
>> used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
>> used_arg:      no + -fzero-call-used-regs=used-arg
>> all_arg:       no + -fzero-call-used-regs=all-arg
>> used_gpr:      no + -fzero-call-used-regs=used-gpr
>> all_gpr:       no + -fzero-call-used-regs=all-gpr
>> used:          no + -fzero-call-used-regs=used
>> all:           no + -fzero-call-used-regs=all
>> 
>> ***Each benchmark was run 3 times.
>> 
>> ***runtime performance data:
>> Please see the attached csv file
>> 
>> 
>> From the data, we can see that:
>> On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overheads: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
>> If all the registers are zeroed, the runtime overhead is bigger: on average for integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%.
>> Looks like the overhead of zeroing vector registers is much bigger. 
>> 
>> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and its runtime overhead is very small.
>> 
>> ***code size increase data:
>> 
>> Please see the attached file 
>> 
>> 
>> From the data, we can see that:
>> The code size impact in general is very small; the biggest is “all_arg”, which is 1.06% for integer benchmarks and 1.13% for FP benchmarks.
>> 
>> So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 
>> 
>> Let me know your comments and opinions.
>> 
>> thanks.
>> 
>> Qing
>> 
>>> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> 
>>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>>>> use it, it helps security at most none at all :-(
>>>>>> 
>>>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>>>> enough users that it will be worth the effort for us.  Which is why I
>>>>>> keep hammering on this point.
>>>>> I can collect some run-time overhead data on this. Do you have a recommendation on what test suite I can use
>>>>> for this testing? (Is CPU2017 good enough?)
>>>> 
>>>> I would use something more real-life, not 12 small pieces of code.
>>> 
>>> There is some basic information about the CPU2017 benchmarks at the link below:
>>> 
>>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>>> 
>>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
>>> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
>>> 
>>> thanks.
>>> 
>>> Qing
>
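
The overhead percentages quoted in this thread compare each -fzero-call-used-regs configuration against the "no" baseline. The thread does not say how the three runs per benchmark were combined or which formula was used, so the following is only a minimal sketch under two assumptions: the median of the three runs is taken, and overhead is the relative increase in run time. The function name and the timings are hypothetical and are not data from the thread.

```python
from statistics import median


def overhead_percent(baseline_runs, option_runs):
    """Percent slowdown of an option build relative to the baseline build.

    Assumption: each list holds the run times (in seconds) of repeated runs
    of the same benchmark, and the median run represents each configuration.
    """
    base = median(baseline_runs)
    opt = median(option_runs)
    return (opt / base - 1.0) * 100.0


# Hypothetical run times in seconds, 3 runs each (NOT data from the thread).
baseline = [400.0, 402.0, 398.0]       # "no": -g -O2 -march=native
used_gpr_arg = [404.0, 405.0, 403.0]   # "no" + -fzero-call-used-regs=used-gpr-arg

print(f"used_gpr_arg overhead: {overhead_percent(baseline, used_gpr_arg):.2f}%")
```

The same calculation applies to code size if the run times are replaced by section sizes from each build, in which case each list would hold a single value.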