PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Fri Aug 7 17:04:11 GMT 2020

Hi, Alexandre,

Thank you for the comments and suggestions.

> On Aug 7, 2020, at 8:20 AM, Alexandre Oliva <oliva@adacore.com> wrote:
> 
> On Jul 28, 2020, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
>>> 2. The main code generation part is moved from i386 backend to middle-end;
>>> 3. Add 4 target-hooks;
>>> 4. Implement these 4 target-hooks on i386 backend. 
>>> 5. On a target that does not implement the target hook, issue error
> 
> I wonder...  How important is it that the registers be zeroed, rather
> than just avoid leaking internal state from the function?

As I explained in other emails about the motivation of this patch:

As I understand it (I am not a security expert, though), this patch should serve two purposes:

1. Erase the registers upon return to avoid leaking information from the function;
2. ROP mitigation; for details, please refer to the paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132

According to the paper, ROP attackers use the call-used registers as follows:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C."

We can see that call-used registers might be used by ROP attackers to pass parameters to the system call.
If the compiler clears these registers before the routine returns, then the ROP attack will no longer work.

So, I believe that the call-used registers (especially those registers that pass parameters) need to be zeroed
in order to mitigate the ROP attack.
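
For illustration only, here is a minimal sketch of how a user might request the zeroing for one sensitive function. The attribute spelling `zero_call_used_regs` is the one found in later GCC releases and is an assumption relative to this revision of the patch; the function `secret_equal` is hypothetical. The macro expands to nothing on compilers that lack the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Use the attribute only where the compiler reports support for it;
   otherwise fall back to a no-op so the code still builds.  */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_USED_GPR
#  define ZERO_USED_GPR
#endif

/* Hypothetical sensitive routine: compares two buffers without an early
   exit.  With the attribute in effect, the call-used general-purpose
   registers this function used are zeroed before it returns.  */
ZERO_USED_GPR
int secret_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

The same effect for a whole translation unit would come from the command-line option in the subject line, e.g. -fzero-call-used-regs=used-gpr.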

> 
> It occurred to me that we could implement this in an entirely
> machine-independent way by just arranging for the option to change the
> calling conventions for all registers that are not used by return to be
> regarded as call-saved.  Then the prologue logic would save the incoming
> value of the registers, and the epilogue would restore them, and we're
> all set.  It might even cover propagation of exceptions out of the
> function.
> 
The above approach will have the following two issues:
1. The performance overhead will double (because there will be both “save” and “restore” insns in the prologue and epilogue);
2. The ROP mitigation purpose cannot be addressed.

> 
> Even if zeroing registers is desirable, it might still be possible to
> build upon the above to do that in a machine-independent fashion, using
> the annotations used to output call frame info to identify the slots in
> which the to-be-zeroed registers were saved, and store zeros there,
> either by modifying the save insns, or by adding extra stores to the end
> of the prologue, at least as a default implementation for a target hook,
> that could be overridden with something that does the job in more
> efficient but target-specific ways.

One of the major things we have to consider in implementing this patch is
minimizing the performance overhead as much as possible.

I think that leaving the question of how to zero the registers to each target is a better solution, since each target
knows best which insns do the work most efficiently.
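
To make the trade-off concrete, here is a rough sketch of a per-target hook with a generic fallback of the kind Alexandre describes. Everything in it is hypothetical: the hook type, the function names, and the register-mask representation are invented for illustration and are not GCC's internal interface, and the "insns" are just printed text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical hook: emit insns that zero every register whose bit is set
   in MASK, and return how many insns were emitted.  */
typedef int (*zero_regs_hook) (unsigned mask, FILE *out);

/* Generic fallback: load an immediate zero into each selected register.  */
int
generic_zero_regs (unsigned mask, FILE *out)
{
  assert (out != NULL);
  int count = 0;
  for (int regno = 0; regno < 32; regno++)
    if (mask & (1u << regno))
      {
        fprintf (out, "\tmov\tr%d, 0\n", regno);
        count++;
      }
  return count;
}

/* Target override: an x86-like target might prefer the xor idiom.  */
int
x86_like_zero_regs (unsigned mask, FILE *out)
{
  assert (out != NULL);
  int count = 0;
  for (int regno = 0; regno < 32; regno++)
    if (mask & (1u << regno))
      {
        fprintf (out, "\txor\tr%d, r%d\n", regno, regno);
        count++;
      }
  return count;
}

/* Middle-end side: use the target's hook if it provides one, otherwise
   the generic fallback.  */
int
emit_zero_call_used_regs (zero_regs_hook target_hook, unsigned mask, FILE *out)
{
  zero_regs_hook hook = target_hook ? target_hook : generic_zero_regs;
  return hook (mask, out);
}
```

Both paths emit one insn per register here; the point is only that the choice of insn stays with the target, while the decision about which registers to zero stays in the middle-end.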

Thanks.

Qing

> 
> 
> -- 
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer