PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao QING.ZHAO@ORACLE.COM
Fri Aug 7 16:06:38 GMT 2020


Hi, Segher,

Thanks for your comments.

> On Aug 6, 2020, at 6:37 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Thu, Aug 06, 2020 at 10:31:27AM +0200, Richard Biener wrote:
>> Jeff might be, but with the intended purpose (ROP mitigation AFAIU)
>> it would be nice for other target maintainers to chime in (Segher for
>> power maybe) for the question below...
> 
> It would be nice if this described anywhere what the benefit of this is,
> including actual hard numbers.  I only see it is very costly, and I see
> no benefit whatsoever.

I will add the motivation of this patch clearly in the next patch update. 
Here, for your reference, As I mentioned in other emails you might miss,
From my understanding (I am not a security expert though), this patch should serve two purpose:

1. Erase the registers upon return to avoid information leak;
2. ROP mitigation, for details on this, please refer to paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming Attacks"

https://ieeexplore.ieee.org/document/8445132 <https://ieeexplore.ieee.org/document/8445132>

From the above paper, The call-used registers are used by the ROP hackers as following:

"Based on the practical experience of reading and writing ROP code. we find the features of ROP attacks as follows.

First, the destination of using gadget chains in usual is performing system call or system function to perform 
malicious behaviour such as file access, network access and W ⊕ X disable. In most cases, the adversary
would like to disable W ⊕ X. Because once W ⊕ X has been disabled, shellcode can be executed directly
instead of rewritting shellcode to ROP chains which may cause some troubles for the adversary. In upper 
example, the system call is number 59 which is “execve” system call.

Second, if the adversary performs ROP attacks using system call instruction, no matter on x86 or x64 
architecture, the register would be used to pass parameter. Or if the adversary performs ROP attacks 
using system function such as “read” or “mprotect”, on x64 system, the register would still be used to 
pass parameters, as mentioned in subsection B and C.”

We can see that call-used registers might be used by the ROP hackers to pass parameters to the system call.
If compiler can clean these registers before routine “return", then ROP attack will be invalid. 

Yes, there will be performance overhead from adding these register wiping insns. However, it’s necessary to
add overhead for security purpose.
Of course, on the other hand, We need to consider to minimize the performance overhead in our implementation. 


> 
>>>>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>>>>> command-line option and
>>>>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> 
> "call-used" is such a bad name.  "call-clobbered" is better already, but
> "volatile" (over calls) is most obvious I think.

In our GCC compiler source code, we used the name “call-used” a lot, of course, “call-clobbered” is
also used frequently.  Do these names refer to the same set of registers, i.e, the register set that  
will be corrupted by function call?
If so, I am okay with name “call-clobbered” if this name sounds better. 

> 
> There are at least four different kinds of volatile registers:
> 
> 1) Argument registers are volatile, on most ABIs.
These are the registers that  need to be cleaned up upon function return for the ROP mitigation described in the paper
mentioned above.

> 2) The *linker* (or dynamic linker!) may insert code that needs some
>   registers for itself;
> 3) Registers only used for scratch space;
> 4) Registers used for returning the function value.

I think that the above 1,3,4 should be all covered by “call_used_regs”. 

Not sure about 2, could you explain a little bit more on 2 (The linker may insert code that needs some register for itself)? 

> 
> And these can overlap, and differ per function.
> 
>>>> Again - what's the intended use (and how does it fulful anything useful
>>>> for that case)?
> 
> Yes, exactly.
Please see my responds in the beginning. 

> 
>>>>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>>>>> +     continue;
>>>> 
>>>> Why does the target need some extra say here?
>>> 
>>> Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 
>> 
>> I'm mostly questioning the plethora of target hooks added and whether
>> this details are a good granularity applying to more than just x86.
>> Did I suggest to compute a hardreg set that the middle-end says was
>> used and is not live and leave the rest to the target?
> 
> It probably would be much easier to just have the target do *all* of
> this, in one hook, or maybe even in the existing epilogue stuff.  The
> resulting binary code will be very slow no matter what, so this should
> not matter much at all.

I have agreed that moving the zeroing regs part entirely to target. Middle-end will only compute a hard regs set that need to be
zeroed and pass it to target.

> 
>>>>>>>> +      machine_mode mode
>>>>>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
>>>>>>>> +                                                reg_raw_mode[regno]);
>>>> 
>>>> In what case does the target ever need to adjust this (we're dealing
>>>> with hard-regs only?)?
>>> 
>>> For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.
>> 
>> That's an optimization, yes.
> 
> I gues what is meant here is that the usual x86-64 insns to clear the
> low 32 bits of a register actually clear the whole register?

This is my understanding. H.J.Lu might provide better explanation if needed.

>  It is a
> huge security leak otherwise.  And, the generic code has nothing to do
> with this, define hooks that ask the target to clear stuff, instead?

Yes, I think that these kind of details are not good to be exposed to middle-end.

> 
>>>>>>>> +      reg = gen_rtx_REG (mode, regno);
>>>>>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
>>>>>>>> +     {
>>>>>>>> +       zero_rtx[(int)mode] = reg;
>>>>>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
>>>>>>>> +       emit_insn (tmp);
>>>>>>>> +     }
>>>>>>>> +      else
>>>>>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
>>>> 
>>>> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
>>>> but I may be wrong.  
>>> 
>>> You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
>>> I will check on this.
> 
> If it is a CONST_INT, you should use const0_rtx; otherwise,
> CONST0_RTX (mode) .  I have no idea what zero_rtx is, but there is
> const_tiny_rtx already, and you shouldn't use that directly either.

Okay.

> 
>> But why not simplify it all to a single hook
>> 
>>  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
>> 
>> ?
> 
> Yeah.  With a much better name though (it should say what it is for, or
> describe at a *high level* what it does).
Okay.

> 
>>>>>>>> start_sequence ();
>>>>>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
>>>>>>>> +
>>>>>>>> +  gen_call_used_regs_seq ();
>>>>>>>> +
>>>> 
>>>> The caller eventually performs shrink-wrapping - are you sure that
>>>> doesn't mess up things?
>>> 
>>> My understanding is, in the standard epilogue, there is no handling of “call-used” registers.  Therefore, shrink-wrapping will not impact
>>> “call-used” registers as well. 
>>> Our patch only handles call-used registers, so, there should be no any interaction between this patch and shrink-wrapping.
>> 
>> I don't know (CCed Segher, he should eventually).
> 
> Shrink-wrapping often deals with the non-volatile registers, so that
> doesn't matter much for this patch series.

Yes, that was my understanding as well. 

>  But the epilogue can use
> some volatile registers as well, including to hold sensitive info.  And
> of course everything is different if you use separate shrink-wrapping,
> but that work is done already when you get here (so it is too late?)

Could you please explain this part a little bit more?

> 
> 
> Anyway.  This all needs a good description in the user manual (is there?
> I couldn't find any), explaining what exactly it does (user-visible),
> and when you would want to use it, etc.  We need that before we can
> review anything else in this patch sanely.
Will do.

Qing
> 
> 
> Segher



More information about the Gcc-patches mailing list