PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao QING.ZHAO@ORACLE.COM
Wed Aug 5 21:35:45 GMT 2020


Hi, Richard,

Thanks a lot for your careful review and detailed comments.  


> On Aug 4, 2020, at 2:35 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.

Could you let me know who will be the more qualified person to fully review the rest of middle-end change?

>>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>>>> command-line option and
>>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
>>>>> 
>>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>>>>> 
>>>>> Don't zero call-used registers upon function return.
> 
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?
> 
> In general I am missing reasoning as why to use -fzero-call-used-regs=
> in the documentation, that is, what is the thread model and what are
> the guarantees?  Is there any point zeroing registers when spill slots
> are left populated with stale register contents?  How do I (and why
> would I want to?) ensure that there's no information leak from the
> implementation of 'foo' to their callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to proptect with
> zero_call_used_regs("...")?
> 
> Again - what's the intended use (and how does it fulful anything useful
> for that case)?

The major question of the above is:  what’s the motivation of the whole patch?
H.J.Lu and I have replied this question in separated emails, let’s continue with
this high-level discussion in that thread. 


>>>>> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree name,
>>>>> return NULL_TREE;
>>>>> }
>>>>> 
>>>>> +/* Handle a "zero_call_used_regs" attribute; arguments as in
>>>>> +   struct attribute_spec.handler.  */
>>>>> +
>>>>> +static tree
>>>>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
>>>>> +                                   int ARG_UNUSED (flags),
>>>>> +                                   bool *no_add_attris)
>>>>> +{
>>>>> +  tree decl = *node;
>>>>> +  tree id = TREE_VALUE (args);
>>>>> +  enum zero_call_used_regs zero_call_used_regs_type = zero_call_used_regs_unset;
>>>>> +
>>>>> +  if (TREE_CODE (decl) != FUNCTION_DECL)
>>>>> +    {
>>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>>>> +             "%qE attribute applies only to functions", name);
>>>>> +      *no_add_attris = true;
>>>>> +      return NULL_TREE;
>>>>> +    }
>>>>> +  else if (DECL_INITIAL (decl))
>>>>> +    {
>>>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>>>> +             "cannot set %qE attribute after definition", name);
> 
> Why's that?
This might not be needed, I will fix this in the next update.

>>>>> 
>>>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>>>>> index 81bd2ee..ded1880 100644
>>>>> --- a/gcc/c/c-decl.c
>>>>> +++ b/gcc/c/c-decl.c
>>>>> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
>>>>>        DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
>>>>>      }
>>>>> 
>>>>> +      /* Merge the zero_call_used_regs_type information.  */
>>>>> +      if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>>>> +     DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS (olddecl);
>>>>> +
> 
> If you need this (see below) then likely cp/* needs similar adjustment
> so do other places in the middle-end (function cloning, etc)

Will check this in cp/* and function cloning etc to see whether the copying and merging are needed in other
places.

Another thought, if I use “lookup_attribute” of the function decl instead of checking these bits as you suggested
later,  all these copying and merging might not be necessary anymore. I will check on that. 
> 
>>>>> 
>>>>> +
>>>>> +/* Emit a sequence of insns to zero the call-used-registers for the current
>>>>> + * function.  */
> 
> No '*' on the continuation line

Okay, will fix this.

>>>>> +
>>>>> +  /* This array holds the zero rtx with the correponding machine mode.  */
>>>>> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
>>>>> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>>>>> +    zero_rtx[i] = NULL_RTX;
>>>>> +
>>>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>>> +    {
>>>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> 
> Use if (!call_used_regs[regno])
Okay.

> 
>>>>> +     continue;
>>>>> +      if (fixed_regs[regno])
>>>>> +     continue;
>>>>> +      if (is_live_reg_at_exit (regno))
>>>>> +     continue;
> 
> How can a call-used reg be live at exit?

Yes, this might not be needed, I will double check on this.

> 
>>>>> +      if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only))
>>>>> +     continue;
> 
> Why does the target need some extra say here?

Only target can decide which hard regs should be zeroed, and which hard regs are general purpose register. 

> 
>>>>> +      if (used_only && !df_regs_ever_live_p (regno))
> 
> So I suppose this does not include uses by callees of this function?

Yes, I think so. 
> 
>>>>> +     continue;
>>>>> +
>>>>> +      /* Now we can emit insn to zero this register.  */
>>>>> +      rtx reg, tmp;
>>>>> +
>>>>> +      machine_mode mode
>>>>> +     = targetm.calls.zero_call_used_regno_mode (regno,
>>>>> +                                                reg_raw_mode[regno]);
> 
> In what case does the target ever need to adjust this (we're dealing
> with hard-regs only?)?

For x86, for example, even though the GPR registers are 64-bit, we only need to zero the lower 32-bit. etc.

> 
>>>>> +      if (mode == VOIDmode)
>>>>> +     continue;
>>>>> +      if (!have_regs_of_mode[mode])
>>>>> +     continue;
> 
> When does this happen?

This might be removed. I will check. 
> 
>>>>> +
>>>>> +      reg = gen_rtx_REG (mode, regno);
>>>>> +      if (zero_rtx[(int)mode] == NULL_RTX)
>>>>> +     {
>>>>> +       zero_rtx[(int)mode] = reg;
>>>>> +       tmp = gen_rtx_SET (reg, const0_rtx);
>>>>> +       emit_insn (tmp);
>>>>> +     }
>>>>> +      else
>>>>> +     emit_move_insn (reg, zero_rtx[(int)mode]);
> 
> Not sure but I think the canonical zero to use is CONST0_RTX (mode)
> but I may be wrong.  

You mean “const0_rtx” should be “CONST0_RTX(mode)”? 
I will check on this.

> I'd rather have the target be able to specify
> some special instruction for zeroing here.  Some may have
> multi-reg set instructions for example.  That said, can't we
> defer the actual zeroing to the target in full and only compute
> a hard-reg-set of to-be zerored registers here and pass that
> to a target hook?

For vector regs, we have already provided this interface with 

targetm.calls.zero_all_vector_registers (used_only)

For integer registers, do we need such target hook too? 
If so, yes, it might be better to let the target decide how to zero the registers.

If Not, the current design might be good enough, right?

> 
>>>>> +
>>>>> +      emit_insn (targetm.calls.pro_epilogue_use (reg));
>>>>> +    }
>>>>> +
>>>>> +  return;
>>>>> +}
>>>>> +
>>>>> +
>>>>> /* Return a sequence to be used as the epilogue for the current function,
>>>>>  or NULL.  */
>>>>> 
>>>>> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void)
>>>>> 
>>>>> start_sequence ();
>>>>> emit_note (NOTE_INSN_EPILOGUE_BEG);
>>>>> +
>>>>> +  gen_call_used_regs_seq ();
>>>>> +
> 
> The caller eventually performs shrink-wrapping - are you sure that
> doesn't mess up things?

My understanding is, in the standard epilogue, there is no handling of “call-used” registers.  Therefore, shrink-wrapping will not impact
“call-used” registers as well. 
Our patch only handles call-used registers, so, there should be no any interaction between this patch and shrink-wrapping.

> 
>>>>> 
>>>>> +
>>>>> + /* How to clear call-used registers upon function return.  */
>>>>> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3;
>>>>> +
>>>>> + /* 11 unused bits.  */
> 
> So instead of wasting "precious" bits please use lookup_attribute
> in the single place you query this value (which is once per function).
> There's no need to complicate matters by trying to maintain the above.

Thanks for the suggestion.
Yes, I will try to use lookup_attribute in function.c instead of adding these bits. That will save us these
precious space.

Thanks again.

Qing


More information about the Gcc-patches mailing list