Another issue on RS6000 target. Re: One issue with default implementation of zero_call_used_regs

Wed Sep 30 18:18:40 GMT 2020

> On Sep 30, 2020, at 11:25 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> 
>>>> As I checked, when the FP registers are zeroed, the above failure happened.
>>>> 
>>>> I suspect that the issue still relate to the following statement:
>>>> 
>>>> machine_mode mode = reg_raw_mode[regno];
>>>> 
>>>> As I checked, the reg_raw_mode always return the integer mode that can be hold by the hard registers, even though it’s FP register.
>>> 
>>> Well, more precisely: it's the largest mode that the target allows the
>>> registers to hold.  If there are multiple candidate modes of the same
>>> size, the integer one wins, like you say.  But the point is that DI only
>>> wins over DF because the target allows both DI and DF to be stored in
>>> the register, and therefore supports both DI and DF moves for that
>>> register.
>>> 
>>> So I don't think the mode is the issue.  Integer zero and floating-point
>>> zero have the same bit representation after all.
>> 
>> theoritically  yes. 
>> However, as we have noticed in Aarch64, the integer TI move has not been supported before your fix today. As a result, the Ti move have to be splitted.
>> With your fix today on aarch64,  Yes, the default implementation works well for those vector registers. Thanks a lot.
>> 
>> Potentially there will be other targets that have the same issue. Then those targets need to fix those issues too in order to make the default implementation work.
> 
> Right.  But that's not a bad thing.
> 
> My point above was that what you describe was not the issue for Power.
> AIUI the issue there was…
> 
>>> AIUI, without VSX, Power needs to load the zero from the constant pool.
> 
> …this instead.
> 
>>>> So, I still wondering:
>>>> 
>>>> 1. Is there another available utility routine that returns the proper MODE for the hard registers that can be readily used to zero the hard register?
>>>> 2. If not, should I add one more target hook for this purpose? i.e 
>>>> 
>>>> /* Return the proper machine mode that can be used to zero this hard register specified by REGNO.  */
>>>> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
>>>> 
>>>> 3. Or should I just delete the default implemeantion, and let the target to implement it.
>>> 
>>> IMO no.  This goes back to what we discussed earlier.  It isn't the
>>> case that a default target hook has to be correct for all targets,
>>> with targets only overriding them as an optimisation.  The default
>>> versions of many hooks and macros are not conservatively correct.
>>> They are just reaonable default assumptions.  And IMO that's true
>>> of the hook above too.
>>> 
>>> The way to flush out whether a target needs to override the hook
>>> is to add tests that run on all targets.
>> I planned to add these new test cases, so currently I have been testing the simple testing cases on aarch64 and rs6000 to see any issue 
>> With the default implementation. So far, I have found these issues with the very simple testing cases.
>> 
>> For me, at most I can test aarch64 and rs6000 targets for some small testing cases for checking correctness.
> 
> Thanks for testing other targets.  But I don't think that invalidates
> what I said above.  It might be that some of the targets you pick to
> test are ones that can't use the default implementation (or at least,
> not in all circumstances).  At that point, hopefully target maintainers
> will step in to help.
> 
>>> That said, one way of erring on the side of caution from an ICE
>>> perspective would be to do something like:
>>> 
>>>   rtx_insn *last_insn = get_last_insn ();
>>>   rtx zero = CONST0_RTX (reg_raw_mode[regno]);
>>>   rtx_insn *insn = emit_insn (gen_rtx_SET (regno_reg_rtx[regno], zero));
>>>   if (!valid_insn_p (insn))
>>>     {
>>>       delete_insns_since (last_insn);
>>>       ...remove regno from the set of cleared registers...;
>>>     }
>>> 
>>> where valid_insn_p abstracts out this code from ira.c:
>>> 
>>> 	  recog_memoized (move_insn);
>>> 	  if (INSN_CODE (move_insn) < 0)
>>> 	    continue;
>>> 	  extract_insn (move_insn);
>>> 	  /* We don't know whether the move will be in code that is optimized
>>> 	     for size or speed, so consider all enabled alternatives.  */
>>> 	  if (! constrain_operands (1, get_enabled_alternatives (move_insn)))
>>> 	    continue;
>>> 
>>> (but keeping the comment where it is).  The default behaviour would then
>>> be to drop any register that can't be zeroed easily.
>> 
>> I will check whether the above fix the ICE on rs6000.

Yes, it fixes the ICE on rs6000.
However, now with this check, ONLY integer registers on rs6000 can be zeroed.  All other call_used registers are excluded from ALL. 

This doesn’t look like the correct fix to me. 
>>> 
>>> Doing this would make the default hook usable for more targets.
>>> The question is whether dropping registers that can't be zeroed
>>> easily is acceptable as a default policy for a security feature.
>> 
>> If a “move_insn” is Not a valid insn per the above checking, can that “hard register” still be zeroed? 
> 
> Sure.  Like I said above, Power without VSX can zero FPRs by loading
> a zero from the constant pool.  It just isn't a single instruction.

Then this means the default implementation doesn’t work for FPR on rs6000. rs6000 has to override this implementation on target, right?
> 
>>> 
>>> I'm still a bit unsure what criteria are being applied to decide whether
>>> a register is worth zeroing or not, given that you were also talking about
>>> dropping mm0-7 and k0-7.
>> 
>> Should leaving such decision to targets make more sense?
> 
> The target gets the final say with both approaches.  That's the point
> of making it a target hook. :-)
> 
> But I don't think that really addresses my point.  What I mean is:
> what are the criteria *in general* for deciding whether a particular
> register is worth clearing or not?  And how do those criteria lead
> to the choices that you mentioned for x86?

I believe that the initial purpose of this patch on x86 is mainly for mitigating ROP, then only clearing General registers that are used in the routine and passes parameter should be enough for the purpose. 
Clearing ALL registers are added later for another purpose of avoiding information leaking through registers. (But not sure how important this purpose is).
I think that for this purpose, all the call-used registers should be zeroed. 

Kees, do you have any comment here?

> 
> I think this is something that would be good to write down, so that
> targets can take a consistent approach.  It would also increase the
> chances that clang and GCC are compatible, if clang ends up supporting
> the same feature.
> 
> I was bitten by this with the stack-smash detection (-fstack-protector).
> GCC took the attitude that it was necessary to zero out the register that
> contains the canary value.  LLVM took the attitude that it wasn't.
> I think both attitudes were “obvious” to the people that made them,
> but because the reasons weren't written down, it was hard to decide
> what to do when the two compilers took opposite positions.
> 
> I thnk we're in danger of having the same problem here, either between
> compilers or between targets of the same compiler.

Understood and agreed.

> 
> If we have common, written-down ground rules for deciding which kinds
> of register should be included and which shouldn't, it will be easier
> to apply those rules to a given target.  And it will be easier to decide
> on the right behaviour for the default implementation of the hook.
agreed.

> 
> It feels like the pushback against defining a default implementation
> of the hook is also a pushback against being specific about how targets
> are supposed to select which kinds of register need to be zeroed.

Another reason for against the default implementation is:  for FP registers, it might be better to let the target decide how to zero them. There seems no good general approach to zero FPs in the middle end.

thanks.

Qing

> 
> Thanks,
> Richard