PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao QING.ZHAO@ORACLE.COM
Tue Sep 15 15:05:46 GMT 2020



> On Sep 15, 2020, at 4:11 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>>>> Like I mentioned earlier though, passes that run after
>>>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>>>> the “used” version of the new command-line option is supposed to clear
>>>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>>>> collected at the point that the return is generated.
>>>>>> 
>>>>>> Thanks for the information.
>>>>>> 
>>>>>>> 
>>>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>>>> already exist.
>>>>>> 
>>>>>> Yes, looks like it’s not correct to insert the zeroing at the time when prologue, epilogue and return are generated.
>>>>>> As I also checked, “return” might be also generated as late as pass “pass_delay_slots”,  So, shall we move the
>>>>>> New pass as late as possible?
>>>>> 
>>>>> If we insert the zeroing before pass_delay_slots and describe the
>>>>> result correctly, pass_delay_slots should do the right thing.
>>>>> 
>>>>> Describing the result correctly includes ensuring that the cleared
>>>>> registers are treated as live on exit from the function, so that the
>>>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>>> 
>>>> In the current implementation for x86, when we generating a zeroing insn as the following:
>>>> 
>>>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>>>       (const_int 0 [0])) "t10.c":11:1 -1
>>>>    (nil))
>>>> (insn 19 18 20 2 (unspec_volatile [
>>>>           (reg:SI 1 dx)
>>>>       ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>>>    (nil))
>>>> 
>>>> i.e, after each zeroing insn, the register that is zeroed is marked as “UNSPECV_PRO_EPILOGUE_USE”, 
>>>> By doing this, we can avoid this zeroing insn from being deleted or skipped. 
>>>> 
>>>> Is doing this enough to describe the result correctly?
>>>> Is there other thing we need to do in addition to this?
>>> 
>>> I guess that works, but I think it would be better to abstract
>>> EPILOGUE_USES into a new target-independent wrapper function that
>>> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>>> true for registers that need to be zero on return, if the zeroing
>>> instructions have already been inserted.  The places that currently
>>> test EPILOGUE_USES should then test this new wrapper function instead.
>> 
>> Okay, I see. 
>> Looks like that EPILOGUE_USES is used in df-scan.c to compute the data flow information. If EPILOUGE_USES return true
>> for registers that need to be zeroed on return, those registers will be included in the data flow information, as a result, later
>> passes will not be able to delete them. 
>> 
>> This sounds to be a cleaner approach than the current one that marks the registers  “UNSPECV_PRO_EPILOGUE_USE”. 
>> 
>> A more detailed implementation question on this: 
>> Where should I put this new target-independent wrapper function in? Which header file will be a proper place to hold this new function?
> 
> Not a strong opinion, but: maybe df.h and df-scan.c, since this is
> really a DF query.

Okay.

> 
>>> After inserting the zeroing instructions, the pass should recompute the
>>> live-out sets based on this.
> 
> Sorry, I was wrong here.  It should *cause* the sets to be recomputed
> where necessary (rather than recompute them directly), but see below.
> 
>> Is only computing the live-out sets of the block that including the return insn enough? Or we should re-compute the whole procedure? 
>> 
>> Which utility routine I should use to recompute the live-out sets?
> 
> Inserting the instructions will cause the containing block to be marked
> dirty, via df_set_bb_dirty.  I think the pass should also call
> df_set_bb_dirty on the exit block itself, to indicate that the
> wrapper around EPILOGUE_USES has changed behaviour, but that might
> not be necessary.
> 
> This gives the df machinery enough information to work out what has changed.
> It will then propagate those changes throughout the function.  (I don't
> think any propagation would be necessary here, but if I'm wrong about that,
> then the df machinery will do whatever propagation is necessary.)
> 
> However, the convention is for a pass that uses the df machinery to call
> df_analyze first.  This call to df_analyze updates any stale df information.
> 
> So unlike what I said yesterday, the pass itself doesn't need to make sure
> that the df information is up-to-date.  It just needs to indicate what
> has changed, as above.
> 
> In the case of pass_delay_slots, pass_free_cfg has:
> 
>  /* The resource.c machinery uses DF but the CFG isn't guaranteed to be
>     valid at that point so it would be too late to call df_analyze.  */
>  if (DELAY_SLOTS && optimize > 0 && flag_delayed_branch)
>    {
>      df_note_add_problem ();
>      df_analyze ();
>    }
> 
> Any other machine-specific passes that use df already need to call
> df_analyze (if they use the df machinery).  So simply marking what
> has changed is enough (by design).

So, in this new pass, I need:

1. Call “df_analyze” in the beginning to get the up-to-data df information;
2. After generating the zero insns, mark the containing block with “df_set_bb_dirty”. 
3. mark the exit block with “df_set_bb_dirty” to indicate the wrapper around EPILOGUE_USES changed
    Behavior. (This might not need since “df_analyze” in the next pass will call EPILOGUE_USES automatically? )

Is the above enough for DF?

(BTW, how expensive to call “df_analyze”?)

> 
>> My understanding is:
>> 
>> In our new pass that is put in the beginning of the pass_late_compilation, I,e pass_zero_call_used_regs;
>> 
>>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> ++++  NEXT_PASS (pass_zero_call_used_regs);
>>          NEXT_PASS (pass_compute_alignments);
>>          NEXT_PASS (pass_variable_tracking);
>>          NEXT_PASS (pass_free_cfg);
>>          NEXT_PASS (pass_machine_reorg);
>>          NEXT_PASS (pass_cleanup_barriers);
>>          NEXT_PASS (pass_delay_slots);
>> 
>> When we scan the EXIT BLOCK of the routine, all the return insns have already been there.
>> The later passes including “pass_delay_slots” will not generate additional returns anymore,  they might just call “target.gen_return” or “target.gen_simple_return() to replace 
>> “ret_rtx” or “simple_ret_rtx” ?
> 
> Kind-of.  pass_delay_slots can also duplicate code, so it's not always a
> straight replacement.  But the point is that returns don't appear out of
> nowhere.  There has to be a semantic reason for them to exist.  The
> behaviour of the function after pass_delay_slots has to be the same
> as it was before the pass (disregarding undefined behaviour).  Once we've
> added clearing of the zero registers to all return paths, that clearing
> becomes part of the behaviour of the function, and so will be part of
> the behaviour after pass_delay_slots as well.
> 
> So I don't think the problem is with passes generating new returns.
> It's more whether they could use new registers that then need to be
> cleared, which is the main justification for running the new pass
> so late in the pipeline.

agreed.

> 
> In principle, there's nothing stopping pass_delay_slots allocating
> new registers (like pass_regrename does), and in principle that could
> introduce the need to do more clearing.  But I don't think the current
> pass does that.  The pass is also very much legacy code at this point,
> so the chances of new optimisations being added to it are fairly low.
> If that did happen, I think it would be reasonable to expect the pass
> to work within the set of registers that have already been allocated,
> at least when your new option is in effect.

Okay, thanks for the information.

Qing
> 
> Thanks,
> Richard



More information about the Gcc-patches mailing list