PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao QING.ZHAO@ORACLE.COM
Fri Sep 11 19:53:46 GMT 2020



> On Sep 11, 2020, at 12:18 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote:
>>>>>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>>>>>> you can throw away.
> 
> ^^^ !!! ^^^
> 
>>> Start looking at handle_simple_exit()?  targetm.gen_simple_return()…
>> 
>> Yes, I have been looking at this since this morning. 
>> You are right, we also need to insert the zeroing sequence before this simple_return, which the current patch missed.
> 
> Please run the performance loss numbers again after you have something
> more realistic :-(

Yes, I will collect the performance data with the new patch. 

> 
>> I am currently trying to resolve this issue with the following idea:
>> 
>> In the routine “thread_prologue_and_epilogue_insns”, after both “make_epilogue_seq” and “try_shrink_wrapping” have finished,
>> 
>> scan every exit block to see whether its last insn satisfies ANY_RETURN_P(insn).
>> If yes, generate the zeroing sequence before this return insn.
>> 
>> That should take care of every exit path that returns.
>> 
>> Do you see any issue with this idea? 
> 
> You need to let the backend decide what to do, for this as well as for
> all other cases.  I do not know how often I will have to repeat that.

Yes, the new patch will separate the whole task into two parts:

A. Compute the hard register set based on the user option, source code attribute, data flow information, and function ABI information.
     The result will be “need_zeroed_register_set”, which is then passed to the target hook.
B. Each target will have its own implementation of emitting the zeroing sequence based on “need_zeroed_register_set”.
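
To illustrate the split between A and B, here is a minimal standalone sketch. It is not code from the patch: apart from “need_zeroed_register_set” and the option values from the subject line, every name in it (the bitmask register model, struct fn_info, toy_target_zero_regs, zero_call_used_regs) is a hypothetical stand-in, and a real implementation would use GCC's own hard register set type and emit RTL instead of printing text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: one bit per hard register, 16 registers in total.  */
typedef uint16_t hard_reg_set;

/* Mirrors the option values skip|used-gpr|all-gpr|used|all.  */
enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

/* Assumed per-function inputs for part A.  */
struct fn_info
{
  hard_reg_set call_used; /* function ABI: call-used registers */
  hard_reg_set gpr;       /* general-purpose registers */
  hard_reg_set used;      /* data flow: registers the function touches */
};

/* Part A: target-independent computation of need_zeroed_register_set.
   The mode is assumed to be already resolved from option or attribute.  */
static hard_reg_set
compute_need_zeroed_register_set (enum zero_mode mode,
                                  const struct fn_info *fn)
{
  switch (mode)
    {
    case ZERO_USED_GPR: return fn->call_used & fn->gpr & fn->used;
    case ZERO_ALL_GPR:  return fn->call_used & fn->gpr;
    case ZERO_USED:     return fn->call_used & fn->used;
    case ZERO_ALL:      return fn->call_used;
    case ZERO_SKIP:
    default:            return 0;
    }
}

/* Part B: each target supplies its own hook that emits the zeroing
   sequence for the given set and returns the number of insns emitted.  */
typedef int (*zero_regs_hook) (hard_reg_set need_zeroed, FILE *out);

/* A toy target that zeroes each requested register with an xor.  */
static int
toy_target_zero_regs (hard_reg_set need_zeroed, FILE *out)
{
  int emitted = 0;
  for (int regno = 0; regno < 16; regno++)
    if (need_zeroed & (1u << regno))
      {
        fprintf (out, "\txor\tr%d, r%d\n", regno, regno);
        emitted++;
      }
  return emitted;
}

/* Middle-end driver: compute the set, then let the target decide how
   to zero it.  Nothing is emitted when the set is empty.  */
static int
zero_call_used_regs (enum zero_mode mode, const struct fn_info *fn,
                     zero_regs_hook hook, FILE *out)
{
  assert (hook != NULL);
  hard_reg_set set = compute_need_zeroed_register_set (mode, fn);
  return set ? hook (set, out) : 0;
}
```

The point of the sketch is only that the middle end never chooses instructions itself; it hands the computed set to whatever hook the target registers.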


> 
> There also is separate shrink-wrapping, which you haven't touched on at
> all yet.  Joy.

Yes, in addition to shrink-wrapping, I also noticed that there are other places that generate “simple_return” or “return” that are not in
the epilogue, for example, in the “dbr” phase (delay_slots phase), in the “mach” phase (machine reorg phase), etc. 

So generating the zeroing sequence only in the epilogue is not enough. 

Hongjiu and I discussed this more and came up with a new implementation. I will describe it in another email later.

Thanks.

Qing
> 
> 
> Segher


