One issue with default implementation of zero_call_used_regs

Fri Sep 25 14:57:01 GMT 2020

> On Sep 25, 2020, at 7:53 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> Hi, Richard,
>> 
>> As you suggested, I added a default implementation of the target hook “zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch
>> 
>> 
>> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>> 
>> void
>> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> 
> FWIW, I was suggesting to return the set of registers that are actually
> cleared too.  Here you have the hook emit the asm statement, but IMO the
> way we generate the asm for a given set of registers should be entirely
> target-independent, and happen outside the hook.
> 
> So the hook returning the set of cleared registers does two things:
> 
> (1) It indicates which registers should be clobbered by the asm
>    (which would be generated after calling the hook, but emitted
>    before the sequence of instructions generated by the hook).

For this purpose, this hook should return a Set of RTX that hold the cleared registers, a HARD_REG_SET is not enough.

Since in the ASM_OPERANDS, we will need the RTX for the register (not the REGNO).

Which data structure in GCC should be used here to hold this returned value as Set of RTX ?
> 
> (2) It indicates which registers should be treated as live on return.
> 
> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.

Instead of storing this info in crtl, in my current patch, I added the following in “df-scan.c":
+static HARD_REG_SET zeroed_reg_set;

And routines that manipulate this HARD_REG_SET. 
I think that this should serve the same purpose as storing it to crtl? 

> Then the wrapper around EPILOGUE_USES that we talked about would
> check two things:
> 
> - EPILOGUE_USES itself
> - the crtl HARD_REG_SET
> 
> The crtl set would start out empty and remain empty unless the
> new option is used.

Yes, I did this for zeroed_reg_set in my current patch.
> 
>>        if (zero_rtx[(int)mode] == NULL_RTX)
>>          {
>>            zero_rtx[(int)mode] = reg;
>>            tmp = gen_rtx_SET (reg, const0_rtx);
>>            emit_insn (tmp);
>>          }
>>        else
>>          emit_move_insn (reg, zero_rtx[(int)mode]);
> 
> Hmm, OK, so you're assuming that it's better to zero one register
> and reuse that register for later moves.  I guess this is my RISC
> background/bias showing, but I think it might be simpler to assume
> that zeroing is as cheap as a register move.  The danger with reusing
> earlier registers is that you might introduce a cross-bank move,
> and some targets can only do those via memory.
Okay, I will move zeroes to registers.
> 
> Or perhaps we could use insn_cost to choose between them.  But I think
> the first implementation can just zero each register individually,
> unless we already know of a specific case in which reusing registers
> is necessary.

The current X86 implementation uses register move instead of directly move zero to register, I guess it’s because the register move on X86 is cheaper.
> 
>> I tested this default implementation on aarch64 with a small testing case, -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well, however, 
>> -fzero-call-used-regs=all-arg or -fzero-call-used-regs=all have an internal compiler error as following:
>> 
>> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>>   15 | }
>>      | ^
>> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>> 	../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
>> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>> 	../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
>> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>> 	../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>> 
>> As I studied today, I found the major issue for this bug is because the following statement:
>> 
>>        machine_mode mode = reg_raw_mode[regno];
>> 
>> “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector register on aarch64) , as a result, the zeroing insn for this register is:
>> 
>> (insn 112 111 113 7 (set (reg:TI 32 v0)
>>        (const_int 0 [0])) "t1.c":15:1 -1
>>     (nil))
>> 
>> 
>> However, looks like that the above RTL have to be splitted into two sub register moves on aarch64, and the splitting has some issue. 
>> 
>> So, I guess that on aarch64, zeroing vector registers might need other modes than the one returned by “reg_raw_mode”.  
>> 
>> My questions are:
>> 
>> 1. Is there another available utility routine that returns the proper MODE for the hard registers that can be readily used to zero the hardr register?
>> 2. If not, should I add one more target hook for this purpose? i.e 
>> 
>> /* Return the proper machine mode that can be used to zero this hard register specified by REGNO.  */
>> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
> 
> Thanks for testing aarch64.  I think there are two issues here,
> one in the patch and one in the aarch64 backend:
> 
> - the patch should use emit_move_insn rather than use gen_rtx_SET directly.

Will try this.

> 
> - the aarch64 backend doesn't handle zeroing TImode vector registers,
>  but should.  E.g. for:
> 
>    void
>    foo ()
>    {
>      register __int128_t q0 asm ("q0");
>      q0 = 0;
>      asm volatile ("" :: "w" (q0));
>    }
> 
>  we generate:
> 
>        mov     x0, 0
>        mov     x1, 0
>        fmov    d0, x0
>        fmov    v0.d[1], x1
> 
>  which is, er, somewhat suboptimal.
> 
> I'll try to fix the aarch64 bug for Monday next week.

Thanks a lot.

Qing
> 
> Thanks,
> Richard