[PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

Wed Jan 26 06:51:11 GMT 2022

On 1/25/22 02:19, Richard Sandiford wrote:
> Dan Li <ashimida@linux.alibaba.com> writes:
>>>> +
>>>>       if (flag_stack_usage_info)
>>>>         current_function_static_stack_size = constant_lower_bound (frame_size);
>>>>     
>>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>>           RTX_FRAME_RELATED_P (insn) = 1;
>>>>         }
>>>>     
>>>> +  /* Pop return address from shadow call stack.  */
>>>> +  if (aarch64_shadow_call_stack_enabled ())
>>>> +	emit_insn (gen_scs_pop ());
>>>> +
>>>
>>> This looks correct, but following on from the above, I guess we continue
>>> to restore x30 from the frame in the traditional way first, and then
>>> overwrite it with the SCS value.  I think the output code would be
>>> slightly neater if we suppressed the first restore of x30.
>>>
>> Yes, the current epilogue is like:
>>       .......
>>       ldp     x29, x30, [sp], #16
>> +   ldr     x30, [x18, #-8]!
>>
>> I think may be we can keep the two x30 pops here, for two reasons:
>> 1) The implementation of scs in clang is to pop x30 twice, it might be
>> better to be consistent with clang here[1].
> 
> This doesn't seem a very compelling reason on its own, unless it's
> explicitly meant to be observable behaviour.  (But then, observed how?)
> 

Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str     x30, [x18], 8
ldp     x29, x30, [sp], 16
......
ldr     x29, [sp], 16
                         ## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr     x30, [x18, -8]!
                         ## Or may be a non-CFA based directive here
ret

>> 2) If we keep the first restore of x30, it may provide some flexibility
>> for other scenarios. For example, we can directly patch the scs_push/pop
>> insns in the binary to turn it into a binary without scs;
> 
> Is that a supported (and ideally documented) use case?  If it is,
> then it becomes important for correctness that the compiler emits
> exactly the opcodes in the original patch.  If it isn't, then the
> compiler can emit other instructions that have the same effect.
> 

Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.

> I'd like a definitive ruling on this from the kernel folks before
> the patch goes in.
> 

Ok, I'll cc some kernel folks to make sure I didn't miss something.

> If binary patching is supposed to be possible then scs_push and
> scs_pop *do* need to be separate define_insns.  But they also need
> to have some magic unspec that differentiates them from normal
> pushes and pops, e.g.:
> 
>    (set ...mem...
>         (unspec:DI [...reg...] UNSPEC_SCS_PUSH))
> 
> so that there is no chance that the pattern would be treated as
> a normal move and optimised accordingly.
> 

Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan