Created attachment 53469 [details]
[PATCH] aarch64: Add a BTI test for cross-section calls

Direct calls to functions in other sections do not cause `bti c` instructions to be added to the callee. During linking, if the sections are farther apart than a direct branch can reach, a trampoline indirect branch sequence may be added. Because the callee does not have a `bti c` instruction, the call will result in a Branch Target exception.

With the attached test case on `cc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)`, the code compiles to (trimmed):

        .text
func:
        mov     w0, 37
        ret
        .section        .main.text,"ax",@progbits
main:
        hint    25 // paciasp
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      func
        cmp     w0, 37
        cset    w0, ne
        ldp     x29, x30, [sp], 16
        hint    29 // autiasp
        ret

And then linking yields:

0000000000040118 <func>:
       40118:   528004a0        mov     w0, #0x25               // #37
       4011c:   d65f03c0        ret

0000000010040000 <main>:
    10040000:   d503233f        paciasp
    10040004:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
    10040008:   910003fd        mov     x29, sp
    1004000c:   94000009        bl      10040030 <___veneer>
    10040010:   7100941f        cmp     w0, #0x25
    10040014:   1a9f07e0        cset    w0, ne  // ne = any
    10040018:   a8c17bfd        ldp     x29, x30, [sp], #16
    1004001c:   d50323bf        autiasp
    10040020:   d65f03c0        ret

0000000010040030 <___veneer>:
    10040030:   90f80010        adrp    x16, 40000 <_start>
    10040034:   91046210        add     x16, x16, #0x118
    10040038:   d61f0200        br      x16

Finally, I've made this little test case, but the real case where I'm running into this is in the Linux kernel, where the two sections are .init.text and .text, and the equivalent of a ___veneer thing is in the module plt.
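To make the range problem in the listing above concrete, here is a minimal sketch in C that decodes the `bl` at 1004000c and checks whether a single direct branch from there could reach `func`. The helper names `branch_target` and `direct_branch_reaches` are hypothetical and not part of any toolchain; the only facts assumed are that B/BL carry a signed 26-bit word offset, which gives a reach of roughly +/-128 MiB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decode the destination of an AArch64 B/BL
   instruction. The low 26 bits hold a signed offset counted in 4-byte
   words, relative to the address of the instruction itself. */
static uint64_t branch_target(uint64_t pc, uint32_t insn)
{
    int64_t imm26 = insn & 0x03ffffff;
    if (imm26 & 0x02000000)          /* sign-extend from bit 25 */
        imm26 -= 0x04000000;
    return pc + (uint64_t)(imm26 * 4);
}

/* Hypothetical helper: true if one B/BL placed at `from` can reach `to`.
   The encodable byte offsets are -2^27 .. 2^27 - 4. */
static bool direct_branch_reaches(uint64_t from, uint64_t to)
{
    int64_t off = (int64_t)(to - from);
    return off >= -(INT64_C(1) << 27) && off < (INT64_C(1) << 27);
}
```

With the addresses from the disassembly, `branch_target(0x1004000c, 0x94000009)` gives 0x10040030 (the veneer), while `direct_branch_reaches(0x1004000c, 0x40118)` is false, which is why the linker had to route the call through an indirect `br x16`.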
Shouldn't the linker add the BTI inside the ___veneer instead?
(In reply to Andrew Pinski from comment #1)
> Shouldn't the linker add the BTI inside the ___veneer instead?

The bti instruction has to be placed at the target of the indirect branch (at the top of `func` in this case) so I don't think it would be possible to work around this just within the veneer.
Basically:
void
aarch64_print_patchable_function_entry (FILE *file,
                                        unsigned HOST_WIDE_INT patch_area_size,
                                        bool record_p)
{
  if (cfun->machine->label_is_assembled
      && aarch64_bti_enabled ()
      && !cgraph_node::get (cfun->decl)->only_called_directly_p ())

That last check just needs to be removed as there is no way to know if the linker will output a veneer.
(In reply to Andrew Pinski from comment #3)
> Basically:
> void
> aarch64_print_patchable_function_entry (FILE *file,
>                                         unsigned HOST_WIDE_INT patch_area_size,
>                                         bool record_p)
> {
>   if (cfun->machine->label_is_assembled
>       && aarch64_bti_enabled ()
>       && !cgraph_node::get (cfun->decl)->only_called_directly_p ())
>
> That last check just needs to be removed as there is no way to know if the
> linker will output a veneer.

That only fixes the -fpatchable-function-entry= case. aarch64-bti-insert.cc needs to be fixed too:

  /* Since a Branch Target Exception can only be triggered by an indirect call,
     we exempt function that are only called directly.  We also exempt
     functions that are already protected by Return Address Signing (PACIASP/
     PACIBSP).  For all other cases insert a BTI C at the beginning of the
     function.  */
  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
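As an illustration of what that check distinguishes, here is a small C sketch with one function that is only called directly and one whose address is taken. The function names and the `indirect_callee` pointer are made up for this example. Assuming a build with -mbranch-protection=bti and the exemption quoted above, one would expect `called_directly` to be emitted without a `bti c` and `address_taken` to get one; the actual output depends on the GCC version and optimization level.

```c
/* Only ever called directly from this translation unit, so the
   only_called_directly_p () exemption is expected to apply and no
   landing pad would be emitted (assumption, not verified output). */
static int __attribute__ ((noinline))
called_directly (void)
{
  return 37;
}

/* Its address escapes through a function pointer, so it can be the
   target of an indirect branch and is expected to need a `bti c`. */
static int __attribute__ ((noinline))
address_taken (void)
{
  return 42;
}

/* Hypothetical pointer, used only to force an indirect call. */
int (*volatile indirect_callee) (void) = address_taken;

int
call_both (void)
{
  return called_directly () + indirect_callee ();
}
```

The problem reported here is the first case: if `called_directly` ends up in a distant section, the direct `bl` can be turned into an indirect branch after the compiler has already decided to omit the landing pad.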
(In reply to D Scott Phillips from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Shouldn't the linker add the BTI inside the ___veneer instead?
>
> The bti instruction has to be placed at the target of the indirect branch
> (at the top of `func` in this case) so I don't think it would be possible to
> work around this just within the veneer.

The veneer has to be placed 'near' the target and then end with a direct branch instruction. The linker should be able to work this out.
(In reply to Richard Earnshaw from comment #5)
> (In reply to D Scott Phillips from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Shouldn't the linker add the BTI inside the ___veneer instead?
> >
> > The bti instruction has to be placed at the target of the indirect branch
> > (at the top of `func` in this case) so I don't think it would be possible to
> > work around this just within the veneer.
>
> The veneer has to be placed 'near' the target and then end with a direct
> branch instruction. The linker should be able to work this out.

This might, of course, mean that two veneers are needed in this case, one that can be reached from the initial branch, and one that can reach the final target. A direct branch will jump to the first and the second one will be reached by an indirect jump (needing a BTI at the start).
fixed in bfd ld 2.41 see https://sourceware.org/bugzilla/show_bug.cgi?id=30076 we can also fix gcc to work with older ld (emit bti c in local functions), but i don't plan to do that unless there is a reason to do so. (it increases the emitted bti c considerably in some workloads, e.g. linux kernel, while the linker fix is less intrusive in the common case with small binaries and no weird section hacks).
Note that the issue was found in the Linux kernel - we were expecting to see the BTI Cs there, it's certainly a lot simpler to work with.
On some occasions we may not be able to use the new ld: the kernel build relies on its own runtime linker, which is used for kernel modules. So I created a patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626084.html), which gives users another option that can be applied on the compiler side.
(In reply to Feng Xue from comment #9)
> On some occasions, we may not use the new ld, the kernel-building relies on
> its own runtime linker which is used for kernel modules. So I created a
> patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626084.html),
> and this provides user another option that could be done at the compiler
> side.

Reducing BTI is important for security. With LTO a binary should only have BTI on functions that are indirectly called. So I don't like the idea of adding more BTI with a new option - it means we will need a linker optimization to remove those redundant BTIs (e.g. by changing them into NOPs).

Note that branch offsets up to 256MB don't need special veneer handling: one should place a direct branch about halfway to the destination.

Does Linux do any weird hacks in -fpatchable-function-entry that make it hard to use BTI?
Hi Wilco,

> "it means we will need a linker optimization to remove those redundant BTIs (eg. by changing them into NOPs)"

It will be only for performance optimization, right? If we don't care about performance, the linker doesn't need to optimize it to be NOP, right? It could still be useful if we only do this operation for a specific module.

Thanks,
-Jiangning
(In reply to Jiangning Liu from comment #11)
> Hi Wilco,
>
> > "it means we will need a linker optimization to remove those redundant BTIs (eg. by changing them into NOPs)"
>
> It will be only for performance optimization, right? If we don't care about
> performance, the linker doesn't need to optimize it to be NOP, right? It
> could still be useful if we only do this operation for a specific module.

no, this is a security feature, we want as few BTI c in an executable segment as possible.
The kernel hasn't got any problem with BTI as far as I am aware - when built with clang we run the kernel with BTI enabled since clang does just insert a BTI C at the start of every function, and GCC works fine so long as we don't get any out of range jumps being generated. The issue is that we don't have anything to insert veneers in the case where section placement puts static functions into a distant enough part of memory to need an indirect jump but GCC has decided to omit the landing pad.
(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated. The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

The linker has to insert the veneers.
The kernel module loader simply does not insert veneers at present, and there were some implementation concerns IIRC.
(In reply to Mark Brown from comment #15)
> The kernel module loader simply does not insert veneers at present, and
> there were some implementation concerns IIRC.

That's not a good reason to weaken the security of the generated code.
(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated. The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

Is the kernel already larger than 128 MBytes .text? Or do people do weird stuff with section placement that causes branches to be out of range?
It's section placement stuff that's triggering this. You will also be able to build a larger kernel if you try, though I'm not sure that's practical.