106671 – aarch64: BTI instruction are not inserted for cross-section direct calls

Bug 106671 - aarch64: BTI instruction are not inserted for cross-section direct calls

Summary: aarch64: BTI instruction are not inserted for cross-section direct calls

Status:	WAITING

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	12.1.1

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2022-08-17 22:51 UTC by D Scott Phillips
Modified:	2024-07-16 13:29 UTC (History)
CC List:	11 users (show)

See Also:	https://sourceware.org/bugzilla/show_bug.cgi?id=30076
Host:
Target:	aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed:	2022-08-17 00:00:00

Attachments
[PATCH] aarch64: Add a BTI test for cross-section calls (675 bytes, application/mbox) 2022-08-17 22:51 UTC, D Scott Phillips	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description D Scott Phillips 2022-08-17 22:51:46 UTC

Created attachment 53469 [details]
[PATCH] aarch64: Add a BTI test for cross-section calls

Direct calls to functions in other sections do not cause `bti c` instructions to be added to the callee. During linking, if the sections are farther apart than a direct branch can reach, a trampoline indirect branch sequence may be added. Because the callee does not have a `bti c` instruction, the call will result in a Branch Target exception.

With the attached test case on `cc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)`, the code compiles to (trimmed):

        .text
func:
        mov     w0, 37
        ret

        .section        .main.text,"ax",@progbits
main:
        hint    25 // paciasp
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      func
        cmp     w0, 37
        cset    w0, ne
        ldp     x29, x30, [sp], 16
        hint    29 // autiasp
        ret

And then linking yields:

0000000000040118 <func>:
   40118:       528004a0        mov     w0, #0x25                       // #37
   4011c:       d65f03c0        ret

0000000010040000 <main>:
    10040000:   d503233f        paciasp
    10040004:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
    10040008:   910003fd        mov     x29, sp
    1004000c:   94000009        bl      10040030 <___veneer>
    10040010:   7100941f        cmp     w0, #0x25
    10040014:   1a9f07e0        cset    w0, ne  // ne = any
    10040018:   a8c17bfd        ldp     x29, x30, [sp], #16
    1004001c:   d50323bf        autiasp
    10040020:   d65f03c0        ret

0000000010040030 <___veneer>:
    10040030:   90f80010        adrp    x16, 40000 <_start>
    10040034:   91046210        add     x16, x16, #0x118
    10040038:   d61f0200        br      x16

Finally, I've made this little test case, but the real case where I'm running into this is in the Linux kernel, where the two sections are .init.text and .text, and the equivalent of a ___veneer thing is in the module plt.

Comment 1 Andrew Pinski 2022-08-17 23:35:37 UTC

Shouldn't the linker add the BTI inside the ___veneer instead?

Comment 2 D Scott Phillips 2022-08-17 23:39:03 UTC

th(In reply to Andrew Pinski from comment #1)
> Shouldn't the linker add the BTI inside the ___veneer instead?

The bti instruction has to be placed at the target of the indirect branch (at the top of `func` in this case) so I don't think it would be possible to work around this just within the veneer.

Comment 3 Andrew Pinski 2022-08-17 23:46:49 UTC

Basically:
void
aarch64_print_patchable_function_entry (FILE *file,
                                        unsigned HOST_WIDE_INT patch_area_size,
                                        bool record_p)
{
  if (cfun->machine->label_is_assembled
      && aarch64_bti_enabled ()
      && !cgraph_node::get (cfun->decl)->only_called_directly_p ())


That last check just needs to be removed as there is no way to know if the linker will output a veneer.

Comment 4 Andrew Pinski 2022-08-17 23:57:00 UTC

(In reply to Andrew Pinski from comment #3)
> Basically:
> void
> aarch64_print_patchable_function_entry (FILE *file,
>                                         unsigned HOST_WIDE_INT
> patch_area_size,
>                                         bool record_p)
> {
>   if (cfun->machine->label_is_assembled
>       && aarch64_bti_enabled ()
>       && !cgraph_node::get (cfun->decl)->only_called_directly_p ())
> 
> 
> That last check just needs to be removed as there is no way to know if the
> linker will output a veneer.

That only fixes the -fpatchable-function-entry= case.

aarch64-bti-insert.cc needs to be fixed too:
  /* Since a Branch Target Exception can only be triggered by an indirect call,
     we exempt function that are only called directly.  We also exempt
     functions that are already protected by Return Address Signing (PACIASP/
     PACIBSP).  For all other cases insert a BTI C at the beginning of the
     function.  */
  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())

Comment 5 Richard Earnshaw 2022-08-18 10:10:09 UTC

(In reply to D Scott Phillips from comment #2)
> th(In reply to Andrew Pinski from comment #1)
> > Shouldn't the linker add the BTI inside the ___veneer instead?
> 
> The bti instruction has to be placed at the target of the indirect branch
> (at the top of `func` in this case) so I don't think it would be possible to
> work around this just within the veneer.

The veneer has to be placed 'near' the target and then end with a direct branch instruction.  The linker should be able to work this out.

Comment 6 Richard Earnshaw 2022-08-18 10:13:35 UTC

(In reply to Richard Earnshaw from comment #5)
> (In reply to D Scott Phillips from comment #2)
> > th(In reply to Andrew Pinski from comment #1)
> > > Shouldn't the linker add the BTI inside the ___veneer instead?
> > 
> > The bti instruction has to be placed at the target of the indirect branch
> > (at the top of `func` in this case) so I don't think it would be possible to
> > work around this just within the veneer.
> 
> The veneer has to be placed 'near' the target and then end with a direct
> branch instruction.  The linker should be able to work this out.

This might, of course, mean that two veneers are needed in this case, one that can be reached from the initial branch, and one that can reach the final target.  A direct branch will jump to the first and the second one will be reached by an indirect jump (needing a BTI at the start).

Comment 7 nsz 2023-03-23 13:06:18 UTC

fixed in bfd ld 2.41 see
https://sourceware.org/bugzilla/show_bug.cgi?id=30076

we can also fix gcc to work with older ld (emit bti c in local functions), but i don't plan to do that unless there is a reason to do so. (it increases the emitted bti c considerably in some workloads, e.g. linux kernel, while the linker fix is less intrusive in the common case with small binaries and no weird section hacks).

Comment 8 Mark Brown 2023-03-23 13:47:03 UTC

Note that the issue was found in the Linux kernel - we were expecting to see the BTI Cs there, it's certainly a lot simpler to work with.

Comment 9 Feng Xue 2023-08-02 16:03:46 UTC

On some occasions, we may not use the new ld, the kernel-building relies on its own runtime linker which is used for kernel modules. So I created a patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626084.html), and this provides user another option that could be done at the compiler side.

Comment 10 Wilco 2023-08-11 09:28:28 UTC

(In reply to Feng Xue from comment #9)
> On some occasions, we may not use the new ld, the kernel-building relies on
> its own runtime linker which is used for kernel modules. So I created a
> patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626084.html),
> and this provides user another option that could be done at the compiler
> side.

Reducing BTI is important for security. With LTO a binary should only have BTI on functions that are indirectly called. So I don't like the idea of adding more BTI with a new option - it means we will need a linker optimization to remove those redundant BTIs (eg. by changing them into NOPs).

Note that branch offsets up to 256MB don't need special veneer handling: one should place a direct branch about halfway to the destination.

Does Linux do any weird hacks in -fpatchable-function-entry that makes it hard to use BTI?

Comment 11 Jiangning Liu 2023-08-14 20:25:43 UTC

Hi Wilco,

> "it means we will need a linker optimization to remove those redundant BTIs (eg. by changing them into NOPs)"

It will be only for performance optimization, right? If we don't care about performance, the linker doesn't need to optimize it to be NOP, right? It could still be useful if we only do this operation for a specific module.

Thanks,
-Jiangning

Comment 12 nsz 2023-08-15 10:11:53 UTC

(In reply to Jiangning Liu from comment #11)
> Hi Wilco,
> 
> > "it means we will need a linker optimization to remove those redundant BTIs (eg. by changing them into NOPs)"
> 
> It will be only for performance optimization, right? If we don't care about
> performance, the linker doesn't need to optimize it to be NOP, right? It
> could still be useful if we only do this operation for a specific module.

no, this is a security feature, we want as few BTI c in an executable
segment as possible.

Comment 13 Mark Brown 2023-08-15 19:26:39 UTC

The kernel hasn't got any problem with BTI as far as I am aware - when built with clang we run the kernel with BTI enabled since clang does just insert a BTI C at the start of every function, and GCC works fine so long as we don't get any out of range jumps being generated. The issue is that we don't have anything to insert veneers in the case where section placement puts static functions into a distant enough part of memory to need an indirect jump but GCC has decided to omit the landing pad.

Comment 14 Richard Earnshaw 2023-08-21 09:53:00 UTC

(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated. The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

The linker has to insert the veneers.

Comment 15 Mark Brown 2023-08-21 14:15:26 UTC

The kernel module loader simply does not insert veneers at present, and there were some implementation concerns IIRC.

Comment 16 Richard Earnshaw 2023-08-21 16:04:15 UTC

(In reply to Mark Brown from comment #15)
> The kernel module loader simply does not insert veneers at present, and
> there were some implementation concerns IIRC.

That's not a good reason to weaken the security of the generated code.

Comment 17 Wilco 2023-08-21 16:18:24 UTC

(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated. The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

Is the kernel already larger than 128 MBytes .text? Or do people do weird stuff with section placement that causes branches to be out of range?

Comment 18 Mark Brown 2023-08-21 16:24:02 UTC

It's section placement stuff that's triggering this. You will also be able to build a larger kernel if you try, though I'm not sure that's practical.

Comment 19 jvoisin 2024-07-16 13:29:53 UTC

> That's not a good reason to weaken the security of the generated code.

Having BTI will more valid targets is still better than no BTI at all, and it would still be better than what clang is doing.