Hello, With CET-IBT, ENDBR{32,64} instructions are used to mark legitimate forward edges for indirect branches. GCC can generate a CET-IBT binary with -fcf-protection, but the default behaviour is to generate ENDBR instructions for every function. This creates more "legal" forward edges than necessary in the eyes of CET-IBT. https://godbolt.org/z/M15rjMb4G is almost excellent, but it would be far more helpful if all functions were implicitly nocf_check, so this example produces a diagnostic. That way, GCC can point out all functions used by function pointers, rather than the result compiling and failing to be CET-IBT compatible. This on its own would be enough to let embedded projects minimise their ENDBR* count while having some compiler assistance while doing so. More generally, a lot of common cases (e.g. Linux) could be computed automatically. Drivers filling in ops structures typically refer to local symbols, so these defaulting to cf_check would be an improvement still. This would leave only global symbols needing explicit cf_check. (Maybe LTO could even figure out the global symbols cases correctly?) Finally, one minor code generation improvement. When GCC emits a direct call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation to skip the ENDBR instruction (not needed for direct call/jmp's) and save on decode bandwidth.
(In reply to Andrew Cooper from comment #0) > > Finally, one minor code generation improvement. When GCC emits a direct > call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation > to skip the ENDBR instruction (not needed for direct call/jmp's) and save on > decode bandwidth. It is only safe for calling a static function whose address has been taken.
Created attachment 51670 [details] Skip ENDBR when emitting direct call/jmp to local function
Created attachment 51672 [details] Add -fcf-check-attribute=[yes|no] -fcf-check-attribute=[yes|no] implies "cf_check" or "nocf_check" function attribute.
Please try these patches and let me know if they work.
(In reply to H.J. Lu from comment #1) > (In reply to Andrew Cooper from comment #0) > > > > Finally, one minor code generation improvement. When GCC emits a direct > > call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation > > to skip the ENDBR instruction (not needed for direct call/jmp's) and save on > > decode bandwidth. > > It is only safe for calling a static function whose address has been taken. Could be done wider using LTO, or pushed into the linker if you're willing to change the ELF format and augment it with STT_FUNC_BTI and R_*_BTI32, there the new relocation would mean +4 (or rather, skip landing pad) when aimed at STT_FUNC_BTI and be identical to _PC32 otherwise. The ELF thing doesn't help with reducing ENDBR emissions for global symbols since we can't ever tell who will take the address, but it will help with directly calling avoiding the landing pad. It also allows for a 'pseudo' LTO thing to 'seal' an executable, where it will strip the ENDBRs for all symbols that do not have a _PC32 rela. We can use EXPORT_SYMBOL*() to generate one such on purpose to avoid exported functions from being sealed. Anyway, I think there's much value in reducing the number of ENDBR instructions as much as possible and we should investigate how the toolchains can best help there.
Created attachment 51677 [details] The v2 patch to skip ENDBR
Thankyou. I've tried these two patches and they do appear to be behaving as intended. I've put together a slightly extended version of the original test. Compile with gcc -Wall -fno-pic -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no static void __attribute__((noinline)) a(void) { asm volatile ("sti":::"memory"); } void __attribute__((nocf_check, noinline)) b(void) { asm volatile ("std":::"memory"); } void __attribute__((cf_check, noinline)) c(void) { asm volatile ("cmc":::"memory"); } void (*ptr_a)(void) = a; // Now raises a diagnostic void (*ptr_b)(void) = b; // Did diagnose previously, still does void (*ptr_c)(void) = c; int test(void) { ptr_a(); ptr_b(); ptr_c(); a(); b(); c(); // When fully linked, does skip c's ENDBR64 instruction return 0; } int main(void) { return 0; } I'll try applying this to a bigger codebase now.
Actually, there is a (possibly pre-existing) diagnostics issue: $ cat proto.c static void __attribute__((cf_check)) foo(void); static void __attribute__((unused)) foo(void) { } void (*ptr)(void) = foo; $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c proto.c -o proto.o proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)' 2 | static void __attribute__((unused)) foo(void) | ^~~ proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' 1 | static void __attribute__((cf_check)) foo(void); | ^~~ The diagnostic complaining that the forward declaration doesn't match the definition gives 'void(void)' as the type in both cases, leaving out the fact that they differ by cf_check-ness.
Created attachment 51687 [details] The v2 patch to add -fcf-check-attribute=[yes|no]
(In reply to Andrew Cooper from comment #8) > Actually, there is a (possibly pre-existing) diagnostics issue: > > $ cat proto.c > static void __attribute__((cf_check)) foo(void); > static void __attribute__((unused)) foo(void) > { > } > void (*ptr)(void) = foo; > > $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr > -fcf-check-attribute=no -c proto.c -o proto.o > proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)' > 2 | static void __attribute__((unused)) foo(void) > | ^~~ > proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' > 1 | static void __attribute__((cf_check)) foo(void); > | ^~~ > > > The diagnostic complaining that the forward declaration doesn't match the > definition gives 'void(void)' as the type in both cases, leaving out the > fact that they differ by cf_check-ness. Please try the v2 patch.
(In reply to H.J. Lu from comment #10) > (In reply to Andrew Cooper from comment #8) > > Actually, there is a (possibly pre-existing) diagnostics issue: > > > > $ cat proto.c > > static void __attribute__((cf_check)) foo(void); > > static void __attribute__((unused)) foo(void) > > { > > } > > void (*ptr)(void) = foo; > > > > $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr > > -fcf-check-attribute=no -c proto.c -o proto.o > > proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)' > > 2 | static void __attribute__((unused)) foo(void) > > | ^~~ > > proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' > > 1 | static void __attribute__((cf_check)) foo(void); > > | ^~~ > > > > > > The diagnostic complaining that the forward declaration doesn't match the > > definition gives 'void(void)' as the type in both cases, leaving out the > > fact that they differ by cf_check-ness. > > Please try the v2 patch. I appear to get no diagnostic at all now. This seems like a regression from v1. There should be a diagnostic, but it ought to include cf_check in the type it prints.
Created attachment 51693 [details] The v3 patch to add -fcf-check-attribute=[yes|no]
(In reply to Andrew Cooper from comment #11) > > There should be a diagnostic, but it ought to include cf_check in the type > it prints. Try the v3 patch.
(In reply to H.J. Lu from comment #13) > (In reply to Andrew Cooper from comment #11) > > > > There should be a diagnostic, but it ought to include cf_check in the type > > it prints. > > Try the v3 patch. Thanks. Now get: proto.c:2:37: error: conflicting types with implied 'nocf_check' attribute for 'foo'; have 'void(void)' 2 | static void __attribute__((unused)) foo(void) | ^~~ proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' 1 | static void __attribute__((cf_check)) foo(void); | ^~~ which at least highlights the issue. Any variant like this, but possibly even simply reporting 'void __attribute__((nocf_check))(void)' should be fine.
Created attachment 51696 [details] The v4 patch to add -fcf-check-attribute=[yes|no]
(In reply to Andrew Cooper from comment #14) > (In reply to H.J. Lu from comment #13) > > (In reply to Andrew Cooper from comment #11) > > > > > > There should be a diagnostic, but it ought to include cf_check in the type > > > it prints. > > > > Try the v3 patch. > > Thanks. Now get: > > proto.c:2:37: error: conflicting types with implied 'nocf_check' attribute > for 'foo'; have 'void(void)' > 2 | static void __attribute__((unused)) foo(void) > | ^~~ > proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' > 1 | static void __attribute__((cf_check)) foo(void); > | ^~~ > > which at least highlights the issue. Any variant like this, but possibly > even simply reporting 'void __attribute__((nocf_check))(void)' should be > fine. The v4 patch changed it to bar1.c:2:37: error: conflicting types for ‘foo’; have ‘void(void)’ with implied ‘nocf_check’ attribute 2 | static void __attribute__((unused)) foo(void) | ^~~ bar1.c:1:39: note: previous declaration of ‘foo’ with type ‘void(void)’ 1 | static void __attribute__((cf_check)) foo(void); | ^~~ bar1.c:5:21: warning: initialization of ‘void (*)(void)’ from incompatible pointer type ‘void (__attribute__((nocf_check)) *)(void)’ [-Wincompatible-pointer-types] 5 | void (*ptr)(void) = foo; | ^~~
I think I've found a bug in the -fcf-check-attribute implementation. $ cat fnptr-array-arg.c static int __attribute__((cf_check)) foo(char a[], int b) { return 0; } int (*ptr)(char[], int) = foo; $ gcc -Wall -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c fnptr-array-arg.c -o tmp.o && objdump -d tmp.o fnptr-array-arg.c:5:27: warning: initialization of 'int (*)(char *, int)' from incompatible pointer type 'int (__attribute__((nocf_check)) *)(char *, int)' [-Wincompatible-pointer-types] 5 | int (*ptr)(char[], int) = foo; | ^~~ tmp.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <foo>: 0: 31 c0 xor %eax,%eax 2: c3 retq Despite the explicit cf_check, a diagnostic is raised complaining about cf_check-ness of the pointer, and the generated code has no `endbr64` instruction. This issue only manifests when using array arguments in the function. When switching `char[]` for `char*`, everything works as expected. Also, dropping -fcf-check-attribute=no also causes things to work. $ gcc -Wall -fcf-protection=branch -mmanual-endbr -c fnptr-array-arg.c -o tmp.o && objdump -d tmp.o tmp.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <foo>: 0: f3 0f 1e fa endbr64 4: 31 c0 xor %eax,%eax 6: c3 retq Something about the array type seems to cause the explicit cf_check attribute to be lost.
Created attachment 51701 [details] The v5 patch to add -fcf-check-attribute=[yes|no]
(In reply to Andrew Cooper from comment #17) > I think I've found a bug in the -fcf-check-attribute implementation. > Please try the v5 patch. BTW, do you have a testcase to show how -fcf-check-attribute=yes is used?
(In reply to H.J. Lu from comment #19) > (In reply to Andrew Cooper from comment #17) > > I think I've found a bug in the -fcf-check-attribute implementation. > > Please try the v5 patch. Thanks. That does fix the issue. > BTW, do you have a testcase to show how > -fcf-check-attribute=yes is used? So, this was something I was going to leave until I'd got CET-IBT working, so as to have time to consider all parts before proposing improvements. I don't have a usecase for -fcf-check-attribute=yes, because it is almost totally redundant with regular -fcf-protection in the first place. When you are are applying control flow checks, every function is either checked or not checked. But GCC currently has a 3-way model of {unknown, explicit yes, explicit no} on which it builds its typechecking. Furthermore, -mmanual-endbr is a gross hack which by default leaves you with a broken binary. If I were building this from scratch, I'd not have -mmanual-endbr or -fcf-check-attribute at all, because they're exposing complexity which ought not to exist. I get why the default for -fcf-protection=branch puts an ENDBR* instruction everywhere. It is the quick, easy and non-invasive way to make libraries compatible with CET, which is a net improvement, even if not ideal. The ideal way, and definitely future work, is for GCC to calculate the minimum set of required ENDBR*. At a guess, all non-local symbols (except those LTO can determine are not publicly visible), and any local symbols used by function pointers. What I'm trying to do is a stopgap in the middle. No ENDBR*'s by default, but have the compiler tell me where I've got function pointers to a non-ENDBR'd function, so when the result compiles, it stands a reasonable chance of functioning correctly. Personally, I'd suggest having these as sub-modes of -fcf-protection=branch, instead of exposing all the internals on the command line.
Another possibly-bug, but possibly mis-expectations on my behalf. I've found some code in the depths of Xen which is causing a failure on final link due to a missing `__x86_indirect_thunk_nt_rax` symbol. $ cat fnptr-typeof.c extern void (*fnptrs[])(char); void foo(int a) { typeof(foo) *bar = (void *)fnptrs[0]; bar(a); } I realise this is wildly undefined behaviour, and I will try to address it in due course. However, the instruction generation is bizarre. When I compile with -fcf-protection=branch -mmanual-endbr, I get a plain `jmp *fnptrs(%rip)` instruction. (This is fine.) When I compile with -fcf-check-attribute=no as well, then I get `notrack jmp *fnptrs(%rip)`. I'm not sure why the notrack is warranted here; for all GCC knows, the target does have a suitable ENDBR64 instruction. When I compile with -mindirect-branch=thunk as well, I get a load into %rax and a normal looking retpoline thunk. (This is as expected too.) However, when I switch to -mindirect-branch=thunk-extern, I get the the same load into %rax, and then a jump to `__x86_indirect_thunk_nt_rax`. Presumably the nt is short for notrack. Irrespective of whether there should be a notrack or not on the jmp form, it weird for the retpoline thunk ABI to be changing based on extern or not. What is the reasoning behind this?
(In reply to Andrew Cooper from comment #21) > Another possibly-bug, but possibly mis-expectations on my behalf. > > I've found some code in the depths of Xen which is causing a failure on > final link due to a missing `__x86_indirect_thunk_nt_rax` symbol. > > $ cat fnptr-typeof.c > extern void (*fnptrs[])(char); > > void foo(int a) > { > typeof(foo) *bar = (void *)fnptrs[0]; > bar(a); > } > > I realise this is wildly undefined behaviour, and I will try to address it > in due course. However, the instruction generation is bizarre. > > When I compile with -fcf-protection=branch -mmanual-endbr, I get a plain > `jmp *fnptrs(%rip)` instruction. (This is fine.) > > When I compile with -fcf-check-attribute=no as well, then I get `notrack jmp > *fnptrs(%rip)`. I'm not sure why the notrack is warranted here; for all GCC > knows, the target does have a suitable ENDBR64 instruction. > From "info gcc": The 'nocf_check' attribute on a type of pointer to function is used to inform the compiler that a call through the pointer should not be instrumented when compiled with the '-fcf-protection=branch' option. The compiler assumes that the function's address from the pointer is a valid target for a control-flow transfer. A direct function call through a function name is assumed to be a safe call thus direct calls are not instrumented by the compiler. That is why notrack is added.
Apologies for the delay, but I do now have a working prototype of Xen with CET-IBT active, using the current version of these patches. The result actually builds back to older versions of GCCs, but the lack of cf_check-ness typechecking makes this a fragile activity.
(In reply to Andrew Cooper from comment #23) > Apologies for the delay, but I do now have a working prototype of Xen with > CET-IBT active, using the current version of these patches. > > The result actually builds back to older versions of GCCs, but the lack of > cf_check-ness typechecking makes this a fragile activity. Should I polish my patches and submit them now?
(In reply to H.J. Lu from comment #24) > (In reply to Andrew Cooper from comment #23) > > Apologies for the delay, but I do now have a working prototype of Xen with > > CET-IBT active, using the current version of these patches. > > > > The result actually builds back to older versions of GCCs, but the lack of > > cf_check-ness typechecking makes this a fragile activity. > > Should I polish my patches and submit them now? Sorry - I missed this request. Yes please. These changes have been used for a while now with no issues identified.