Bug 102953 - Improvements to CET-IBT and ENDBR generation
Summary: Improvements to CET-IBT and ENDBR generation
Status: WAITING
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 12.0
: P3 normal
Target Milestone: ---
Assignee: H.J. Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-26 17:01 UTC by Andrew Cooper
Modified: 2022-02-23 20:34 UTC (History)
6 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-10-26 00:00:00


Attachments
Skip ENDBR when emitting direct call/jmp to local function (1.43 KB, patch)
2021-10-26 21:31 UTC, H.J. Lu
Details | Diff
Add -fcf-check-attribute=[yes|no] (1.14 KB, patch)
2021-10-26 22:48 UTC, H.J. Lu
Details | Diff
The v2 patch to skip ENDBR (1.04 KB, patch)
2021-10-27 13:40 UTC, H.J. Lu
Details | Diff
The v2 patch to add -fcf-check-attribute=[yes|no] (1.50 KB, patch)
2021-10-28 01:53 UTC, H.J. Lu
Details | Diff
The v3 patch to add -fcf-check-attribute=[yes|no] (1.81 KB, patch)
2021-10-28 13:17 UTC, H.J. Lu
Details | Diff
The v4 patch to add -fcf-check-attribute=[yes|no] (1.81 KB, patch)
2021-10-28 19:11 UTC, H.J. Lu
Details | Diff
The v5 patch to add -fcf-check-attribute=[yes|no] (1.89 KB, patch)
2021-10-30 00:03 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Cooper 2021-10-26 17:01:20 UTC
Hello,

With CET-IBT, ENDBR{32,64} instructions are used to mark legitimate forward edges for indirect branches.

GCC can generate a CET-IBT binary with -fcf-protection, but the default behaviour is to generate ENDBR instructions for every function.  This creates more "legal" forward edges than necessary in the eyes of CET-IBT.

https://godbolt.org/z/M15rjMb4G is almost excellent, but it would be far more helpful if all functions were implicitly nocf_check, so this example produces a diagnostic.  That way, GCC can point out all functions used by function pointers, rather than the result compiling and failing to be CET-IBT compatible.

This on its own would be enough to let embedded projects minimise their ENDBR* count while having some compiler assistance while doing so.


More generally, a lot of common cases (e.g. Linux) could be computed automatically.  Drivers filling in ops structures typically refer to local symbols, so these defaulting to cf_check would be an improvement still.  This would leave only global symbols needing explicit cf_check.  (Maybe LTO could even figure out the global symbols cases correctly?)

Finally, one minor code generation improvement.  When GCC emits a direct call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation to skip the ENDBR instruction (not needed for direct call/jmp's) and save on decode bandwidth.
Comment 1 H.J. Lu 2021-10-26 20:45:23 UTC
(In reply to Andrew Cooper from comment #0)
> 
> Finally, one minor code generation improvement.  When GCC emits a direct
> call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation
> to skip the ENDBR instruction (not needed for direct call/jmp's) and save on
> decode bandwidth.

It is only safe for calling a static function whose address has been taken.
Comment 2 H.J. Lu 2021-10-26 21:31:45 UTC
Created attachment 51670 [details]
Skip ENDBR when emitting direct call/jmp to local function
Comment 3 H.J. Lu 2021-10-26 22:48:32 UTC
Created attachment 51672 [details]
Add -fcf-check-attribute=[yes|no]

-fcf-check-attribute=[yes|no] implies "cf_check" or "nocf_check"
function attribute.
Comment 4 H.J. Lu 2021-10-26 22:49:20 UTC
Please try these patches and let me know if they work.
Comment 5 peterz 2021-10-27 08:38:55 UTC
(In reply to H.J. Lu from comment #1)
> (In reply to Andrew Cooper from comment #0)
> > 
> > Finally, one minor code generation improvement.  When GCC emits a direct
> > call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation
> > to skip the ENDBR instruction (not needed for direct call/jmp's) and save on
> > decode bandwidth.
> 
> It is only safe for calling a static function whose address has been taken.

Could be done wider using LTO, or pushed into the linker if you're willing to change the ELF format and augment it with STT_FUNC_BTI and R_*_BTI32, there the new relocation would mean +4 (or rather, skip landing pad) when aimed at STT_FUNC_BTI and be identical to _PC32 otherwise.

The ELF thing doesn't help with reducing ENDBR emissions for global symbols since we can't ever tell who will take the address, but it will help with directly calling avoiding the landing pad.

It also allows for a 'pseudo' LTO thing to 'seal' an executable, where it will strip the ENDBRs for all symbols that do not have a _PC32 rela. We can use EXPORT_SYMBOL*() to generate one such on purpose to avoid exported functions from being sealed.

Anyway, I think there's much value in reducing the number of ENDBR instructions as much as possible and we should investigate how the toolchains can best help there.
Comment 6 H.J. Lu 2021-10-27 13:40:55 UTC
Created attachment 51677 [details]
The v2 patch to skip ENDBR
Comment 7 Andrew Cooper 2021-10-27 18:50:42 UTC
Thankyou.  I've tried these two patches and they do appear to be behaving as intended.

I've put together a slightly extended version of the original test.  Compile with gcc -Wall -fno-pic -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no

static void __attribute__((noinline)) a(void)
{
    asm volatile ("sti":::"memory");
}

void __attribute__((nocf_check, noinline)) b(void)
{
    asm volatile ("std":::"memory");
}

void __attribute__((cf_check, noinline)) c(void)
{
    asm volatile ("cmc":::"memory");
}

void (*ptr_a)(void) = a; // Now raises a diagnostic
void (*ptr_b)(void) = b; // Did diagnose previously, still does
void (*ptr_c)(void) = c;

int test(void)
{
    ptr_a();
    ptr_b();
    ptr_c();

    a();
    b();
    c(); // When fully linked, does skip c's ENDBR64 instruction

    return 0;
}

int main(void)
{
    return 0;
}

I'll try applying this to a bigger codebase now.
Comment 8 Andrew Cooper 2021-10-27 23:48:49 UTC
Actually, there is a (possibly pre-existing) diagnostics issue:

$ cat proto.c
static void __attribute__((cf_check)) foo(void);
static void __attribute__((unused)) foo(void)
{
}
void (*ptr)(void) = foo;

$ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c proto.c -o proto.o
proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
    2 | static void __attribute__((unused)) foo(void)
      |                                     ^~~
proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
    1 | static void __attribute__((cf_check)) foo(void);
      |                                       ^~~


The diagnostic complaining that the forward declaration doesn't match the definition gives 'void(void)' as the type in both cases, leaving out the fact that they differ by cf_check-ness.
Comment 9 H.J. Lu 2021-10-28 01:53:11 UTC
Created attachment 51687 [details]
The v2 patch to add -fcf-check-attribute=[yes|no]
Comment 10 H.J. Lu 2021-10-28 01:53:27 UTC
(In reply to Andrew Cooper from comment #8)
> Actually, there is a (possibly pre-existing) diagnostics issue:
> 
> $ cat proto.c
> static void __attribute__((cf_check)) foo(void);
> static void __attribute__((unused)) foo(void)
> {
> }
> void (*ptr)(void) = foo;
> 
> $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr
> -fcf-check-attribute=no -c proto.c -o proto.o
> proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
>     2 | static void __attribute__((unused)) foo(void)
>       |                                     ^~~
> proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
>     1 | static void __attribute__((cf_check)) foo(void);
>       |                                       ^~~
> 
> 
> The diagnostic complaining that the forward declaration doesn't match the
> definition gives 'void(void)' as the type in both cases, leaving out the
> fact that they differ by cf_check-ness.

Please try the v2 patch.
Comment 11 Andrew Cooper 2021-10-28 10:40:53 UTC
(In reply to H.J. Lu from comment #10)
> (In reply to Andrew Cooper from comment #8)
> > Actually, there is a (possibly pre-existing) diagnostics issue:
> > 
> > $ cat proto.c
> > static void __attribute__((cf_check)) foo(void);
> > static void __attribute__((unused)) foo(void)
> > {
> > }
> > void (*ptr)(void) = foo;
> > 
> > $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr
> > -fcf-check-attribute=no -c proto.c -o proto.o
> > proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
> >     2 | static void __attribute__((unused)) foo(void)
> >       |                                     ^~~
> > proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
> >     1 | static void __attribute__((cf_check)) foo(void);
> >       |                                       ^~~
> > 
> > 
> > The diagnostic complaining that the forward declaration doesn't match the
> > definition gives 'void(void)' as the type in both cases, leaving out the
> > fact that they differ by cf_check-ness.
> 
> Please try the v2 patch.

I appear to get no diagnostic at all now.  This seems like a regression from v1.

There should be a diagnostic, but it ought to include cf_check in the type it prints.
Comment 12 H.J. Lu 2021-10-28 13:17:10 UTC
Created attachment 51693 [details]
The v3 patch to add -fcf-check-attribute=[yes|no]
Comment 13 H.J. Lu 2021-10-28 13:17:48 UTC
(In reply to Andrew Cooper from comment #11)
> 
> There should be a diagnostic, but it ought to include cf_check in the type
> it prints.

Try the v3 patch.
Comment 14 Andrew Cooper 2021-10-28 17:36:33 UTC
(In reply to H.J. Lu from comment #13)
> (In reply to Andrew Cooper from comment #11)
> > 
> > There should be a diagnostic, but it ought to include cf_check in the type
> > it prints.
> 
> Try the v3 patch.

Thanks.  Now get:

proto.c:2:37: error: conflicting types with implied 'nocf_check' attribute for 'foo'; have 'void(void)'
    2 | static void __attribute__((unused)) foo(void)
      |                                     ^~~
proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
    1 | static void __attribute__((cf_check)) foo(void);
      |                                       ^~~

which at least highlights the issue.  Any variant like this, but possibly even simply reporting 'void __attribute__((nocf_check))(void)' should be fine.
Comment 15 H.J. Lu 2021-10-28 19:11:22 UTC
Created attachment 51696 [details]
The v4 patch to add -fcf-check-attribute=[yes|no]
Comment 16 H.J. Lu 2021-10-28 19:12:16 UTC
(In reply to Andrew Cooper from comment #14)
> (In reply to H.J. Lu from comment #13)
> > (In reply to Andrew Cooper from comment #11)
> > > 
> > > There should be a diagnostic, but it ought to include cf_check in the type
> > > it prints.
> > 
> > Try the v3 patch.
> 
> Thanks.  Now get:
> 
> proto.c:2:37: error: conflicting types with implied 'nocf_check' attribute
> for 'foo'; have 'void(void)'
>     2 | static void __attribute__((unused)) foo(void)
>       |                                     ^~~
> proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
>     1 | static void __attribute__((cf_check)) foo(void);
>       |                                       ^~~
> 
> which at least highlights the issue.  Any variant like this, but possibly
> even simply reporting 'void __attribute__((nocf_check))(void)' should be
> fine.

The v4 patch changed it to

bar1.c:2:37: error: conflicting types for ‘foo’; have ‘void(void)’ with implied ‘nocf_check’ attribute
    2 | static void __attribute__((unused)) foo(void)
      |                                     ^~~
bar1.c:1:39: note: previous declaration of ‘foo’ with type ‘void(void)’
    1 | static void __attribute__((cf_check)) foo(void);
      |                                       ^~~
bar1.c:5:21: warning: initialization of ‘void (*)(void)’ from incompatible pointer type ‘void (__attribute__((nocf_check)) *)(void)’ [-Wincompatible-pointer-types]
    5 | void (*ptr)(void) = foo;
      |                     ^~~
Comment 17 Andrew Cooper 2021-10-29 22:57:28 UTC
I think I've found a bug in the -fcf-check-attribute implementation.

  $ cat fnptr-array-arg.c
  static int __attribute__((cf_check)) foo(char a[], int b)
  {
      return 0;
  }
  int (*ptr)(char[], int) = foo;
  
  $ gcc -Wall -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c fnptr-array-arg.c -o tmp.o && objdump -d tmp.o
  fnptr-array-arg.c:5:27: warning: initialization of 'int (*)(char *, int)' from incompatible pointer type 'int (__attribute__((nocf_check)) *)(char *, int)' [-Wincompatible-pointer-types]
      5 | int (*ptr)(char[], int) = foo;
        |                           ^~~
  
  tmp.o:     file format elf64-x86-64
  
  
  Disassembly of section .text:
  
  0000000000000000 <foo>:
     0:	31 c0                	xor    %eax,%eax
     2:	c3                   	retq   


Despite the explicit cf_check, a diagnostic is raised complaining about cf_check-ness of the pointer, and the generated code has no `endbr64` instruction.

This issue only manifests when using array arguments in the function.  When switching `char[]` for `char*`, everything works as expected.  Also, dropping -fcf-check-attribute=no also causes things to work.

  $ gcc -Wall -fcf-protection=branch -mmanual-endbr -c fnptr-array-arg.c -o tmp.o && objdump -d tmp.o

  tmp.o:     file format elf64-x86-64


  Disassembly of section .text:

  0000000000000000 <foo>:
     0:	f3 0f 1e fa          	endbr64 
     4:	31 c0                	xor    %eax,%eax
     6:	c3                   	retq   


Something about the array type seems to cause the explicit cf_check attribute to be lost.
Comment 18 H.J. Lu 2021-10-30 00:03:12 UTC
Created attachment 51701 [details]
The v5 patch to add -fcf-check-attribute=[yes|no]
Comment 19 H.J. Lu 2021-10-30 00:04:22 UTC
(In reply to Andrew Cooper from comment #17)
> I think I've found a bug in the -fcf-check-attribute implementation.
> 

Please try the v5 patch.  BTW, do you have a testcase to show how
-fcf-check-attribute=yes is used?
Comment 20 Andrew Cooper 2021-10-30 00:51:29 UTC
(In reply to H.J. Lu from comment #19)
> (In reply to Andrew Cooper from comment #17)
> > I think I've found a bug in the -fcf-check-attribute implementation.
> 
> Please try the v5 patch.

Thanks.  That does fix the issue.

>  BTW, do you have a testcase to show how
> -fcf-check-attribute=yes is used?

So, this was something I was going to leave until I'd got CET-IBT working, so as to have time to consider all parts before proposing improvements.

I don't have a usecase for -fcf-check-attribute=yes, because it is almost totally redundant with regular -fcf-protection in the first place.

When you are are applying control flow checks, every function is either checked or not checked.  But GCC currently has a 3-way model of {unknown, explicit yes, explicit no} on which it builds its typechecking.

Furthermore, -mmanual-endbr is a gross hack which by default leaves you with a broken binary.

If I were building this from scratch, I'd not have -mmanual-endbr or -fcf-check-attribute at all, because they're exposing complexity which ought not to exist.

I get why the default for -fcf-protection=branch puts an ENDBR* instruction everywhere.  It is the quick, easy and non-invasive way to make libraries compatible with CET, which is a net improvement, even if not ideal.

The ideal way, and definitely future work, is for GCC to calculate the minimum set of required ENDBR*.  At a guess, all non-local symbols (except those LTO can determine are not publicly visible), and any local symbols used by function pointers.

What I'm trying to do is a stopgap in the middle.  No ENDBR*'s by default, but have the compiler tell me where I've got function pointers to a non-ENDBR'd function, so when the result compiles, it stands a reasonable chance of functioning correctly.

Personally, I'd suggest having these as sub-modes of -fcf-protection=branch, instead of exposing all the internals on the command line.
Comment 21 Andrew Cooper 2021-10-30 04:03:15 UTC
Another possibly-bug, but possibly mis-expectations on my behalf.

I've found some code in the depths of Xen which is causing a failure on final link due to a missing `__x86_indirect_thunk_nt_rax` symbol.

  $ cat fnptr-typeof.c
  extern void (*fnptrs[])(char);

  void foo(int a)
  {
      typeof(foo) *bar = (void *)fnptrs[0];
      bar(a);
  }

I realise this  is wildly undefined behaviour, and I will try to address it in due course.  However, the instruction generation is bizarre.

When I compile with -fcf-protection=branch -mmanual-endbr, I get a plain `jmp *fnptrs(%rip)` instruction.  (This is fine.)

When I compile with -fcf-check-attribute=no as well, then I get `notrack jmp *fnptrs(%rip)`.  I'm not sure why the notrack is warranted here; for all GCC knows, the target does have a suitable ENDBR64 instruction.

When I compile with -mindirect-branch=thunk as well, I get a load into %rax and a normal looking retpoline thunk.  (This is as expected too.)

However, when I switch to -mindirect-branch=thunk-extern, I get the the same load into %rax, and then a jump to `__x86_indirect_thunk_nt_rax`.  Presumably the nt is short for notrack.


Irrespective of whether there should be a notrack or not on the jmp form, it weird for the retpoline thunk ABI to be changing based on extern or not.  What is the reasoning behind this?
Comment 22 H.J. Lu 2021-10-30 12:22:32 UTC
(In reply to Andrew Cooper from comment #21)
> Another possibly-bug, but possibly mis-expectations on my behalf.
> 
> I've found some code in the depths of Xen which is causing a failure on
> final link due to a missing `__x86_indirect_thunk_nt_rax` symbol.
> 
>   $ cat fnptr-typeof.c
>   extern void (*fnptrs[])(char);
> 
>   void foo(int a)
>   {
>       typeof(foo) *bar = (void *)fnptrs[0];
>       bar(a);
>   }
> 
> I realise this  is wildly undefined behaviour, and I will try to address it
> in due course.  However, the instruction generation is bizarre.
> 
> When I compile with -fcf-protection=branch -mmanual-endbr, I get a plain
> `jmp *fnptrs(%rip)` instruction.  (This is fine.)
> 
> When I compile with -fcf-check-attribute=no as well, then I get `notrack jmp
> *fnptrs(%rip)`.  I'm not sure why the notrack is warranted here; for all GCC
> knows, the target does have a suitable ENDBR64 instruction.
> 

From "info gcc":

     The 'nocf_check' attribute on a type of pointer to function is used
     to inform the compiler that a call through the pointer should not
     be instrumented when compiled with the '-fcf-protection=branch'
     option.  The compiler assumes that the function's address from the
     pointer is a valid target for a control-flow transfer.  A direct
     function call through a function name is assumed to be a safe call
     thus direct calls are not instrumented by the compiler.

That is why notrack is added.
Comment 23 Andrew Cooper 2021-11-05 11:11:53 UTC
Apologies for the delay, but I do now have a working prototype of Xen with CET-IBT active, using the current version of these patches.

The result actually builds back to older versions of GCCs, but the lack of cf_check-ness typechecking makes this a fragile activity.
Comment 24 H.J. Lu 2021-11-05 14:29:23 UTC
(In reply to Andrew Cooper from comment #23)
> Apologies for the delay, but I do now have a working prototype of Xen with
> CET-IBT active, using the current version of these patches.
> 
> The result actually builds back to older versions of GCCs, but the lack of
> cf_check-ness typechecking makes this a fragile activity.

Should I polish my patches and submit them now?
Comment 25 Andrew Cooper 2022-02-23 20:34:11 UTC
(In reply to H.J. Lu from comment #24)
> (In reply to Andrew Cooper from comment #23)
> > Apologies for the delay, but I do now have a working prototype of Xen with
> > CET-IBT active, using the current version of these patches.
> > 
> > The result actually builds back to older versions of GCCs, but the lack of
> > cf_check-ness typechecking makes this a fragile activity.
> 
> Should I polish my patches and submit them now?

Sorry - I missed this request.  Yes please.  These changes have been used for a while now with no issues identified.