Hello [FYI, this is being cross-requested of Clang too] Linux and other kernel level software makes use of -mindirect-branch=thunk-extern to be able to alter the handling of indirect branches at boot. It turns out to be advantageous to inline the thunks when retpoline is not in use. https://lore.kernel.org/lkml/20211026120132.613201817@infradead.org/ is some infrastructure to make this work. In some cases, we want to be able to inline an `lfence; jmp *%reg` thunk. This is fine for the low 8 registers, but not fine for %r{8..15} where the REX prefix pushes the replacement size to being 6 bytes. It would be very useful to have a code-gen option to write out `call %cs:__x86_indirect_thunk_r{8..15}` where the redundant %cs prefix will increase the instruction length to 6, allowing the non-retpoline form to be inlined. Relatedly, x86 straight line speculation has been discussed before, but without any action taken. It would be helpful to have a code gen option which would emit `int3` following any `ret` instruction, and any indirect jump, as neither of these two cases have following architectural execution. The reason these two are related is that if both options are in use, we want an extra byte of replacement space to be able to inline `lfence; jmp *%reg; int3`. Third (and possibly only for future optimisations), Clang has been observed to spot conditional tail calls as `Jcc __x86_indirect_thunk_*`. This is a 6 byte source size, but needs up to 9 bytes of space for inlining including an `int3` for straight line speculation reasons (See https://lore.kernel.org/lkml/20211026120310.359986601@infradead.org/ for full details). It might be enough to simply prohibit an optimisation like this when trying to pad retpolines for inlineability.
https://bugs.llvm.org/show_bug.cgi?id=52323 for Clang cross-request.
PeterZ has suggested that the straight line speculation case can be dis-entangled with the thunk inlining case. If an `int3` is emitted following any `jmp __x86_indirect_thunk_*` instruction (i.e. treated as an indirect jump despite retpoline), then the inlining logic need not worry about straight line speculation at all. However, this does depend on not generating `jcc __x86_indirect_thunk_*` as inlining that would require an additional `int3` for SLS safety.
Created attachment 51678 [details] A patch to add -mharden-sls= x86: Add -mharden-sls=[none|all|return|indirect-branch] Generate code to mitigate against straight line speculation.
Created attachment 51679 [details] A patch to add -mindirect-branch-cs-prefix It adds CS prefix to call and jmp to thunk when converting indirect call and jump.
(In reply to Andrew Cooper from comment #0) > Hello > > [FYI, this is being cross-requested of Clang too] > > Linux and other kernel level software makes use of > -mindirect-branch=thunk-extern to be able to alter the handling of indirect > branches at boot. It turns out to be advantageous to inline the thunks when > retpoline is not in use. > https://lore.kernel.org/lkml/20211026120132.613201817@infradead.org/ is some > infrastructure to make this work. > > In some cases, we want to be able to inline an `lfence; jmp *%reg` thunk. > This is fine for the low 8 registers, but not fine for %r{8..15} where the > REX prefix pushes the replacement size to being 6 bytes. > > It would be very useful to have a code-gen option to write out `call > %cs:__x86_indirect_thunk_r{8..15}` where the redundant %cs prefix will > increase the instruction length to 6, allowing the non-retpoline form to be > inlined. > -mindirect-branch-cs-prefix > Relatedly, x86 straight line speculation has been discussed before, but > without any action taken. It would be helpful to have a code gen option > which would emit `int3` following any `ret` instruction, and any indirect > jump, as neither of these two cases have following architectural execution. > > The reason these two are related is that if both options are in use, we want > an extra byte of replacement space to be able to inline `lfence; jmp *%reg; > int3`. > -mharden-sls=[none|all|return|indirect-branch] Let me know if they work. I also need testcases. > Third (and possibly only for future optimisations), Clang has been observed > to spot conditional tail calls as `Jcc __x86_indirect_thunk_*`. This is a 6 > byte source size, but needs up to 9 bytes of space for inlining including an > `int3` for straight line speculation reasons (See > https://lore.kernel.org/lkml/20211026120310.359986601@infradead.org/ for > full details). It might be enough to simply prohibit an optimisation like > this when trying to pad retpolines for inlineability. I don't think GCC does that at all.
Created attachment 51681 [details] The v2 patch to add -mharden-sls=
(In reply to H.J. Lu from comment #3) > Created attachment 51678 [details] > A patch to add -mharden-sls= > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > > Generate code to mitigate against straight line speculation. I'm getting (a lot) spurious int3 instructions with this, for example: 0000000000000280 <do_SYSENTER_32>: 280: 48 81 8f 90 00 00 00 00 02 00 00 orq $0x200,0x90(%rdi) 28b: 48 8b 47 20 mov 0x20(%rdi),%rax 28f: 48 89 87 98 00 00 00 mov %rax,0x98(%rdi) 296: e9 75 ff ff ff jmp 210 <do_fast_syscall_32> 29b: cc int3 That's not an *indirect* jump there.
(In reply to peterz from comment #7) > (In reply to H.J. Lu from comment #3) > > Created attachment 51678 [details] > > A patch to add -mharden-sls= > > > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > > > > Generate code to mitigate against straight line speculation. > > I'm getting (a lot) spurious int3 instructions with this, for example: > > 0000000000000280 <do_SYSENTER_32>: > 280: 48 81 8f 90 00 00 00 00 02 00 00 orq $0x200,0x90(%rdi) > 28b: 48 8b 47 20 mov 0x20(%rdi),%rax > 28f: 48 89 87 98 00 00 00 mov %rax,0x98(%rdi) > 296: e9 75 ff ff ff jmp 210 <do_fast_syscall_32> > 29b: cc int3 > > That's not an *indirect* jump there. Please provide a testcase.
Created attachment 51683 [details] kernel patch to test -mharden-sls=all $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o ... arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b: unreachable instruction
(In reply to H.J. Lu from comment #4) > Created attachment 51679 [details] > A patch to add -mindirect-branch-cs-prefix > > It adds CS prefix to call and jmp to thunk when converting indirect call > and jump. This seems to work as expected, I get CS prefix on all high reg thunk calls and no CS prefix on the low reg calls. Example: 29241: e8 00 00 00 00 call 29246 <intel_dsc_enable.cold+0x6e5> 29242: R_X86_64_PLT32 __x86_indirect_thunk_rax-0x4 292c8: 2e e8 00 00 00 00 cs call 292ce <intel_dsc_enable.cold+0x76d> 292ca: R_X86_64_PLT32 __x86_indirect_thunk_r10-0x4
Created attachment 51684 [details] The v2 patch to add -mharden-sls=
(In reply to peterz from comment #9) > Created attachment 51683 [details] > kernel patch to test -mharden-sls=all > > $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o > ... > arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b: > unreachable instruction Please try the v2 patch.
(In reply to H.J. Lu from comment #12) > (In reply to peterz from comment #9) > > Created attachment 51683 [details] > > kernel patch to test -mharden-sls=all > > > > $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o > > ... > > arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b: > > unreachable instruction > > Please try the v2 patch. Per comment #6 this should be v3, no? Anyway, the good news is that I now seem to have a kernel image with lots of extra int3 instructions, but all in the right place. *However*, I seem to be missing a few: 36f4: 41 5f pop %r15 36f6: e9 00 00 00 00 jmp 36fb <__do_set_cpus_allowed+0x5b> 36f7: R_X86_64_PLT32 __x86_indirect_thunk_rax-0x4 36fb: 48 8b 87 90 02 00 00 mov 0x290(%rdi),%rax There should be one after the jmp __x86_indirect_thunk_* thingy. I'll do an objtool patch to search for missing int3, but that'll have to wait until tomorrow, it's past midnight.
(In reply to peterz from comment #13) > (In reply to H.J. Lu from comment #12) > > (In reply to peterz from comment #9) > > > Created attachment 51683 [details] > > > kernel patch to test -mharden-sls=all > > > > > > $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o > > > ... > > > arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b: > > > unreachable instruction > > > > Please try the v2 patch. > > Per comment #6 this should be v3, no? Anyway, the good news is that I now > seem to have a kernel image with lots of extra int3 instructions, but all in > the right place. > > *However*, I seem to be missing a few: > > 36f4: 41 5f pop %r15 > 36f6: e9 00 00 00 00 jmp 36fb > <__do_set_cpus_allowed+0x5b> > 36f7: R_X86_64_PLT32 __x86_indirect_thunk_rax-0x4 This is a direct branch. > 36fb: 48 8b 87 90 02 00 00 mov 0x290(%rdi),%rax > > There should be one after the jmp __x86_indirect_thunk_* thingy. I'll do an > objtool patch to search for missing int3, but that'll have to wait until > tomorrow, it's past midnight.
So this is the irritating corner case where the two options are linked. *If* we are using -mindirect-branch-cs-prefix, then we intend to rewrite `jmp __x86_indirect_thunk_*` to `jmp *%reg` or `lfence; jmp *%reg` based on boot time configuration/settings. In this case, we still need to fit the `int3` for SLS protection in somewhere. The two options are: 1) Special case `jmp __x86_indirect_thunk_*` as if it were an indirect jump and write out an `int3` directly, or 2) Pad one extra %cs prefix on the jmp, so we've got space to insert one at boot time.
(In reply to Andrew Cooper from comment #15) > So this is the irritating corner case where the two options are linked. > > *If* we are using -mindirect-branch-cs-prefix, then we intend to rewrite > `jmp __x86_indirect_thunk_*` to `jmp *%reg` or `lfence; jmp *%reg` based on > boot time configuration/settings. > > In this case, we still need to fit the `int3` for SLS protection in > somewhere. > > The two options are: > 1) Special case `jmp __x86_indirect_thunk_*` as if it were an indirect jump > and write out an `int3` directly, or I can do this. > 2) Pad one extra %cs prefix on the jmp, so we've got space to insert one at > boot time.
[hjl@gnu-tgl-2 pr102952]$ cat z2.i extern void (*fptr) (int, int); void foo (int x, int y) { fptr (x, y); } [hjl@gnu-tgl-2 pr102952]$ make z2.s /export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2 -mindirect-branch=thunk -mindirect-branch-cs-prefix -mharden-sls=all -S z2.i [hjl@gnu-tgl-2 pr102952]$ cat z2.s .file "z2.i" .text .p2align 4 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc movq fptr(%rip), %rax jmp __x86_indirect_thunk_rax Is int3 needed here? .cfi_endproc .LFE0: .size foo, .-foo .section .text.__x86_indirect_thunk_rax,"axG",@progbits,__x86_indirect_thunk_rax,comdat .globl __x86_indirect_thunk_rax .hidden __x86_indirect_thunk_rax .type __x86_indirect_thunk_rax, @function __x86_indirect_thunk_rax: .LFB1: .cfi_startproc call .LIND1 .LIND0: pause lfence jmp .LIND0 .LIND1: .cfi_def_cfa_offset 16 mov %rax, (%rsp) ret int3 <<<<<<<<<<<<<<<<<<<< Is this needed? .cfi_endproc .LFE1: .ident "GCC: (GNU) 12.0.0 20211027 (experimental)" .section .note.GNU-stack,"",@progbits [hjl@gnu-tgl-2 pr102952]$
Yes to both.
Created attachment 51685 [details] The v4 patch to add -mharden-sls=
(In reply to H.J. Lu from comment #19) > Created attachment 51685 [details] > The v4 patch to add -mharden-sls= That looks to do the right thing! Let me go write more validation stuff to double check. Thanks!
(In reply to H.J. Lu from comment #19) > Created attachment 51685 [details] > The v4 patch to add -mharden-sls= I seem to have found one 'funny': kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x23: unreachable instruction 27e6: 48 8b 14 d5 00 00 00 mov 0x0(,%rdx,8),%rdx 27ed: 00 27ea: R_X86_64_32S .rodata..c_jump_table 27ee: e9 00 00 00 00 jmp 27f3 <___bpf_prog_run+0x23> 27ef: R_X86_64_PLT32 __x86_indirect_thunk_rdx-0x4 27f3: cc int3 27f4: cc int3 27f5: 8b 53 04 mov 0x4(%rbx),%edx Other than that, it seems I need to go clean up our asm ...
One curious thing I have discovered. While auditing the -mharden-sls=all code generation in Xen, I found examples where I got "ret int3 ret int3" with no intervening instructions. It turns out this is not a regression in this change. It is a pre-existing missing optimisation, which is made more obvious when every ret is extended with an int3. It occurs for functions with either no stack frame at all, or functions which have an early exit before setting up the stack frame. Some examples which occur at -O1 do not occur at -O2. One curious example which does still repro at -O2 is this. We have a hash lookup function: struct context *sidtab_search(struct sidtab *s, u32 sid) { int hvalue; struct sidtab_node *cur; if ( !s ) return NULL; hvalue = SIDTAB_HASH(sid); cur = s->htable[hvalue]; while ( cur != NULL && sid > cur->sid ) cur = cur->next; if ( cur == NULL || sid != cur->sid ) { /* Remap invalid SIDs to the unlabeled SID. */ sid = SECINITSID_UNLABELED; hvalue = SIDTAB_HASH(sid); cur = s->htable[hvalue]; while ( cur != NULL && sid > cur->sid ) cur = cur->next; if ( !cur || sid != cur->sid ) return NULL; } return &cur->context; } which compiles (reformatted a little for width - unmodified: https://paste.debian.net/hidden/7bf675d6/) to: <sidtab_search>: 48 85 ff test %rdi,%rdi /------- 74 63 je <sidtab_search+0x68> | 48 8b 17 mov (%rdi),%rdx | 89 f0 mov %esi,%eax | 83 e0 7f and $0x7f,%eax | 48 8b 04 c2 mov (%rdx,%rax,8),%rax | 48 85 c0 test %rax,%rax | /--- 75 13 jne <sidtab_search+0x29> | /|--- eb 17 jmp <sidtab_search+0x2f> | || 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) | || 00 | ||/-> 48 8b 40 48 mov 0x48(%rax),%rax | ||| 48 85 c0 test %rax,%rax | +||-- 74 06 je <sidtab_search+0x2f> | |\|-> 39 30 cmp %esi,(%rax) | | \-- 72 f3 jb <sidtab_search+0x20> | /|---- 74 24 je <sidtab_search+0x53> | |\---> 48 8b 42 28 mov 0x28(%rdx),%rax | | 48 85 c0 test %rax,%rax | | /--- 75 11 jne <sidtab_search+0x49> |/|-|--- eb 32 jmp <sidtab_search+0x6c> // (1) ||| | 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) ||| |/-> 48 8b 40 48 mov 0x48(%rax),%rax ||| || 48 85 c0 test %rax,%rax |||/||-- 74 17 je <sidtab_search+0x60> // (2) ||||\|-> 83 38 04 cmpl $0x4,(%rax) |||| \-- 76 f2 jbe <sidtab_search+0x40> |||| 83 38 05 cmpl $0x5,(%rax) +|||---- 75 15 jne <sidtab_search+0x68> ||\|---> 48 83 c0 08 add $0x8,%rax || | c3 retq || | cc int3 || | 0f 1f 80 00 00 00 00 nopl 0x0(%rax) || \---> c3 retq // Target of (2) || cc int3 || 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) \|-----> 31 c0 xor %eax,%eax | c3 retq | cc int3 \-----> c3 retq // Target of (1) cc int3 66 90 xchg %ax,%ax There are 4 exits in total. Two have to set up %eax, so they can't usefully be merged. However, the unconditional jmp at (1) is 2 bytes, and could fully contain its target ret;int3 without even impacting the surrounding padding. Whether it inlines or merges, this drops 4 bytes. The conditional jump at (2) could be folded in to any of the other exit paths, dropping 16 bytes from the total size size. I have no idea how easy/hard this may be to track down, or whether it is worth pursuing urgently, but it probably does want looking at, seeing as SLS hardening doubles the hit.
(In reply to Andrew Cooper from comment #22) > One curious thing I have discovered. While auditing the -mharden-sls=all > code generation in Xen, I found examples where I got "ret int3 ret int3" > with no intervening instructions. > > It turns out this is not a regression in this change. It is a pre-existing > missing optimisation, which is made more obvious when every ret is extended > with an int3. > > It occurs for functions with either no stack frame at all, or functions > which have an early exit before setting up the stack frame. Some examples > which occur at -O1 do not occur at -O2. > > One curious example which does still repro at -O2 is this. We have a hash > lookup function: > > struct context *sidtab_search(struct sidtab *s, u32 sid) > { > int hvalue; > struct sidtab_node *cur; > > if ( !s ) > return NULL; > > hvalue = SIDTAB_HASH(sid); > cur = s->htable[hvalue]; > while ( cur != NULL && sid > cur->sid ) > cur = cur->next; > > if ( cur == NULL || sid != cur->sid ) > { > /* Remap invalid SIDs to the unlabeled SID. */ > sid = SECINITSID_UNLABELED; > hvalue = SIDTAB_HASH(sid); > cur = s->htable[hvalue]; > while ( cur != NULL && sid > cur->sid ) > cur = cur->next; > if ( !cur || sid != cur->sid ) > return NULL; > } > > return &cur->context; > } > > which compiles (reformatted a little for width - unmodified: > https://paste.debian.net/hidden/7bf675d6/) to: > > <sidtab_search>: > 48 85 ff test %rdi,%rdi > /------- 74 63 je <sidtab_search+0x68> > | 48 8b 17 mov (%rdi),%rdx > | 89 f0 mov %esi,%eax > | 83 e0 7f and $0x7f,%eax > | 48 8b 04 c2 mov (%rdx,%rax,8),%rax > | 48 85 c0 test %rax,%rax > | /--- 75 13 jne <sidtab_search+0x29> > | /|--- eb 17 jmp <sidtab_search+0x2f> > | || 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) > | || 00 > | ||/-> 48 8b 40 48 mov 0x48(%rax),%rax > | ||| 48 85 c0 test %rax,%rax > | +||-- 74 06 je <sidtab_search+0x2f> > | |\|-> 39 30 cmp %esi,(%rax) > | | \-- 72 f3 jb <sidtab_search+0x20> > | /|---- 74 24 je <sidtab_search+0x53> > | |\---> 48 8b 42 28 mov 0x28(%rdx),%rax > | | 48 85 c0 test %rax,%rax > | | /--- 75 11 jne <sidtab_search+0x49> > |/|-|--- eb 32 jmp <sidtab_search+0x6c> // (1) > ||| | 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) > ||| |/-> 48 8b 40 48 mov 0x48(%rax),%rax > ||| || 48 85 c0 test %rax,%rax > |||/||-- 74 17 je <sidtab_search+0x60> // (2) > ||||\|-> 83 38 04 cmpl $0x4,(%rax) > |||| \-- 76 f2 jbe <sidtab_search+0x40> > |||| 83 38 05 cmpl $0x5,(%rax) > +|||---- 75 15 jne <sidtab_search+0x68> > ||\|---> 48 83 c0 08 add $0x8,%rax > || | c3 retq > || | cc int3 > || | 0f 1f 80 00 00 00 00 nopl 0x0(%rax) > || \---> c3 retq // Target of (2) > || cc int3 > || 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) > \|-----> 31 c0 xor %eax,%eax > | c3 retq > | cc int3 > \-----> c3 retq // Target of (1) > cc int3 > 66 90 xchg %ax,%ax > > There are 4 exits in total. Two have to set up %eax, so they can't usefully > be merged. > > However, the unconditional jmp at (1) is 2 bytes, and could fully contain > its target ret;int3 without even impacting the surrounding padding. Whether > it inlines or merges, this drops 4 bytes. > > The conditional jump at (2) could be folded in to any of the other exit > paths, dropping 16 bytes from the total size size. > > I have no idea how easy/hard this may be to track down, or whether it is > worth pursuing urgently, but it probably does want looking at, seeing as SLS > hardening doubles the hit. Please open a separate bug to track it. Should shrink-wrap handle it?
Should I submit the current patches?
(In reply to H.J. Lu from comment #24) > Should I submit the current patches? Yes, I'd say so. Once merged I'll send a kernel patch to use -mindirect-branch-cs-prefix for all RETPOLINE builds. I posted an SLS patch but haven't really had much feedback on that yet, we'll see.
(In reply to peterz from comment #25) > (In reply to H.J. Lu from comment #24) > > Should I submit the current patches? > > Yes, I'd say so. Once merged I'll send a kernel patch to use > -mindirect-branch-cs-prefix for all RETPOLINE builds. I posted an SLS patch > but haven't really had much feedback on that yet, we'll see. Patches are at https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584636.html https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584637.html
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3 commit r12-5353-g53a643f8568067d7700a9f2facc8ba39974973d3 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Oct 27 07:48:54 2021 -0700 x86: Add -mharden-sls=[none|all|return|indirect-branch] Add -mharden-sls= to mitigate against straight line speculation (SLS) for function return and indirect branch by adding an INT3 instruction after function return and indirect branch. gcc/ PR target/102952 * config/i386/i386-opts.h (harden_sls): New enum. * config/i386/i386.c (output_indirect_thunk): Mitigate against SLS for function return. (ix86_output_function_return): Likewise. (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect branch. (ix86_output_indirect_jmp): Likewise. (ix86_output_call_insn): Likewise. * config/i386/i386.opt: Add -mharden-sls=. * doc/invoke.texi: Document -mharden-sls=. gcc/testsuite/ PR target/102952 * gcc.target/i386/harden-sls-1.c: New test. * gcc.target/i386/harden-sls-2.c: Likewise. * gcc.target/i386/harden-sls-3.c: Likewise. * gcc.target/i386/harden-sls-4.c: Likewise. * gcc.target/i386/harden-sls-5.c: Likewise.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:2196a681d7810ad8b227bf983f38ba716620545e commit r12-5377-g2196a681d7810ad8b227bf983f38ba716620545e Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Oct 27 06:27:15 2021 -0700 x86: Add -mindirect-branch-cs-prefix Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to indirect thunk with branch target in r8-r15 registers so that the call and jmp instruction length is 6 bytes to allow them to be replaced with "lfence; call *%r8-r15" or "lfence; jmp *%r8-r15" at run-time. gcc/ PR target/102952 * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Emit CS prefix for -mindirect-branch-cs-prefix. (ix86_output_indirect_branch_via_reg): Likewise. * config/i386/i386.opt: Add -mindirect-branch-cs-prefix. * doc/invoke.texi: Document -mindirect-branch-cs-prefix. gcc/testsuite/ PR target/102952 * gcc.target/i386/indirect-thunk-cs-prefix-1.c: New test. * gcc.target/i386/indirect-thunk-cs-prefix-2.c: Likewise.
Fixed for GCC 12.
(In reply to CVS Commits from comment #27) > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > It occurs to me that `indirect-branch` needs renaming to be `indirect-jmp` as the logic specifically does not make changes to code generation relating to `call` instructions.
Created attachment 52134 [details] A patch to rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp
(In reply to Andrew Cooper from comment #30) > (In reply to CVS Commits from comment #27) > > > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > > > It occurs to me that `indirect-branch` needs renaming to be `indirect-jmp` > as the logic specifically does not make changes to code generation relating > to `call` instructions. Does the patch in comment #31 look OK?
Looks good to me
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:ed8060950c64f2e449aaf90e438aa26d0d9d0b31 commit r12-6320-ged8060950c64f2e449aaf90e438aa26d0d9d0b31 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jan 5 16:33:16 2022 -0800 x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp Indirect branch also includes indirect call instructions. Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its intended behavior. PR target/102952 * config/i386/i386-opts.h (harden_sls): Replace harden_sls_indirect_branch with harden_sls_indirect_jmp. * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Likewise. (ix86_output_indirect_jmp): Likewise. (ix86_output_call_insn): Likewise. * config/i386/i386.opt: Replace indirect-branch with indirect-jmp. Replace harden_sls_indirect_branch with harden_sls_indirect_jmp. * doc/invoke.texi (-harden-sls=): Replace indirect-branch with indirect-jmp.
Fixed.
HJ, does it make sense to backport these to branches since AFAIU they also address possible security issues?
(In reply to Richard Biener from comment #36) > HJ, does it make sense to backport these to branches since AFAIU they also > address possible security issues? Yes, Linux kernel needs it. I will work on it.
(In reply to H.J. Lu from comment #37) > (In reply to Richard Biener from comment #36) > > HJ, does it make sense to backport these to branches since AFAIU they also > > address possible security issues? > > Yes, Linux kernel needs it. I will work on it. I am testing GCC 11 backport on users/hjl/pr102952/gcc-11 branch: https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pr102952/gcc-11
The GCC 11 backport is posted at https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589527.html
I've given the GCC-11 branch a test and everything appears to be in order.
The releases/gcc-11 branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:39d944c4237e5d35e28a2668d3b9a2e0f6f7bd01 commit r11-9575-g39d944c4237e5d35e28a2668d3b9a2e0f6f7bd01 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Oct 27 07:48:54 2021 -0700 x86: Add -mharden-sls=[none|all|return|indirect-branch] Add -mharden-sls= to mitigate against straight line speculation (SLS) for function return and indirect branch by adding an INT3 instruction after function return and indirect branch. gcc/ PR target/102952 * config/i386/i386-opts.h (harden_sls): New enum. * config/i386/i386.c (output_indirect_thunk): Mitigate against SLS for function return. (ix86_output_function_return): Likewise. (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect branch. (ix86_output_indirect_jmp): Likewise. (ix86_output_call_insn): Likewise. * config/i386/i386.opt: Add -mharden-sls=. * doc/invoke.texi: Document -mharden-sls=. gcc/testsuite/ PR target/102952 * gcc.target/i386/harden-sls-1.c: New test. * gcc.target/i386/harden-sls-2.c: Likewise. * gcc.target/i386/harden-sls-3.c: Likewise. * gcc.target/i386/harden-sls-4.c: Likewise. * gcc.target/i386/harden-sls-5.c: Likewise. (cherry picked from commit 53a643f8568067d7700a9f2facc8ba39974973d3)
The releases/gcc-11 branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:5d928740a533cd9e78673fad7ea86d20b2142277 commit r11-9576-g5d928740a533cd9e78673fad7ea86d20b2142277 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Oct 27 06:27:15 2021 -0700 x86: Add -mindirect-branch-cs-prefix Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to indirect thunk with branch target in r8-r15 registers so that the call and jmp instruction length is 6 bytes to allow them to be replaced with "lfence; call *%r8-r15" or "lfence; jmp *%r8-r15" at run-time. gcc/ PR target/102952 * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Emit CS prefix for -mindirect-branch-cs-prefix. (ix86_output_indirect_branch_via_reg): Likewise. * config/i386/i386.opt: Add -mindirect-branch-cs-prefix. * doc/invoke.texi: Document -mindirect-branch-cs-prefix. gcc/testsuite/ PR target/102952 * gcc.target/i386/indirect-thunk-cs-prefix-1.c: New test. * gcc.target/i386/indirect-thunk-cs-prefix-2.c: Likewise. (cherry picked from commit 2196a681d7810ad8b227bf983f38ba716620545e)
The releases/gcc-11 branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:58a4e292e8507a2968bfd2b10631ba95d5440c97 commit r11-9577-g58a4e292e8507a2968bfd2b10631ba95d5440c97 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jan 5 16:33:16 2022 -0800 x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp Indirect branch also includes indirect call instructions. Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its intended behavior. PR target/102952 * config/i386/i386-opts.h (harden_sls): Replace harden_sls_indirect_branch with harden_sls_indirect_jmp. * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Likewise. (ix86_output_indirect_jmp): Likewise. (ix86_output_call_insn): Likewise. * config/i386/i386.opt: Replace indirect-branch with indirect-jmp. Replace harden_sls_indirect_branch with harden_sls_indirect_jmp. * doc/invoke.texi (-harden-sls=): Replace indirect-branch with indirect-jmp. (cherry picked from commit ed8060950c64f2e449aaf90e438aa26d0d9d0b31)