There are four credible ways to find the stack canary: 1. %fs:symbol 2. %fs:symbol(%rip) [with a PCREL relocation] 3. %gs:symbol 4. %gs:symbol(%rip) (Obviously the %rip variants only work on x86_64.) The current code is roughly equivalent to (1) or (3) where symbol is an absolute symbol equal to 0x28 or similar. Please give a command line option to choose any of the four modes and specify the symbol name. (Or just hardcode the symbol name to __gcc_stack_canary or whatever if the option is set.) My perferred solution would be -mstack-protector-cookie=gs:symname or -mstack-protector-cookie=gs:symname(%rip) or -mstack-protector-cookie=gs:0x28 depending on what's desired. I personally consider it to have been a mistake for Linux to support a stack canary without insisting that GCC fix this issue first. The x86_32 case, in particular, is a collossal mess in the kernel that slows kernel entries down and seriously overcomplicates the kernel code because the stack canary addressing mode that GCC chooses is nonsensical.
Well, you can choose between "__stack_chk_guard(%rip)" and "%gs:40"... :)
(In reply to H. Peter Anvin from comment #1) > Well, you can choose between "__stack_chk_guard(%rip)" and "%gs:40"... :) Wow, I guess I didn't even consider the former. That would be option 5: symbol(%rip). Let's not use it for the kernel. For x86_32, we want %fs:symbol. %fs:constant would be tolarable, too. For x86_64, it depends on the end game for percpu access, I suppose.
Created attachment 41949 [details] Patch that introduces mstack-protector-guard-offset= and mstack-protector-guard-reg= Prototype patch that introduces mstack-protector-guard-offset= and mstack-protector-guard-reg= options. The later can be used to override default TLS reg, so compilation becomes independent of -mcmodel=kernel.
(In reply to Uroš Bizjak from comment #3) > Created attachment 41949 [details] > Patch that introduces mstack-protector-guard-offset= and > mstack-protector-guard-reg= e.g.: -mstack-protector-guard-reg=%fs -mstack-protector-guard-offset=0x200
Author: uros Date: Tue Aug 8 16:48:46 2017 New Revision: 250965 URL: https://gcc.gnu.org/viewcvs?rev=250965&root=gcc&view=rev Log: PR target/81708 * config/i386/i386.opt (mstack-protector-guard-reg=): New option (mstack-protector-guard-offset=): Ditto. * config/i386/i386.c (ix86_option_override): Handle -mstack-protector-guard-reg= and -mstack-protector-guard-offset= options. (ix86_stack_protect_guard): Use ix86_stack_protect_guard_reg and ix86_stack_protect_guard_offset variables. (TARGET_STACK_PROTECT_GUARD): Always define. * doc/invoke.texi (x86 Options): Document -mstack-protector-guard-reg= and -mstack-protector-guard-offset= options. testsuite/ChangeLog: PR target/81708 * gcc.target/i386/stack-prot-guard.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/stack-prot-guard.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.opt trunk/gcc/doc/invoke.texi trunk/gcc/testsuite/ChangeLog
Implemented for gcc-8.
Hmm. This is a big improvement, but it's still going to be awkward to use -- if we want to use a normal Linux percpu variable, we're stuck putting it in a fixed location that's known at compile time as opposed to just at link time. It also still prevents Linux from switching to PC-relative percpu addressing to reduce text size. Can we do -mstack-protector-offset=[symbol name]? And could it be extended to ask for PC-relative addressing?
How about simply letting the user enter an assembly expression of neither of the standard ABI options are suitable? Also, shouldn't the user space default on 64 bits be an offset into the TLS using %fs, or is there something magic about how the kernel is compiled that changes it to %gs:?
In some applications it might even be appropriate to use the RDPID instruction and store the canary in the IA32_TSC_AUX MSR.
Created attachment 41955 [details] patch that introduces mstack-protector-guard-symbol= This patch can be used to override TLS offset with a symbol. gcc.target/i386/stack-prot-guard.c compiles to (-O2 -fstack-protector-all -mstack-protector-guard-symbol=symbol_name): f: subq $24, %rsp movq %fs:symbol_name(%rip), %rax movq %rax, 8(%rsp) xorl %eax, %eax movq 8(%rsp), %rax xorq %fs:symbol_name(%rip), %rax jne .L5 addq $24, %rsp ret
(In reply to H. Peter Anvin from comment #8) > How about simply letting the user enter an assembly expression of neither of > the standard ABI options are suitable? Also, shouldn't the user space > default on 64 bits be an offset into the TLS using %fs, or is there > something magic about how the kernel is compiled that changes it to %gs:? -mcmodel=kernel switches TLS reg to %gs. You can use mstack-protector-guard-reg=%gs with other code models.
Author: uros Date: Thu Aug 10 20:59:10 2017 New Revision: 251040 URL: https://gcc.gnu.org/viewcvs?rev=251040&root=gcc&view=rev Log: PR target/81708 * config/i386/i386.opt (mstack-protector-guard-symbol=): New option * config/i386/i386.c (ix86_stack_protect_guard): Use ix86_stack_protect_guard_symbol_str to generate varible declaration. * doc/invoke.texi (x86 Options): Document -mstack-protector-guard-symbol= option. testsuite/ChangeLog: PR target/81708 * gcc.target/i386/stack-prot-sym.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/stack-prot-sym.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.opt trunk/gcc/doc/invoke.texi trunk/gcc/testsuite/ChangeLog
Author: aldyh Date: Wed Sep 13 16:41:28 2017 New Revision: 252350 URL: https://gcc.gnu.org/viewcvs?rev=252350&root=gcc&view=rev Log: PR target/81708 * config/i386/i386.opt (mstack-protector-guard-reg=): New option (mstack-protector-guard-offset=): Ditto. * config/i386/i386.c (ix86_option_override): Handle -mstack-protector-guard-reg= and -mstack-protector-guard-offset= options. (ix86_stack_protect_guard): Use ix86_stack_protect_guard_reg and ix86_stack_protect_guard_offset variables. (TARGET_STACK_PROTECT_GUARD): Always define. * doc/invoke.texi (x86 Options): Document -mstack-protector-guard-reg= and -mstack-protector-guard-offset= options. testsuite/ChangeLog: PR target/81708 * gcc.target/i386/stack-prot-guard.c: New test. Added: branches/range-gen2/gcc/testsuite/gcc.target/i386/stack-prot-guard.c Modified: branches/range-gen2/gcc/ChangeLog branches/range-gen2/gcc/config/i386/i386.c branches/range-gen2/gcc/config/i386/i386.opt branches/range-gen2/gcc/doc/invoke.texi branches/range-gen2/gcc/testsuite/ChangeLog
Author: aldyh Date: Wed Sep 13 16:51:37 2017 New Revision: 252397 URL: https://gcc.gnu.org/viewcvs?rev=252397&root=gcc&view=rev Log: PR target/81708 * config/i386/i386.opt (mstack-protector-guard-symbol=): New option * config/i386/i386.c (ix86_stack_protect_guard): Use ix86_stack_protect_guard_symbol_str to generate varible declaration. * doc/invoke.texi (x86 Options): Document -mstack-protector-guard-symbol= option. testsuite/ChangeLog: PR target/81708 * gcc.target/i386/stack-prot-sym.c: New test. Added: branches/range-gen2/gcc/testsuite/gcc.target/i386/stack-prot-sym.c Modified: branches/range-gen2/gcc/ChangeLog branches/range-gen2/gcc/config/i386/i386.c branches/range-gen2/gcc/config/i386/i386.opt branches/range-gen2/gcc/doc/invoke.texi branches/range-gen2/gcc/testsuite/ChangeLog
Uroš, stack-prot-sym.c fails on ia32 with PIC/PIE: the address/value of my_guard is loaded from the GOT, instead of appearing as %gs:my_guard. After being confused by the expectation that my_guard should be a LE TLS symbol, I'm coming to the conclusion that my expectation was incorrect, and it is indeed supposed to be a plain offset, so the code is correct, if not as efficient as on PDC. Does that sound right? I'm undecided as to whether avoiding the GOT reference, and requiring the symbol to be usable as a link-time constant, would be desirable/doable. Thoughts?
(In reply to Alexandre Oliva from comment #15) > Uroš, > > stack-prot-sym.c fails on ia32 with PIC/PIE: the address/value of my_guard > is loaded from the GOT, instead of appearing as %gs:my_guard. > > After being confused by the expectation that my_guard should be a LE TLS > symbol, I'm coming to the conclusion that my expectation was incorrect, and > it is indeed supposed to be a plain offset, so the code is correct, if not > as efficient as on PDC. Does that sound right? > > I'm undecided as to whether avoiding the GOT reference, and requiring the > symbol to be usable as a link-time constant, would be desirable/doable. > Thoughts? The current implementation was implemented specifically for linux, as requested in Comment #0, without any other usages in mind. Also, I'm not deeply familiar with the details of symbol linking, so perhaps HJ can help here.
(In reply to Alexandre Oliva from comment #15) > Uroš, > > stack-prot-sym.c fails on ia32 with PIC/PIE: the address/value of my_guard > is loaded from the GOT, instead of appearing as %gs:my_guard. > When PIC/PIE is used, my_guard is loaded from GOT, regardless of -m32 or -m64.
on x86_64 with -fPIC or -fpic, my_guard's address is indeed loaded from the GOT with @GOTPCREL indeed on x86_64 with -fPIE or -fpie, however, it is used just as expected by the testcase. which should be fine as long as my_guard is required to be a link-time defined constant. but if it is, then there's no point in loading its address from the GOT, not on x86_64 PIC, not on ia32 PIC/PIE. thus my question. something's fishy there.
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>: https://gcc.gnu.org/g:4ad200addc3eaf55fba6cc91db3d3b66eabaf3d0 commit r13-2043-g4ad200addc3eaf55fba6cc91db3d3b66eabaf3d0 Author: Alexandre Oliva <oliva@adacore.com> Date: Mon Aug 15 09:21:33 2022 -0300 i386 PIE: testsuite: cope with default pie on ia32 This patch continues the effort of cleaning up the testsuite for --enable-default-pie; the focus herein is mostly 32-bit x86. As much as I tried to avoid it, most of the changes to the testsuite simply disable PIC/PIE, for reasons I'm going to detail below. static-cdtor1.C gets new patterns to match PIE output. Some avx512fp16 tests change only in register allocation, because of the register used to hold the GOT base address. Interrupt tests changed in this regard as well, but here it also affected register saving and restoring. The previous patch modified cet-sjlj tests, mentioning a single regexp covering PIC and nonPIC got incorrect match counts. I found out that adding ?: to parenthesized subpatterns avoids miscounting matches. Other tests that count certain kinds of insns needed adjustment over insns in get_pc_thunk, extra loads from the GOT, or extra adds to compute addresses. In one case, namely stack-check-12, it is nonPIC that had extra insns, that PIC gets rid of, or rather, pushing and popping the PIC register obviates the dummy push and matching pop used for stack probing in nonpic. pr95126 tests were supposed to optimize loads into known constants, but the @GOTOFF addresses prevent that for reasons I have not investigated, but that would be clearly desirable, so I've XFAILed these. pr95852 is another case of missed optimization: sibcalls are not possible when the PIC register needs to be set up for the call, which prevents the expected constant propagation to the return block; I have adjusted the codegen expectations of these tests. As for tests that disable PIE... Some are judgment calls, that fail for similar reasons as tests described above, but I chose not to adjust their expectations; others are just not possible with PIC, or not worth the effort of adjusting. anon[14].C check for no global or comdat symbols, respectively, but -fPIE outputs get_pc_thunk, as global hidden comdat. initlist-const1.C wants .rodata and checks for no .data, but PIC outputs constant data that needs relocations in .data.rel.ro.local. no-stack-protector-attr-3.C and stackprotectexplicit2.C count stack_check_fail matches; -fPIE calls stack_check_fail_local instead, which matches the pattern, but this symbol is also marked as .hidden, so the match count needs to be adjusted. pr71694.C checks for no movl, but get_pc_thunk contains one. pr102892-1.c is a missed optimization, ivopts creates an induction variable because the array address can't be part of an indexing base address with PIE, and that ends up stopping a load from being resolved to a constant as expected. sibcall-11.c needs @PLT for the call, which requires the PIC register, which makes sibcalling impossible. builtin-self.c, in turn, expects no calls, but PIC calls get_pc_thunk. avx* vector tests that had PIE disabled were affected in that the need for GOT-based addressing modes changed instruction selection in ways that deviated from the expectations of the tests. Ditto other vector tests: pr100865*, pr101796-1, pr101846, pr101989-broadcast-1, and pr102021, pr54855-[37], and pr90773-17. pr15184* tests need a PIC register to access global variables, which affects register allocation, so the patterns would have to be adjusted. pr27971 can't use the expected addressing mode to dereference the array with PIC, so it ends up selecting an indexed addressing mode, obviating the expected separate shift insn. pr70263-2 is another case that implicitly expects a sibcall, impossible because of the need for the PIC register; without a sibcall, the expected REG_EQUIV for the reuse of the stack slot of an incoming argument does not occur. pr78035 duplicates the final compare in both then and else blocks with PIE, which deviates from the expected cmp count. pr81736-[57] test for no frame pointer, but the PIC register assignment to a call-saved register forces a frame; the former ends up not using the PIC register, but it's only optimized out after committing to a stack frame to preserve it. pr85620-6 also expects a tail call in a situation that is impossible on ia32 PIC. pr85667-6 doesn't expect the movl in get_pc_thunk. pr93492-5 tests -mfentry, not available with PIC on ia32. pr96539 expects a tail-call, to avoid copying a large-ish struct argument, but the call requires the PIC register, so no tail-call. stack-prot-sym.c expects a nonpic addressing mode. for gcc/testsuite/ChangeLog * g++.dg/abi/anon1.C: Disable pie on ia32. * g++.dg/abi/anon4.C: Likewise. * g++.dg/cpp0x/initlist-const1.C: Likewise. * g++.dg/no-stack-protector-attr-3.C: Likewise. * g++.dg/stackprotectexplicit2.C: Likewise. * g++.dg/pr71694.C: Likewise. * gcc.dg/pr102892-1.c: Likewise. * gcc.dg/sibcall-11.c: Likewise. * gcc.dg/torture/builtin-self.c: Likewise. * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Likewise. * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Likewise. * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise. * gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise. * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise. * gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise. * gcc.target/i386/avx512fp16-broadcast-1.c: Likewise. * gcc.target/i386/avx512fp16-pr101846.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise. * gcc.target/i386/pr100865-2.c: Likewise. * gcc.target/i386/pr100865-3.c: Likewise. * gcc.target/i386/pr100865-4a.c: Likewise. * gcc.target/i386/pr100865-4b.c: Likewise. * gcc.target/i386/pr100865-5a.c: Likewise. * gcc.target/i386/pr100865-5b.c: Likewise. * gcc.target/i386/pr100865-6a.c: Likewise. * gcc.target/i386/pr100865-6b.c: Likewise. * gcc.target/i386/pr100865-6c.c: Likewise. * gcc.target/i386/pr100865-7b.c: Likewise. * gcc.target/i386/pr101796-1.c: Likewise. * gcc.target/i386/pr101846-2.c: Likewise. * gcc.target/i386/pr101989-broadcast-1.c: Likewise. * gcc.target/i386/pr102021.c: Likewise. * gcc.target/i386/pr90773-17.c: Likewise. * gcc.target/i386/pr54855-3.c: Likewise. * gcc.target/i386/pr54855-7.c: Likewise. * gcc.target/i386/pr15184-1.c: Likewise. * gcc.target/i386/pr15184-2.c: Likewise. * gcc.target/i386/pr27971.c: Likewise. * gcc.target/i386/pr70263-2.c: Likewise. * gcc.target/i386/pr78035.c: Likewise. * gcc.target/i386/pr81736-5.c: Likewise. * gcc.target/i386/pr81736-7.c: Likewise. * gcc.target/i386/pr85620-6.c: Likewise. * gcc.target/i386/pr85667-6.c: Likewise. * gcc.target/i386/pr93492-5.c: Likewise. * gcc.target/i386/pr96539.c: Likewise. PR target/81708 (%gs:my_guard) * gcc.target/i386/stack-prot-sym.c: Likewise. * g++.dg/init/static-cdtor1.C: Add alternate patterns for PIC. * gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: Extend patterns for PIC/PIE register allocation. * gcc.target/i386/pr100704-3.c: Likewise. * gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Likewise. * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: Likewise. * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Likewise. * gcc.target/i386/avx512fp16-vmovsh-1a.c: Likewise. * gcc.target/i386/interrupt-11.c: Likewise, allowing for preservation of the PIC register. * gcc.target/i386/interrupt-12.c: Likewise. * gcc.target/i386/interrupt-13.c: Likewise. * gcc.target/i386/interrupt-15.c: Likewise. * gcc.target/i386/interrupt-16.c: Likewise. * gcc.target/i386/interrupt-17.c: Likewise. * gcc.target/i386/interrupt-8.c: Likewise. * gcc.target/i386/cet-sjlj-6a.c: Combine patterns from previous change. * gcc.target/i386/cet-sjlj-6b.c: Likewise. * gcc.target/i386/pad-10.c: Accept insns in get_pc_thunk. * gcc.target/i386/pr70321.c: Likewise. * gcc.target/i386/pr81563.c: Likewise. * gcc.target/i386/pr84278.c: Likewise. * gcc.target/i386/pr90773-2.c: Likewise, plus extra loads from the GOT. * gcc.target/i386/pr90773-3.c: Likewise. * gcc.target/i386/pr94913-2.c: Accept additional PIC insns. * gcc.target/i386/stack-check-17.c: Likewise. * gcc.target/i386/stack-check-12.c: Do not require dummy stack probing obviated with PIC. * gcc.target/i386/pr95126-m32-1.c: Expect missed optimization with PIC. * gcc.target/i386/pr95126-m32-2.c: Likewise. * gcc.target/i386/pr95852-2.c: Accept different optimization with PIC. * gcc.target/i386/pr95852-4.c: Likewise.