Bug 81708 - The x86 stack canary location should be customizable
Summary: The x86 stack canary location should be customizable
Status: REOPENED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: unknown
: P3 normal
Target Milestone: 8.0
Assignee: Uroš Bizjak
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-04 00:31 UTC by Andy Lutomirski
Modified: 2023-12-25 11:14 UTC (History)
6 users (show)

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-08-08 00:00:00


Attachments
Patch that introduces mstack-protector-guard-offset= and mstack-protector-guard-reg= (1.66 KB, patch)
2017-08-08 09:39 UTC, Uroš Bizjak
Details | Diff
patch that introduces mstack-protector-guard-symbol= (1000 bytes, patch)
2017-08-09 08:50 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andy Lutomirski 2017-08-04 00:31:46 UTC
There are four credible ways to find the stack canary:

1. %fs:symbol
2. %fs:symbol(%rip) [with a PCREL relocation]
3. %gs:symbol
4. %gs:symbol(%rip)

(Obviously the %rip variants only work on x86_64.)

The current code is roughly equivalent to (1) or (3) where symbol is an absolute symbol equal to 0x28 or similar.

Please give a command line option to choose any of the four modes and specify the symbol name.  (Or just hardcode the symbol name to __gcc_stack_canary or whatever if the option is set.)

My perferred solution would be -mstack-protector-cookie=gs:symname or -mstack-protector-cookie=gs:symname(%rip) or -mstack-protector-cookie=gs:0x28 depending on what's desired.

I personally consider it to have been a mistake for Linux to support a stack canary without insisting that GCC fix this issue first.  The x86_32 case, in particular, is a collossal mess in the kernel that slows kernel entries down and seriously overcomplicates the kernel code because the stack canary addressing mode that GCC chooses is nonsensical.
Comment 1 H. Peter Anvin 2017-08-04 00:39:46 UTC
Well, you can choose between "__stack_chk_guard(%rip)" and "%gs:40"... :)
Comment 2 Andy Lutomirski 2017-08-04 00:47:12 UTC
(In reply to H. Peter Anvin from comment #1)
> Well, you can choose between "__stack_chk_guard(%rip)" and "%gs:40"... :)

Wow, I guess I didn't even consider the former.  That would be option 5: symbol(%rip).  Let's not use it for the kernel.

For x86_32, we want %fs:symbol.  %fs:constant would be tolarable, too.  For x86_64, it depends on the end game for percpu access, I suppose.
Comment 3 Uroš Bizjak 2017-08-08 09:39:37 UTC
Created attachment 41949 [details]
Patch that introduces mstack-protector-guard-offset= and mstack-protector-guard-reg=

Prototype patch that introduces mstack-protector-guard-offset= and mstack-protector-guard-reg= options.  The later can be used to override default TLS reg, so compilation becomes independent of -mcmodel=kernel.
Comment 4 Uroš Bizjak 2017-08-08 09:44:53 UTC
(In reply to Uroš Bizjak from comment #3)
> Created attachment 41949 [details]
> Patch that introduces mstack-protector-guard-offset= and
> mstack-protector-guard-reg=

e.g.:

-mstack-protector-guard-reg=%fs -mstack-protector-guard-offset=0x200
Comment 5 uros 2017-08-08 16:49:19 UTC
Author: uros
Date: Tue Aug  8 16:48:46 2017
New Revision: 250965

URL: https://gcc.gnu.org/viewcvs?rev=250965&root=gcc&view=rev
Log:
	PR target/81708
	* config/i386/i386.opt (mstack-protector-guard-reg=): New option
	(mstack-protector-guard-offset=): Ditto.
	* config/i386/i386.c (ix86_option_override): Handle
	-mstack-protector-guard-reg= and -mstack-protector-guard-offset=
	options.
	(ix86_stack_protect_guard): Use ix86_stack_protect_guard_reg and
	ix86_stack_protect_guard_offset variables.
	(TARGET_STACK_PROTECT_GUARD): Always define.
	* doc/invoke.texi (x86 Options): Document -mstack-protector-guard-reg=
	and -mstack-protector-guard-offset= options.

testsuite/ChangeLog:

	PR target/81708
	* gcc.target/i386/stack-prot-guard.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/stack-prot-guard.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/testsuite/ChangeLog
Comment 6 Uroš Bizjak 2017-08-08 16:54:50 UTC
Implemented for gcc-8.
Comment 7 Andy Lutomirski 2017-08-08 17:19:53 UTC
Hmm.  This is a big improvement, but it's still going to be awkward to use -- if we want to use a normal Linux percpu variable, we're stuck putting it in a fixed location that's known at compile time as opposed to just at link time.  It also still prevents Linux from switching to PC-relative percpu addressing to reduce text size.

Can we do -mstack-protector-offset=[symbol name]?  And could it be extended to ask for PC-relative addressing?
Comment 8 H. Peter Anvin 2017-08-08 17:24:52 UTC
How about simply letting the user enter an assembly expression of neither of the standard ABI options are suitable?  Also, shouldn't the user space default on 64 bits be an offset into the TLS using %fs, or is there something magic about how the kernel is compiled that changes it to %gs:?
Comment 9 H. Peter Anvin 2017-08-08 17:27:35 UTC
In some applications it might even be appropriate to use the RDPID instruction and store the canary in the IA32_TSC_AUX MSR.
Comment 10 Uroš Bizjak 2017-08-09 08:50:33 UTC
Created attachment 41955 [details]
patch that introduces mstack-protector-guard-symbol=

This patch can be used to override TLS offset with a symbol.

gcc.target/i386/stack-prot-guard.c compiles to (-O2 -fstack-protector-all -mstack-protector-guard-symbol=symbol_name):

f:
        subq    $24, %rsp
        movq    %fs:symbol_name(%rip), %rax
        movq    %rax, 8(%rsp)
        xorl    %eax, %eax
        movq    8(%rsp), %rax
        xorq    %fs:symbol_name(%rip), %rax
        jne     .L5
        addq    $24, %rsp
        ret
Comment 11 Uroš Bizjak 2017-08-09 08:54:23 UTC
(In reply to H. Peter Anvin from comment #8)
> How about simply letting the user enter an assembly expression of neither of
> the standard ABI options are suitable?  Also, shouldn't the user space
> default on 64 bits be an offset into the TLS using %fs, or is there
> something magic about how the kernel is compiled that changes it to %gs:?

-mcmodel=kernel switches TLS reg to %gs.

You can use mstack-protector-guard-reg=%gs with other code models.
Comment 12 uros 2017-08-10 20:59:42 UTC
Author: uros
Date: Thu Aug 10 20:59:10 2017
New Revision: 251040

URL: https://gcc.gnu.org/viewcvs?rev=251040&root=gcc&view=rev
Log:
	PR target/81708
	* config/i386/i386.opt (mstack-protector-guard-symbol=): New option
	* config/i386/i386.c (ix86_stack_protect_guard): Use
	ix86_stack_protect_guard_symbol_str to generate varible declaration.
	* doc/invoke.texi (x86 Options): Document
	-mstack-protector-guard-symbol= option.

testsuite/ChangeLog:

	PR target/81708
	* gcc.target/i386/stack-prot-sym.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/stack-prot-sym.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/testsuite/ChangeLog
Comment 13 Aldy Hernandez 2017-09-13 16:41:59 UTC
Author: aldyh
Date: Wed Sep 13 16:41:28 2017
New Revision: 252350

URL: https://gcc.gnu.org/viewcvs?rev=252350&root=gcc&view=rev
Log:
	PR target/81708
	* config/i386/i386.opt (mstack-protector-guard-reg=): New option
	(mstack-protector-guard-offset=): Ditto.
	* config/i386/i386.c (ix86_option_override): Handle
	-mstack-protector-guard-reg= and -mstack-protector-guard-offset=
	options.
	(ix86_stack_protect_guard): Use ix86_stack_protect_guard_reg and
	ix86_stack_protect_guard_offset variables.
	(TARGET_STACK_PROTECT_GUARD): Always define.
	* doc/invoke.texi (x86 Options): Document -mstack-protector-guard-reg=
	and -mstack-protector-guard-offset= options.

testsuite/ChangeLog:

	PR target/81708
	* gcc.target/i386/stack-prot-guard.c: New test.

Added:
    branches/range-gen2/gcc/testsuite/gcc.target/i386/stack-prot-guard.c
Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/config/i386/i386.c
    branches/range-gen2/gcc/config/i386/i386.opt
    branches/range-gen2/gcc/doc/invoke.texi
    branches/range-gen2/gcc/testsuite/ChangeLog
Comment 14 Aldy Hernandez 2017-09-13 16:52:09 UTC
Author: aldyh
Date: Wed Sep 13 16:51:37 2017
New Revision: 252397

URL: https://gcc.gnu.org/viewcvs?rev=252397&root=gcc&view=rev
Log:
	PR target/81708
	* config/i386/i386.opt (mstack-protector-guard-symbol=): New option
	* config/i386/i386.c (ix86_stack_protect_guard): Use
	ix86_stack_protect_guard_symbol_str to generate varible declaration.
	* doc/invoke.texi (x86 Options): Document
	-mstack-protector-guard-symbol= option.

testsuite/ChangeLog:

	PR target/81708
	* gcc.target/i386/stack-prot-sym.c: New test.

Added:
    branches/range-gen2/gcc/testsuite/gcc.target/i386/stack-prot-sym.c
Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/config/i386/i386.c
    branches/range-gen2/gcc/config/i386/i386.opt
    branches/range-gen2/gcc/doc/invoke.texi
    branches/range-gen2/gcc/testsuite/ChangeLog
Comment 15 Alexandre Oliva 2022-07-19 15:41:55 UTC
Uroš,

stack-prot-sym.c fails on ia32 with PIC/PIE: the address/value of my_guard is loaded from the GOT, instead of appearing as %gs:my_guard.

After being confused by the expectation that my_guard should be a LE TLS symbol, I'm coming to the conclusion that my expectation was incorrect, and it is indeed supposed to be a plain offset, so the code is correct, if not as efficient as on PDC.  Does that sound right?

I'm undecided as to whether avoiding the GOT reference, and requiring the symbol to be usable as a link-time constant, would be desirable/doable.  Thoughts?
Comment 16 Uroš Bizjak 2022-07-19 16:44:20 UTC
(In reply to Alexandre Oliva from comment #15)
> Uroš,
> 
> stack-prot-sym.c fails on ia32 with PIC/PIE: the address/value of my_guard
> is loaded from the GOT, instead of appearing as %gs:my_guard.
> 
> After being confused by the expectation that my_guard should be a LE TLS
> symbol, I'm coming to the conclusion that my expectation was incorrect, and
> it is indeed supposed to be a plain offset, so the code is correct, if not
> as efficient as on PDC.  Does that sound right?
> 
> I'm undecided as to whether avoiding the GOT reference, and requiring the
> symbol to be usable as a link-time constant, would be desirable/doable. 
> Thoughts?

The current implementation was implemented specifically for linux, as requested in Comment #0, without any other usages in mind. Also, I'm not deeply familiar with the details of symbol linking, so perhaps HJ can help here.
Comment 17 H.J. Lu 2022-07-19 17:54:42 UTC
(In reply to Alexandre Oliva from comment #15)
> Uroš,
> 
> stack-prot-sym.c fails on ia32 with PIC/PIE: the address/value of my_guard
> is loaded from the GOT, instead of appearing as %gs:my_guard.
> 

When PIC/PIE is used, my_guard is loaded from GOT, regardless of -m32 or -m64.
Comment 18 Alexandre Oliva 2022-07-20 20:46:04 UTC
on x86_64 with -fPIC or -fpic, my_guard's address is indeed loaded from the GOT with @GOTPCREL indeed

on x86_64 with -fPIE or -fpie, however, it is used just as expected by the testcase.  which should be fine as long as my_guard is required to be a link-time defined constant.  

but if it is, then there's no point in loading its address from the GOT, not on x86_64 PIC, not on ia32 PIC/PIE.

thus my question.  something's fishy there.
Comment 19 GCC Commits 2022-08-15 12:22:19 UTC
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>:

https://gcc.gnu.org/g:4ad200addc3eaf55fba6cc91db3d3b66eabaf3d0

commit r13-2043-g4ad200addc3eaf55fba6cc91db3d3b66eabaf3d0
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Aug 15 09:21:33 2022 -0300

    i386 PIE: testsuite: cope with default pie on ia32
    
    This patch continues the effort of cleaning up the testsuite for
    --enable-default-pie; the focus herein is mostly 32-bit x86.
    
    As much as I tried to avoid it, most of the changes to the testsuite
    simply disable PIC/PIE, for reasons I'm going to detail below.
    
    static-cdtor1.C gets new patterns to match PIE output.  Some
    avx512fp16 tests change only in register allocation, because of the
    register used to hold the GOT base address.  Interrupt tests changed
    in this regard as well, but here it also affected register saving and
    restoring.
    
    The previous patch modified cet-sjlj tests, mentioning a single regexp
    covering PIC and nonPIC got incorrect match counts.  I found out that
    adding ?: to parenthesized subpatterns avoids miscounting matches.
    Other tests that count certain kinds of insns needed adjustment over
    insns in get_pc_thunk, extra loads from the GOT, or extra adds to
    compute addresses.  In one case, namely stack-check-12, it is nonPIC
    that had extra insns, that PIC gets rid of, or rather, pushing and
    popping the PIC register obviates the dummy push and matching pop used
    for stack probing in nonpic.
    
    pr95126 tests were supposed to optimize loads into known constants,
    but the @GOTOFF addresses prevent that for reasons I have not
    investigated, but that would be clearly desirable, so I've XFAILed
    these.  pr95852 is another case of missed optimization: sibcalls are
    not possible when the PIC register needs to be set up for the call,
    which prevents the expected constant propagation to the return block;
    I have adjusted the codegen expectations of these tests.
    
    As for tests that disable PIE...  Some are judgment calls, that fail
    for similar reasons as tests described above, but I chose not to
    adjust their expectations; others are just not possible with PIC, or
    not worth the effort of adjusting.
    
    anon[14].C check for no global or comdat symbols, respectively, but
    -fPIE outputs get_pc_thunk, as global hidden comdat.
    initlist-const1.C wants .rodata and checks for no .data, but PIC
    outputs constant data that needs relocations in .data.rel.ro.local.
    no-stack-protector-attr-3.C and stackprotectexplicit2.C count
    stack_check_fail matches; -fPIE calls stack_check_fail_local instead,
    which matches the pattern, but this symbol is also marked as .hidden,
    so the match count needs to be adjusted.
    
    pr71694.C checks for no movl, but get_pc_thunk contains one.
    pr102892-1.c is a missed optimization, ivopts creates an induction
    variable because the array address can't be part of an indexing base
    address with PIE, and that ends up stopping a load from being resolved
    to a constant as expected.  sibcall-11.c needs @PLT for the call,
    which requires the PIC register, which makes sibcalling impossible.
    builtin-self.c, in turn, expects no calls, but PIC calls get_pc_thunk.
    
    avx* vector tests that had PIE disabled were affected in that the need
    for GOT-based addressing modes changed instruction selection in ways
    that deviated from the expectations of the tests.  Ditto other vector
    tests: pr100865*, pr101796-1, pr101846, pr101989-broadcast-1, and
    pr102021, pr54855-[37], and pr90773-17.
    
    pr15184* tests need a PIC register to access global variables, which
    affects register allocation, so the patterns would have to be
    adjusted.  pr27971 can't use the expected addressing mode to
    dereference the array with PIC, so it ends up selecting an indexed
    addressing mode, obviating the expected separate shift insn.
    
    pr70263-2 is another case that implicitly expects a sibcall,
    impossible because of the need for the PIC register; without a
    sibcall, the expected REG_EQUIV for the reuse of the stack slot of an
    incoming argument does not occur.  pr78035 duplicates the final
    compare in both then and else blocks with PIE, which deviates from the
    expected cmp count.  pr81736-[57] test for no frame pointer, but the
    PIC register assignment to a call-saved register forces a frame; the
    former ends up not using the PIC register, but it's only optimized out
    after committing to a stack frame to preserve it.  pr85620-6 also
    expects a tail call in a situation that is impossible on ia32 PIC.
    
    pr85667-6 doesn't expect the movl in get_pc_thunk.  pr93492-5 tests
    -mfentry, not available with PIC on ia32.  pr96539 expects a
    tail-call, to avoid copying a large-ish struct argument, but the call
    requires the PIC register, so no tail-call.  stack-prot-sym.c expects
    a nonpic addressing mode.
    
    
    for  gcc/testsuite/ChangeLog
    
            * g++.dg/abi/anon1.C: Disable pie on ia32.
            * g++.dg/abi/anon4.C: Likewise.
            * g++.dg/cpp0x/initlist-const1.C: Likewise.
            * g++.dg/no-stack-protector-attr-3.C: Likewise.
            * g++.dg/stackprotectexplicit2.C: Likewise.
            * g++.dg/pr71694.C: Likewise.
            * gcc.dg/pr102892-1.c: Likewise.
            * gcc.dg/sibcall-11.c: Likewise.
            * gcc.dg/torture/builtin-self.c: Likewise.
            * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Likewise.
            * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Likewise.
            * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
            * gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
            * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
            * gcc.target/i386/avx512fp16-broadcast-1.c: Likewise.
            * gcc.target/i386/avx512fp16-pr101846.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/pr100865-2.c: Likewise.
            * gcc.target/i386/pr100865-3.c: Likewise.
            * gcc.target/i386/pr100865-4a.c: Likewise.
            * gcc.target/i386/pr100865-4b.c: Likewise.
            * gcc.target/i386/pr100865-5a.c: Likewise.
            * gcc.target/i386/pr100865-5b.c: Likewise.
            * gcc.target/i386/pr100865-6a.c: Likewise.
            * gcc.target/i386/pr100865-6b.c: Likewise.
            * gcc.target/i386/pr100865-6c.c: Likewise.
            * gcc.target/i386/pr100865-7b.c: Likewise.
            * gcc.target/i386/pr101796-1.c: Likewise.
            * gcc.target/i386/pr101846-2.c: Likewise.
            * gcc.target/i386/pr101989-broadcast-1.c: Likewise.
            * gcc.target/i386/pr102021.c: Likewise.
            * gcc.target/i386/pr90773-17.c: Likewise.
            * gcc.target/i386/pr54855-3.c: Likewise.
            * gcc.target/i386/pr54855-7.c: Likewise.
            * gcc.target/i386/pr15184-1.c: Likewise.
            * gcc.target/i386/pr15184-2.c: Likewise.
            * gcc.target/i386/pr27971.c: Likewise.
            * gcc.target/i386/pr70263-2.c: Likewise.
            * gcc.target/i386/pr78035.c: Likewise.
            * gcc.target/i386/pr81736-5.c: Likewise.
            * gcc.target/i386/pr81736-7.c: Likewise.
            * gcc.target/i386/pr85620-6.c: Likewise.
            * gcc.target/i386/pr85667-6.c: Likewise.
            * gcc.target/i386/pr93492-5.c: Likewise.
            * gcc.target/i386/pr96539.c: Likewise.
            PR target/81708 (%gs:my_guard)
            * gcc.target/i386/stack-prot-sym.c: Likewise.
            * g++.dg/init/static-cdtor1.C: Add alternate patterns for PIC.
            * gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: Extend patterns
            for PIC/PIE register allocation.
            * gcc.target/i386/pr100704-3.c: Likewise.
            * gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Likewise.
            * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: Likewise.
            * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Likewise.
            * gcc.target/i386/avx512fp16-vmovsh-1a.c: Likewise.
            * gcc.target/i386/interrupt-11.c: Likewise, allowing for
            preservation of the PIC register.
            * gcc.target/i386/interrupt-12.c: Likewise.
            * gcc.target/i386/interrupt-13.c: Likewise.
            * gcc.target/i386/interrupt-15.c: Likewise.
            * gcc.target/i386/interrupt-16.c: Likewise.
            * gcc.target/i386/interrupt-17.c: Likewise.
            * gcc.target/i386/interrupt-8.c: Likewise.
            * gcc.target/i386/cet-sjlj-6a.c: Combine patterns from
            previous change.
            * gcc.target/i386/cet-sjlj-6b.c: Likewise.
            * gcc.target/i386/pad-10.c: Accept insns in get_pc_thunk.
            * gcc.target/i386/pr70321.c: Likewise.
            * gcc.target/i386/pr81563.c: Likewise.
            * gcc.target/i386/pr84278.c: Likewise.
            * gcc.target/i386/pr90773-2.c: Likewise, plus extra loads from
            the GOT.
            * gcc.target/i386/pr90773-3.c: Likewise.
            * gcc.target/i386/pr94913-2.c: Accept additional PIC insns.
            * gcc.target/i386/stack-check-17.c: Likewise.
            * gcc.target/i386/stack-check-12.c: Do not require dummy stack
            probing obviated with PIC.
            * gcc.target/i386/pr95126-m32-1.c: Expect missed optimization
            with PIC.
            * gcc.target/i386/pr95126-m32-2.c: Likewise.
            * gcc.target/i386/pr95852-2.c: Accept different optimization
            with PIC.
            * gcc.target/i386/pr95852-4.c: Likewise.