Created attachment 43962 [details] Testcase for stack protector spilling guard address When compiling the attached file with -mcpu=cortex-a57 -std=c99 -Os -fpic -fstack-protector-strong the address to the stack gets spilled on the stack where an attacker using buffer overflow could overwrite it and thus control what is the canari checked against: ldr r3, [sp] <--- accessing spilled address of guard from stack mov r0, r4 ldr r2, [sp, #228] ldr r3, [r3] cmp r2, r3 beq .L18 bl __stack_chk_fail(PLT) I can reproduce this on GCC 7 and trunk at least.
This is caused by missing stack_protect_set and stack_protect_test pattern in ARM backend. It would be nice though if the address could be marked such that it doesn't go on the stack to have the default implementation a bit more robust. It might be worth having a warning if the override is not done as well.
(In reply to Thomas Preud'homme from comment #1) > This is caused by missing stack_protect_set and stack_protect_test pattern > in ARM backend. It would be nice though if the address could be marked such > that it doesn't go on the stack to have the default implementation a bit > more robust. It might be worth having a warning if the override is not done > as well. Nope sorry, the address is put in a register before the test pattern is called.
(In reply to Thomas Preud'homme from comment #2) > (In reply to Thomas Preud'homme from comment #1) > > This is caused by missing stack_protect_set and stack_protect_test pattern > > in ARM backend. It would be nice though if the address could be marked such > > that it doesn't go on the stack to have the default implementation a bit > > more robust. It might be worth having a warning if the override is not done > > as well. > > Nope sorry, the address is put in a register before the test pattern is > called. This happens when expanding the tree that holds the guard's address. It's a symbol_ref for which the default expand code of loading into a register is used. This happens also for AArch64 and I suspect for x86 as well. What makes it worse on ARM is that cse_local sees 2 SET instructions computing the address of the guard and reuse the register being set in the first instruction instead of recomputing again. Because of the distance between that first SET and the comparison between guard and canari the chances of getting the address spilled are higher. This does not happen for AArch64 because the mov of symbol_ref generates an UNSPEC of a memory address whereas ARM generates a MEM of an UNSPEC symbol_ref. However I suppose with scheduling it's possible for the set of guard address and following test of guard against canari to be moved apart and spill to happen in theory. My feeling is that the target patterns should also do the address computation, ie stack_protect_set and stack_protect_test would take that MEM of symbol_ref instead of expanding it first. Thoughts?
Clearly rematerialization isn't working correctly. Immediates and constant addresses like this should never be spilled (using MOV/MOVK could increase codesize, but with -Os you should use the literal pool anyway). Check legitimate_constant_p returns true, see https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00052.html for how it's done on AArch64.
(In reply to Wilco from comment #4) > Clearly rematerialization isn't working correctly. Immediates and constant > addresses like this should never be spilled (using MOV/MOVK could increase > codesize, but with -Os you should use the literal pool anyway). Check > legitimate_constant_p returns true, see > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00052.html for how it's done > on AArch64. By the time LRA happens, this is what LRA sees of the address computation: (insn 321 6 322 2 (set (reg:SI 274) (unspec:SI [ (symbol_ref:SI ("__stack_chk_guard") [flags 0xc0] <var_decl 0x7f689803a2d0 __stack_chk_guard>) ] UNSPEC_PIC_SYM)) "test_arm_sp_fpic.c":41 181 {pic_load_addr_32bit} (expr_list:REG_EQUIV (unspec:SI [ (symbol_ref:SI ("__stack_chk_guard") [flags 0xc0] <var_decl 0x7f689803a2d0 __stack_chk_guard>) ] UNSPEC_PIC_SYM) (nil))) (insn 322 321 11 2 (set (reg/f:SI 175) (mem:SI (plus:SI (reg:SI 176) (reg:SI 274)) [0 S4 A32])) "test_arm_sp_fpic.c":41 876 {*thumb2_movsi_insn} (expr_list:REG_DEAD (reg:SI 274) (expr_list:REG_DEAD (reg:SI 176) (expr_list:REG_EQUIV (symbol_ref:SI ("__stack_chk_guard") [flags 0xc0] <var_decl 0x7f689803a2d0 __stack_chk_guard>) (nil))))) It is this 175 which is spilled because LRA sees it doesn't have a register of the right class to allocate that pseudo. Because it's a MEM it doesn't see it as a constant expression and thus does not rematerialize it.
(In reply to Thomas Preud'homme from comment #3) > > My feeling is that the target patterns should also do the address > computation, ie stack_protect_set and stack_protect_test would take that MEM > of symbol_ref instead of expanding it first. The more I think about it the more I think it's the only way to guarantee the guard's address does not end up in a register that could be spilled. The spilling itself is more likely to happen when reading the GOT entry that holds the guard's address is not represented by an UNSPEC because then the address computed when setting the canari is reused when doing the comparison. UNSPEC seems to prevent that (even though it's not UNSPEC_VOLATILE).
(In reply to Thomas Preud'homme from comment #6) > (In reply to Thomas Preud'homme from comment #3) > > > > My feeling is that the target patterns should also do the address > > computation, ie stack_protect_set and stack_protect_test would take that MEM > > of symbol_ref instead of expanding it first. > > The more I think about it the more I think it's the only way to guarantee > the guard's address does not end up in a register that could be spilled. The > spilling itself is more likely to happen when reading the GOT entry that > holds the guard's address is not represented by an UNSPEC because then the > address computed when setting the canari is reused when doing the > comparison. UNSPEC seems to prevent that (even though it's not > UNSPEC_VOLATILE). I'm experimenting with a patch to mark the MEM to access the GOT entry as volatile in ARM backend in order to prevent CSE. It works on this PR's testcase so will give bootstrap and regression testing a try. As I said, this doesn't fully solve the underlying issue IMO but together with implementation of stack_protect_set and stack_protect_test in ARM it should make stack protector as reliable on ARM targets as on AArch64.
(In reply to Thomas Preud'homme from comment #7) > (In reply to Thomas Preud'homme from comment #6) > > (In reply to Thomas Preud'homme from comment #3) > > > > > > My feeling is that the target patterns should also do the address > > > computation, ie stack_protect_set and stack_protect_test would take that MEM > > > of symbol_ref instead of expanding it first. > > > > The more I think about it the more I think it's the only way to guarantee > > the guard's address does not end up in a register that could be spilled. The > > spilling itself is more likely to happen when reading the GOT entry that > > holds the guard's address is not represented by an UNSPEC because then the > > address computed when setting the canari is reused when doing the > > comparison. UNSPEC seems to prevent that (even though it's not > > UNSPEC_VOLATILE). > > I'm experimenting with a patch to mark the MEM to access the GOT entry as > volatile in ARM backend in order to prevent CSE. It works on this PR's > testcase so will give bootstrap and regression testing a try. As I said, > this doesn't fully solve the underlying issue IMO but together with > implementation of stack_protect_set and stack_protect_test in ARM it should > make stack protector as reliable on ARM targets as on AArch64. Found out the approach needs further changes for Thumb-1 because a PIC access is done in several instructions. I'm addressing those at the moment.
Managed to reach a state where nothing is spilled on the stack for Thumb-1 either. I want to do 3 more changes before I start full testing: - put some compiler barrier between address computation and canari setting / canari check to ensure no instruction is scheduled between the two - clarify in documentation that it's the target's responsability to ensure guard's address computation is not CSE (in particular for PIC access) - tighten scan pattern for testcase a bit more
Patch posted: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01264.html
I've started to work on a new patch according to review feedbacks. I've reached the stage where I can compile without -fPIC with the stack protect test being an UNSPEC split after register allocation as suggested. Next steps are: 1) do the same for the stack protect set (ie setting the canari) 2) add support for PIC access to the guard 3) include the conditional branch in the combined stack protect test to ensure the register holding the result of the comparison is not spilled before it's used for the conditional branch 4) clear all registers involved before branching 5) cleanup the patch
(In reply to Thomas Preud'homme from comment #11) > I've started to work on a new patch according to review feedbacks. I've > reached the stage where I can compile without -fPIC with the stack protect > test being an UNSPEC split after register allocation as suggested. > > Next steps are: > > 1) do the same for the stack protect set (ie setting the canari) Done > 3) include the conditional branch in the combined stack protect test to > ensure the register holding the result of the comparison is not spilled > before it's used for the conditional branch Done > 5) cleanup the patch In progress > 2) add support for PIC access to the guard > 4) clear all registers involved before branching TODO.
Remains now: 1) add support for PIC access to the guard 2) finish cleanup of the patch
(In reply to Thomas Preud'homme from comment #13) > Remains now: > > 1) add support for PIC access to the guard > 2) finish cleanup of the patch Except for a few missing comments, it's all there. I'll start testing now.
(In reply to Thomas Preud'homme from comment #14) > (In reply to Thomas Preud'homme from comment #13) > > Remains now: > > > > 1) add support for PIC access to the guard > > 2) finish cleanup of the patch > > Except for a few missing comments, it's all there. I'll start testing now. I'm currently working on fixing the ICE found during testing.
Adding CVE number
Patch has been posted for review: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00246.html
Author: thopre01 Date: Thu Aug 2 09:07:17 2018 New Revision: 263245 URL: https://gcc.gnu.org/viewcvs?rev=263245&root=gcc&view=rev Log: [ARM] Fix PR85434: spilling of stack protector guard's address on ARM In case of high register pressure in PIC mode, address of the stack protector's guard can be spilled on ARM targets as shown in PR85434, thus allowing an attacker to control what the canary would be compared against. This is also known as CVE-2018-12886. ARM does lack stack_protect_set and stack_protect_test insn patterns, defining them does not help as the address is expanded regularly and the patterns only deal with the copy and test of the guard with the canary. This problem does not occur for x86 targets because the PIC access and the test can be done in the same instruction. Aarch64 is exempt too because PIC access insn pattern are mov of UNSPEC which prevents it from the second access in the epilogue being CSEd in cse_local pass with the first access in the prologue. The approach followed here is to create new "combined" set and test standard pattern names that take the unexpanded guard and do the set or test. This allows the target to use an opaque pattern (eg. using UNSPEC) to hide the individual instructions being generated to the compiler and split the pattern into generic load, compare and branch instruction after register allocator, therefore avoiding any spilling. This is here implemented for the ARM targets. For targets not implementing these new standard pattern names, the existing stack_protect_set and stack_protect_test pattern names are used. To be able to split PIC access after register allocation, the functions had to be augmented to force a new PIC register load and to control which register it loads into. This is because sharing the PIC register between prologue and epilogue could lead to spilling due to CSE again which an attacker could use to control what the canary gets compared against. 2018-08-02 Thomas Preud'homme <thomas.preudhomme@linaro.org> gcc/ PR target/85434 * target-insns.def (stack_protect_combined_set): Define new standard pattern name. (stack_protect_combined_test): Likewise. * cfgexpand.c (stack_protect_prologue): Try new stack_protect_combined_set pattern first. * function.c (stack_protect_epilogue): Try new stack_protect_combined_test pattern first. * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now parameters to control which register to use as PIC register and force reloading PIC register respectively. Insert in the stream of insns if possible. (legitimize_pic_address): Expose above new parameters in prototype and adapt recursive calls accordingly. (arm_legitimize_address): Adapt to new legitimize_pic_address prototype. (thumb_legitimize_address): Likewise. (arm_emit_call_insn): Adapt to new require_pic_register prototype. * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype change. * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address prototype change. (stack_protect_combined_set): New insn_and_split pattern. (stack_protect_set): New insn pattern. (stack_protect_combined_test): New insn_and_split pattern. (stack_protect_test): New insn pattern. * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec. (UNSPEC_SP_TEST): Likewise. * doc/md.texi (stack_protect_combined_set): Document new standard pattern name. (stack_protect_set): Clarify that the operand for guard's address is legal. (stack_protect_combined_test): Document new standard pattern name. (stack_protect_test): Clarify that the operand for guard's address is legal. gcc/testsuite/ PR target/85434 * gcc.target/arm/pr85434.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/pr85434.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/config/arm/arm-protos.h trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/arm.md trunk/gcc/config/arm/unspecs.md trunk/gcc/doc/md.texi trunk/gcc/function.c trunk/gcc/target-insns.def trunk/gcc/testsuite/ChangeLog
Created attachment 44489 [details] Source file causing ICE on aarch64 With your patch, GCC crashes with target aarch64-none-linux-gnu aarch64-none-linux-gnu-gcc gethnamaddr.i -fstack-protector during RTL pass: expand gethnamaddr.c: In function 'getanswer': gethnamaddr.c:179:1: internal compiler error: in maybe_gen_insn, at optabs.c:7307 getanswer (const querybuf *answer, int anslen, const char *qname, int qtype) ^~~~~~~~~ 0xafcef2 maybe_gen_insn(insn_code, unsigned int, expand_operand*) /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/optabs.c:7307 0xaffb88 maybe_expand_insn(insn_code, unsigned int, expand_operand*) /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/optabs.c:7351 0x7748a0 stack_protect_prologue /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6117 0x7748a0 execute /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6357 Please submit a full bug report,
(In reply to Christophe Lyon from comment #19) > Created attachment 44489 [details] > Source file causing ICE on aarch64 > > With your patch, GCC crashes with target aarch64-none-linux-gnu > aarch64-none-linux-gnu-gcc gethnamaddr.i -fstack-protector > > during RTL pass: expand > gethnamaddr.c: In function 'getanswer': > gethnamaddr.c:179:1: internal compiler error: in maybe_gen_insn, at > optabs.c:7307 > getanswer (const querybuf *answer, int anslen, const char *qname, int qtype) > ^~~~~~~~~ > 0xafcef2 maybe_gen_insn(insn_code, unsigned int, expand_operand*) > /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/optabs.c:7307 > 0xaffb88 maybe_expand_insn(insn_code, unsigned int, expand_operand*) > /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/optabs.c:7351 > 0x7748a0 stack_protect_prologue > > /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6117 > 0x7748a0 execute > > /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6357 > Please submit a full bug report, Thanks Christophe, It seems to have impacted x86 as well. I'll look at all those and respin the patch. I've reverted it in the meantime.
Author: thopre01 Date: Thu Nov 22 14:46:17 2018 New Revision: 266379 URL: https://gcc.gnu.org/viewcvs?rev=266379&root=gcc&view=rev Log: PR85434: Prevent spilling of stack protector guard's address on ARM In case of high register pressure in PIC mode, address of the stack protector's guard can be spilled on ARM targets as shown in PR85434, thus allowing an attacker to control what the canary would be compared against. ARM does lack stack_protect_set and stack_protect_test insn patterns, defining them does not help as the address is expanded regularly and the patterns only deal with the copy and test of the guard with the canary. This problem does not occur for x86 targets because the PIC access and the test can be done in the same instruction. Aarch64 is exempt too because PIC access insn pattern are mov of UNSPEC which prevents it from the second access in the epilogue being CSEd in cse_local pass with the first access in the prologue. The approach followed here is to create new "combined" set and test standard pattern names that take the unexpanded guard and do the set or test. This allows the target to use an opaque pattern (eg. using UNSPEC) to hide the individual instructions being generated to the compiler and split the pattern into generic load, compare and branch instruction after register allocator, therefore avoiding any spilling. This is here implemented for the ARM targets. For targets not implementing these new standard pattern names, the existing stack_protect_set and stack_protect_test pattern names are used. To be able to split PIC access after register allocation, the functions had to be augmented to force a new PIC register load and to control which register it loads into. This is because sharing the PIC register between prologue and epilogue could lead to spilling due to CSE again which an attacker could use to control what the canary gets compared against. 2018-11-22 Thomas Preud'homme <thomas.preudhomme@linaro.org> gcc/ PR target/85434 * target-insns.def (stack_protect_combined_set): Define new standard pattern name. (stack_protect_combined_test): Likewise. * cfgexpand.c (stack_protect_prologue): Try new stack_protect_combined_set pattern first. * function.c (stack_protect_epilogue): Try new stack_protect_combined_test pattern first. * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now parameters to control which register to use as PIC register and force reloading PIC register respectively. Insert in the stream of insns if possible. (legitimize_pic_address): Expose above new parameters in prototype and adapt recursive calls accordingly. Use pic_reg if non null instead of cached one. (arm_load_pic_register): Add pic_reg parameter and use it if non null. (arm_legitimize_address): Adapt to new legitimize_pic_address prototype. (thumb_legitimize_address): Likewise. (arm_emit_call_insn): Adapt to require_pic_register prototype change. (arm_expand_prologue): Adapt to arm_load_pic_register prototype change. (thumb1_expand_prologue): Likewise. * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype change. (arm_load_pic_register): Likewise. * config/arm/predicated.md (guard_addr_operand): New predicate. (guard_operand): New predicate. * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address prototype change. (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue prototype change. (stack_protect_combined_set): New expander.. (stack_protect_combined_set_insn): New insn_and_split pattern. (stack_protect_set_insn): New insn pattern. (stack_protect_combined_test): New expander. (stack_protect_combined_test_insn): New insn_and_split pattern. (arm_stack_protect_test_insn): New insn pattern. * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern. * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec. (UNSPEC_SP_TEST): Likewise. * doc/md.texi (stack_protect_combined_set): Document new standard pattern name. (stack_protect_set): Clarify that the operand for guard's address is legal. (stack_protect_combined_test): Document new standard pattern name. (stack_protect_test): Clarify that the operand for guard's address is legal. gcc/testsuite/ PR target/85434 * gcc.target/arm/pr85434.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/pr85434.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/config/arm/arm-protos.h trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/arm.md trunk/gcc/config/arm/predicates.md trunk/gcc/config/arm/thumb1.md trunk/gcc/config/arm/unspecs.md trunk/gcc/doc/md.texi trunk/gcc/function.c trunk/gcc/target-insns.def trunk/gcc/testsuite/ChangeLog
So this was fixed for GCC 9. Both the GCC 8 and GCC 7 branches are still open for wrong-code fixes, I'd appreciate at least providing "official" backports of the fix even if we end up not committing them because for the fear of breakage. You probably know best - was there any fallout of the two revs committed here that needed to be fixed? Thanks.
(In reply to Richard Biener from comment #22) > So this was fixed for GCC 9. Both the GCC 8 and GCC 7 branches are still > open for wrong-code fixes, I'd appreciate at least providing "official" > backports of the fix even if we end up not committing them because for the > fear of breakage. > You probably know best - was there any fallout of the two revs committed here > that needed to be fixed? > > Thanks. I don't remember any fallout from the final patch no. My reticence for a backport was due to * adding a new target hook * patching functions to generate PIC sequences. I'm not sure how much these functions have changed over the year, I suspect not much and the rest of the patch should be fairly self contained. I would not expect too much work to backport the change itself by someone else (I am not in a position to do the work myself), but whether such a change would be acceptable on a stable branch is up to the relevant maintainer.
A backport to earlier releases is still not available, and the process of backporting is proving to require understanding of the changes done in major releases. It would be very much appreciated if a backport is provided.
(In reply to Khalid Gomaa from comment #24) > A backport to earlier releases is still not available, and the process of > backporting is proving to require understanding of the changes done in major > releases. It would be very much appreciated if a backport is provided. Except the releases where the backport would have happened are no longer getting updates. GCC 7 has not been getting updates since GCC 7.5 was released on November 14, 2019 and GCC 8 have not been getting updates since GCC 8.5.0 was released on May 14, 2021. So closing as fixed. If you want a backport, you either need to do it yourself or have a distro do it.