I'm attaching a reduced reproducer for an issue seen when attempting to build a recent linux kernel with gcc trunk (x86_64 gcc 12). With: "-O2 -fconserve-stack", the attached fails with: $ ~/gcc-install/bin/gcc -S -O2 -fconserve-stack aesni-intel_glue.c In function ‘memset’, inlined from ‘rfc4106_set_hash_subkey.part.0’ at aesni-intel_glue.c:58:3: aesni-intel_glue.c:30:5: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object passed as 1st parame\ ter 30 | __write_overflow(); | ^~~~~~~~~~~~~~~~~~ Looking at -fdump-tree-all, I see that a "void rfc4106_set_hash_subkey.part.0 ()" is created in .050t.fnsplit, containing: void rfc4106_set_hash_subkey.part.0 () { int D.2043; struct crypto_aes_ctx ctx; int ret; size_t p_size; u8 * hash_subkey; const u8 * key; unsigned int key_len; int _1; <bb 3> [local count: 1073741824]: <bb 2> [local count: 1073741824]: __write_overflow (); fortify_panic (&__func__); } and is called by rfc4106_set_hash_subkey. However after .104t.phiprop, the call to rfc4106_set_hash_subkey.part.0 is optimized away, but the function fragment is retained, leading to the build failure in "expand", where at .247t.optimized it has: ;; Function rfc4106_set_hash_subkey.part.0 (rfc4106_set_hash_subkey.part.0, funcdef_no=3, decl_uid=2035, cgraph_uid=14, symbol_order=14) void rfc4106_set_hash_subkey.part.0 () { <bb 2> [local count: 1073741824]: __write_overflow (); fortify_panic (&__func__); } where __write_overflow is marked with __attribute__((__error__)) Reproducer on godbolt.org: https://godbolt.org/z/15oehqjGP Presumably if all calls to a fn "part" are optimized away, that fn part should also be optimized away.
Created attachment 51311 [details] Reduced reproducer
Basically PR 94818
Seems to be a regression relative to gcc 11.2
Yes, this is PR94818. There's always going to be cases like this - it could be RTL optimizations eliding the last call. We're emitting functions in postorder so optimizations there improve analysis on the call side. There's no perfect order and since the error in question is emitted by RTL expansion it's too early anyway. Now, we could heuristically avoid splitting away error paths which would mitigate the situation a bit I guess. Honza?
Other ideas for fixing: (a) (hackish workaround?): defer emitting diagnostics from __attribute__((__error__)) and __attribute__((__warning__)) until a postprocessing stage, after all functions have been emitted, and then determine which functions are actually reachable, and only emit the diagnostics if the functions are callable (b) add an interprocedural sync-up before "final", so that we do an interprocedural elimination of uncallable internal functions before emitting assembler, and move the attribute diagnostics from "expand" to "final".
*** Bug 102361 has been marked as a duplicate of this bug. ***
For the error attribute I think the fix is easier, at least if the functions with error attribute wouldn't be conditional in the to be created fnsplit - just punt on fnsplit in that case. Because if it isn't inlined back, it would be an error and in a successful compilation it should be optimized away and so doesn't need fnsplit. But I think we can't do that for warning attribute, that is just a warning and even when we warn, we'll successfully compile it...
BTW, this one seems to have regressed with r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc
Bug not assigned - looks like it is considered not important to be able to compile the Linux kernel. Am I correct in the assumption that everybody who wants to build their own kernels, should keep using GCC 11, until further notice?
*** Bug 103242 has been marked as a duplicate of this bug. ***
(In reply to Jakub Jelinek from comment #8) > BTW, this one seems to have regressed with > r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc There are 2 threads, both in ethread: a.c.034t.ethread: [1] Registering jump thread: (4, 5) incoming edge; (5, 6) normal (6, 7) nocopy; a.c.034t.ethread: [2] Registering jump thread: (4, 6) incoming edge; (6, 8) nocopy; Both are correct. We're basically eliding subsequent checks for the same thing: =========== BB 4 ============ Imports: p_size_14 Exports: p_size_14 <bb 4> : p_size_14 = __builtin_object_size (hash_subkey_8(D), 0); if (p_size_14 <= 15) goto <bb 5>; [33.00%] else goto <bb 6>; [67.00%] 4->5 (T) p_size_14 : size_t [0, 15] 4->6 (F) p_size_14 : size_t [16, +INF] =========== BB 5 ============ p_size_14 size_t [0, 15] <bb 5> : __write_overflow (); =========== BB 6 ============ Imports: p_size_14 Exports: p_size_14 p_size_14 long unsigned int VARYING <bb 6> : if (p_size_14 <= 15) goto <bb 7>; [0.04%] else goto <bb 8>; [99.96%]
(In reply to Aldy Hernandez from comment #11) > (In reply to Jakub Jelinek from comment #8) > > BTW, this one seems to have regressed with > > r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc > > There are 2 threads, both in ethread: > > a.c.034t.ethread: [1] Registering jump thread: (4, 5) incoming edge; (5, > 6) normal (6, 7) nocopy; > a.c.034t.ethread: [2] Registering jump thread: (4, 6) incoming edge; (6, > 8) nocopy; > > Both are correct. > > We're basically eliding subsequent checks for the same thing: Right, and then function split happens which was not happening before ...
Actually, lookng at the kernel, I don't see how this can happen. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/fortify-string.h uses __FORTIFY_INLINE macro for memset, which is defined in the same file as: #define __FORTIFY_INLINE extern __always_inline __attribute__((gnu_inline)) and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler_attributes.h defines __always_inline as #define __always_inline inline __attribute__((__always_inline__)) fnsplit pass punts on inline functions with __always_inline__ attribute, they need to be wholly inlined always.
(In reply to Jakub Jelinek from comment #13) > Actually, lookng at the kernel, I don't see how this can happen. Inlining is not always the issue here. In the case of the reduced case in PR 103242, function split will happen on non-inline functions with -fconserve-stack. It will split off cold parts into a seperate function to reduce the stack size of the main function. Note the main issue is rather __builtin_object_size does not resolve itself until after function splitting happens while __builtin_constant_p resolve itself after inlining. So we get a jump threading and combining of the two ifs. A few ideas on how to resovle this: * special case function splitting such that a BB that contains a function call which has either warning or error attribute on it; not to split out to a different function. * long term: figure out how to update call graph and not expand functions which are no longer reachable after gimple level optimizations Any other ideas? I might look into the first idea tomorrow or the next day unless someone gets to it before me.
Created attachment 51805 [details] Untested (not even tried to compile) (In reply to Andrew Pinski from comment #14) > * special case function splitting such that a BB that contains a function > call which has either warning or error attribute on it; not to split out to > a different function. Something like this attachment.
(In reply to Andrew Pinski from comment #10) > *** Bug 103242 has been marked as a duplicate of this bug. *** Basic block 4 freq:0.00 size: 1 time:0.00 __write_overflow (); freq:0.00 size: 2 time:0.00 fortify_panic (&__func__); Cannot split: warning or error attribute. found articulation at bb 4 but cannot split
(In reply to Andrew Pinski from comment #6) > *** Bug 102361 has been marked as a duplicate of this bug. *** Basic block 4 freq:0.00 size: 1 time:0.00 __write_overflow (); freq:0.00 size: 2 time:0.00 fortify_panic (&__func__); Cannot split: warning or error attribute. found articulation at bb 4 but cannot split found articulation at bb 3 So basically in this case uncharge_gather_clear is marked as gnu_inline but not always_inline. It is defined as: static inline void uncharge_gather_clear(struct uncharge_gather *ug) Which got expanded as: static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) void uncharge_gather_clear(struct uncharge_gather *ug) Which is due to: https://github.com/torvalds/linux/blob/master/include/linux/compiler_types.h#L142
I will write the testcases tomorrow. The patch works in the most general case.
> > * special case function splitting such that a BB that contains a function > > call which has either warning or error attribute on it; not to split out to > > a different function. > > Something like this attachment. I think we should simply drop those flags after splitting since that preserves the intended semantics? Honza
(In reply to Andrew Pinski from comment #14) > (In reply to Jakub Jelinek from comment #13) > > Actually, lookng at the kernel, I don't see how this can happen. > Inlining is not always the issue here. Ah, so the always_inline memset or whatever got inlined already during einline into some function, but we fnsplit that caller. I was thinking about something like you wrote, but actually more complex, not punt on fnsplit when it contains calls with error attribute, but just note in which bbs they are present, if any bbs have them compute post dominators and when finding splitting points disallow splitting points where any of the marked bbs with error calls post-dominates the splitting point (i.e. where those calls would become unconditional). Because at least for error, that case means we'll fail the compilation, while if we don't split, there is a chance we might compile it correctly. warning attribute or if it is conditional even after splitting could still break. Honza: I don't know what you mean by dropping flags, the only flag we have is an attribute on the called function, it isn't a per-call flag, and in many use cases with error attributes the functions are actually never defined, so ignoring the error attribute at expansion time would be both risky for security reasons (error attribute is usually used in such cases) and could mean it won't link anyway.
Jakub: I see it is about error attributed call in the split out part of function. Then we really want to prevent the split. Keeping track of those should be possible in the recursive walk (where we keep track of SSA names used). Not sure how common it would be in practice though.
(In reply to Jan Hubicka from comment #21) > Jakub: I see it is about error attributed call in the split out part of > function. Then we really want to prevent the split. Keeping track of those > should be possible in the recursive walk (where we keep track of SSA names > used). Not sure how common it would be in practice though. That is exactly what my patch does, we disallow splitting out the basic blocks that lead to the basic block that contains a function call that has an error/warning attribute on it. Here is a testcase which shows that we still able to split out basic blocks that would not lead into the function and all (and we did before my patch too): struct crypto_aes_ctx { char key_dec[128]; }; int rfc4106_set_hash_subkey_hash_subkey; void __write_overflow(void)__attribute__((__error__(""))); void __write_overflow1(void); void aes_encrypt(void*); void fortify_panic(const char*) __attribute__((__noreturn__)) ; char *rfc4106_set_hash_subkey(struct crypto_aes_ctx *ctx) { void *a = &ctx->key_dec[0]; unsigned p_size = __builtin_object_size(a, 0); if (p_size < 16) { __write_overflow1(); fortify_panic(__func__); } if (p_size < 32) { __write_overflow(); fortify_panic(__func__); } aes_encrypt(ctx); return ctx->key_dec; } char *(*gg)(struct crypto_aes_ctx *) = rfc4106_set_hash_subkey; void a(void) { struct crypto_aes_ctx ctx; rfc4106_set_hash_subkey(&ctx); } void b(void) { struct crypto_aes_ctx ctx; ctx.key_dec[0] = 0; rfc4106_set_hash_subkey(&ctx); }
Created attachment 51822 [details] Full patch which I will bootstrap/test This is the full patch and includes two testcases. They both fail before the patch and still pass after the patch. Since the testcase that checks we still do the split is there too.
Patch sent https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584673.html .
Increasing importance from P3 to P1, in that presumably we don't want to ship without being able to compile the Linux kernel.
I've just rebuilt kernel-default package from openSUSE:Factory with the following config: https://gist.githubusercontent.com/marxin/d5373a0dd6ab35233a47a25337e73dc5/raw/d2c810d2d32104619b57b6f1d118d052302c519f/.config and there's one more compilation error for fs/lockd/svclock.c: gcc fs_lockd_svclock.i -c -O2 In file included from ./include/linux/string.h:253, from ./include/linux/bitmap.h:10, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/cpumask.h:5, from ./arch/x86/include/asm/msr.h:11, from ./arch/x86/include/asm/processor.h:22, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:60, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:78, from ./include/linux/spinlock.h:55, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from fs/lockd/svclock.c:25: In function ‘strcpy’, inlined from ‘nlmdbg_cookie2a’ at fs/lockd/svclock.c:74:4, inlined from ‘nlmsvc_lookup_block’ at fs/lockd/svclock.c:157:721: ./include/linux/fortify-string.h:319:17: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object (1st parameter) 319 | __write_overflow(); | ^~~~~~~~~~~~~~~~~~ In function ‘strcpy’, inlined from ‘nlmdbg_cookie2a’ at fs/lockd/svclock.c:74:4, inlined from ‘nlmsvc_find_block’ at fs/lockd/svclock.c:196:672, inlined from ‘nlmsvc_grant_reply’ at fs/lockd/svclock.c:963:16: ./include/linux/fortify-string.h:319:17: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object (1st parameter) 319 | __write_overflow(); | ^~~~~~~~~~~~~~~~~~ that started with r12-6030-g422f9eb7011b76c1.
Created attachment 52179 [details] not reduced test-case
(In reply to Martin Liška from comment #26) > that started with r12-6030-g422f9eb7011b76c1. Please file that bug separately and it might be related to PR 103961 which was just fixed too.
(In reply to Andrew Pinski from comment #28) > (In reply to Martin Liška from comment #26) > > that started with r12-6030-g422f9eb7011b76c1. > > Please file that bug separately and it might be related to PR 103961 which > was just fixed too. It's kinda like PR 103961, but not the same. I'll file a new bug; I've got a reduced reproducer too.
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>: https://gcc.gnu.org/g:76fe494230477a69f8fa8c8ca2d493acaf343eb1 commit r12-6666-g76fe494230477a69f8fa8c8ca2d493acaf343eb1 Author: Andrew Pinski <apinski@marvell.com> Date: Wed Nov 17 02:45:22 2021 +0000 Fix tree-optimization/101941: IPA splitting out function with error attribute The Linux kernel started to fail compile when the jump threader was improved (r12-2591-g2e96b5f14e4025691). This failure was due to the IPA splitting code decided now to split off the basic block which contained two functions, one of those functions included the error attribute on them. This patch fixes the problem by disallowing basic blocks from being split which contain functions that have either the error or warning attribute on them. The two new testcases are to make sure we still split the function for other places if we reject the one case. Committed as approved after Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/101941 gcc/ChangeLog: * ipa-split.cc (visit_bb): Disallow function calls where the function has either error or warning attribute. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr101941-1.c: New test. * gcc.dg/tree-ssa/pr101941-1.c: New test.
Fixed.