Bug 101941 - [12 Regression] Linux kernel build failure due to retaining fnsplit fragment with __attribute__((__error__))
Summary: [12 Regression] Linux kernel build failure due to retaining fnsplit fragment ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Andrew Pinski
URL: https://gcc.gnu.org/pipermail/gcc-pat...
Keywords: missed-optimization, patch
: 102361 103242 (view as bug list)
Depends on: 94818
Blocks: 102361
  Show dependency treegraph
 
Reported: 2021-08-16 22:36 UTC by David Malcolm
Modified: 2022-05-25 09:29 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-08-17 00:00:00


Attachments
Reduced reproducer (741 bytes, text/plain)
2021-08-16 22:36 UTC, David Malcolm
Details
Untested (not even tried to compile) (549 bytes, patch)
2021-11-16 05:07 UTC, Andrew Pinski
Details | Diff
Full patch which I will bootstrap/test (1.76 KB, patch)
2021-11-17 03:10 UTC, Andrew Pinski
Details | Diff
not reduced test-case (399.22 KB, application/zstd)
2022-01-13 14:24 UTC, Martin Liška
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Malcolm 2021-08-16 22:36:23 UTC
I'm attaching a reduced reproducer for an issue seen when attempting to build a recent linux kernel with gcc trunk (x86_64 gcc 12).

With: "-O2 -fconserve-stack", the attached fails with:

$ ~/gcc-install/bin/gcc -S -O2 -fconserve-stack aesni-intel_glue.c
In function ‘memset’,
    inlined from ‘rfc4106_set_hash_subkey.part.0’ at aesni-intel_glue.c:58:3:
aesni-intel_glue.c:30:5: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object passed as 1st parame\
ter
   30 |     __write_overflow();
      |     ^~~~~~~~~~~~~~~~~~

Looking at -fdump-tree-all, I see that a "void rfc4106_set_hash_subkey.part.0 ()" is created in .050t.fnsplit, containing:

void rfc4106_set_hash_subkey.part.0 ()
{
  int D.2043;
  struct crypto_aes_ctx ctx;
  int ret;
  size_t p_size;
  u8 * hash_subkey;
  const u8 * key;
  unsigned int key_len;
  int _1;

  <bb 3> [local count: 1073741824]:

  <bb 2> [local count: 1073741824]:
  __write_overflow ();
  fortify_panic (&__func__);

}

and is called by rfc4106_set_hash_subkey.

However after .104t.phiprop, the call to rfc4106_set_hash_subkey.part.0 is optimized away, but the function fragment is retained, leading to the build failure in "expand", where at .247t.optimized it has:

;; Function rfc4106_set_hash_subkey.part.0 (rfc4106_set_hash_subkey.part.0, funcdef_no=3, decl_uid=2035, cgraph_uid=14, symbol_order=14)

void rfc4106_set_hash_subkey.part.0 ()                                                                                                              
{                                                                                                                                                   
  <bb 2> [local count: 1073741824]:
  __write_overflow ();
  fortify_panic (&__func__);                                                                                                                        
}                                                                                                                                                   

where __write_overflow is marked with __attribute__((__error__))

Reproducer on godbolt.org:
  https://godbolt.org/z/15oehqjGP

Presumably if all calls to a fn "part" are optimized away, that fn part should also be optimized away.
Comment 1 David Malcolm 2021-08-16 22:36:46 UTC
Created attachment 51311 [details]
Reduced reproducer
Comment 2 Andrew Pinski 2021-08-16 22:50:43 UTC
Basically PR 94818
Comment 3 David Malcolm 2021-08-16 23:02:40 UTC
Seems to be a regression relative to gcc 11.2
Comment 4 Richard Biener 2021-08-17 07:56:35 UTC
Yes, this is PR94818.  There's always going to be cases like this - it could be RTL optimizations eliding the last call.  We're emitting functions in postorder so optimizations there improve analysis on the call side.  There's no perfect
order and since the error in question is emitted by RTL expansion it's too early anyway.

Now, we could heuristically avoid splitting away error paths which would
mitigate the situation a bit I guess.  Honza?
Comment 5 David Malcolm 2021-08-18 13:47:28 UTC
Other ideas for fixing:

(a) (hackish workaround?): defer emitting diagnostics from __attribute__((__error__)) and __attribute__((__warning__)) until a postprocessing stage, after all functions have been emitted, and then determine which functions are actually reachable, and only emit the diagnostics if the functions are callable

(b) add an interprocedural sync-up before "final", so that we do an interprocedural elimination of uncallable internal functions before emitting assembler, and move the attribute diagnostics from "expand" to "final".
Comment 6 Andrew Pinski 2021-09-16 23:00:18 UTC
*** Bug 102361 has been marked as a duplicate of this bug. ***
Comment 7 Jakub Jelinek 2021-09-17 08:47:19 UTC
For the error attribute I think the fix is easier, at least if the functions with error attribute wouldn't be conditional in the to be created fnsplit - just punt on fnsplit in that case.  Because if it isn't inlined back, it would be an error and in a successful compilation it should be optimized away and so doesn't need fnsplit.
But I think we can't do that for warning attribute, that is just a warning and even when we warn, we'll successfully compile it...
Comment 8 Jakub Jelinek 2021-09-17 08:53:25 UTC
BTW, this one seems to have regressed with r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc
Comment 9 DAC324 2021-09-20 08:18:50 UTC
Bug not assigned - looks like it is considered not important to be able to compile the Linux kernel.

Am I correct in the assumption that everybody who wants to build their own kernels, should keep using GCC 11, until further notice?
Comment 10 Andrew Pinski 2021-11-15 09:37:42 UTC
*** Bug 103242 has been marked as a duplicate of this bug. ***
Comment 11 Aldy Hernandez 2021-11-15 09:43:15 UTC
(In reply to Jakub Jelinek from comment #8)
> BTW, this one seems to have regressed with
> r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc

There are 2 threads, both in ethread:

a.c.034t.ethread:  [1] Registering jump thread: (4, 5) incoming edge;  (5, 6) normal (6, 7) nocopy; 
a.c.034t.ethread:  [2] Registering jump thread: (4, 6) incoming edge;  (6, 8) nocopy; 

Both are correct.

We're basically eliding subsequent checks for the same thing:

=========== BB 4 ============
Imports: p_size_14  
Exports: p_size_14  
    <bb 4> :
    p_size_14 = __builtin_object_size (hash_subkey_8(D), 0);
    if (p_size_14 <= 15)
      goto <bb 5>; [33.00%]
    else
      goto <bb 6>; [67.00%]

4->5  (T) p_size_14 : 	size_t [0, 15]
4->6  (F) p_size_14 : 	size_t [16, +INF]

=========== BB 5 ============
p_size_14	size_t [0, 15]
    <bb 5> :
    __write_overflow ();


=========== BB 6 ============
Imports: p_size_14  
Exports: p_size_14  
p_size_14	long unsigned int VARYING
    <bb 6> :
    if (p_size_14 <= 15)
      goto <bb 7>; [0.04%]
    else
      goto <bb 8>; [99.96%]
Comment 12 Andrew Pinski 2021-11-15 10:00:43 UTC
(In reply to Aldy Hernandez from comment #11)
> (In reply to Jakub Jelinek from comment #8)
> > BTW, this one seems to have regressed with
> > r12-2591-g2e96b5f14e4025691b57d2301d71aa6092ed44bc
> 
> There are 2 threads, both in ethread:
> 
> a.c.034t.ethread:  [1] Registering jump thread: (4, 5) incoming edge;  (5,
> 6) normal (6, 7) nocopy; 
> a.c.034t.ethread:  [2] Registering jump thread: (4, 6) incoming edge;  (6,
> 8) nocopy; 
> 
> Both are correct.
> 
> We're basically eliding subsequent checks for the same thing:

Right, and then function split happens which was not happening before ...
Comment 13 Jakub Jelinek 2021-11-15 14:38:09 UTC
Actually, lookng at the kernel, I don't see how this can happen.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/fortify-string.h
uses __FORTIFY_INLINE macro for memset, which is defined in the same file as:
#define __FORTIFY_INLINE extern __always_inline __attribute__((gnu_inline))
and
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler_attributes.h
defines __always_inline as
#define __always_inline                 inline __attribute__((__always_inline__))
fnsplit pass punts on inline functions with __always_inline__ attribute, they need to be wholly inlined always.
Comment 14 Andrew Pinski 2021-11-16 04:24:34 UTC
(In reply to Jakub Jelinek from comment #13)
> Actually, lookng at the kernel, I don't see how this can happen.
Inlining is not always the issue here.

In the case of the reduced case in PR 103242, function split will happen on non-inline functions with -fconserve-stack. It will split off cold parts into a seperate function to reduce the stack size of the main function.

Note the main issue is rather __builtin_object_size does not resolve itself until after function splitting happens while __builtin_constant_p resolve itself after inlining. So we get a jump threading and combining of the two ifs.

A few ideas on how to resovle this:
* special case function splitting such that a BB that contains a function call which has either warning or error attribute on it; not to split out to a different function.
* long term: figure out how to update call graph and not expand functions which are no longer reachable after gimple level optimizations

Any other ideas?

I might look into the first idea tomorrow or the next day unless someone gets to it before me.
Comment 15 Andrew Pinski 2021-11-16 05:07:21 UTC
Created attachment 51805 [details]
Untested (not even tried to compile)

(In reply to Andrew Pinski from comment #14)
> * special case function splitting such that a BB that contains a function
> call which has either warning or error attribute on it; not to split out to
> a different function.

Something like this attachment.
Comment 16 Andrew Pinski 2021-11-16 05:59:39 UTC
(In reply to Andrew Pinski from comment #10)
> *** Bug 103242 has been marked as a duplicate of this bug. ***

Basic block 4
  freq:0.00 size:  1 time:0.00 __write_overflow ();
  freq:0.00 size:  2 time:0.00 fortify_panic (&__func__);
Cannot split: warning or error attribute.
found articulation at bb 4 but cannot split
Comment 17 Andrew Pinski 2021-11-16 06:14:46 UTC
(In reply to Andrew Pinski from comment #6)
> *** Bug 102361 has been marked as a duplicate of this bug. ***

Basic block 4
  freq:0.00 size:  1 time:0.00 __write_overflow ();
  freq:0.00 size:  2 time:0.00 fortify_panic (&__func__);
Cannot split: warning or error attribute.
found articulation at bb 4 but cannot split
found articulation at bb 3

So basically in this case uncharge_gather_clear is marked as gnu_inline but not always_inline.
It is defined as:
static inline void uncharge_gather_clear(struct uncharge_gather *ug)
Which got expanded as:
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) void uncharge_gather_clear(struct uncharge_gather *ug)

Which is due to:
https://github.com/torvalds/linux/blob/master/include/linux/compiler_types.h#L142
Comment 18 Andrew Pinski 2021-11-16 06:24:32 UTC
I will write the testcases tomorrow. The patch works in the most general case.
Comment 19 hubicka 2021-11-16 10:04:28 UTC
> > * special case function splitting such that a BB that contains a function
> > call which has either warning or error attribute on it; not to split out to
> > a different function.
> 
> Something like this attachment.

I think we should simply drop those flags after splitting since that
preserves the intended semantics?

Honza
Comment 20 Jakub Jelinek 2021-11-16 10:23:50 UTC
(In reply to Andrew Pinski from comment #14)
> (In reply to Jakub Jelinek from comment #13)
> > Actually, lookng at the kernel, I don't see how this can happen.
> Inlining is not always the issue here.

Ah, so the always_inline memset or whatever got inlined already during einline into some function, but we fnsplit that caller.

I was thinking about something like you wrote, but actually more complex, not punt on fnsplit when it contains calls with error attribute, but just note in which bbs they are present, if any bbs have them compute post dominators and when finding splitting points disallow splitting points where any of the marked bbs with error calls post-dominates the splitting point (i.e. where those calls would become unconditional).  Because at least for error, that case means we'll fail the compilation, while if we don't split, there is a chance we might compile it correctly.  warning attribute or if it is conditional even after splitting could still break.

Honza: I don't know what you mean by dropping flags, the only flag we have is an attribute on the called function, it isn't a per-call flag, and in many use cases  with error attributes the functions are actually never defined, so ignoring the error attribute at expansion time would be both risky for security reasons (error attribute is usually used in such cases) and could mean it won't link anyway.
Comment 21 Jan Hubicka 2021-11-16 10:41:22 UTC
Jakub: I see it is about error attributed call in the split out part of function. Then we really want to prevent the split. Keeping track of those should be possible in the recursive walk (where we keep track of SSA names used). Not sure how common it would be in practice though.
Comment 22 Andrew Pinski 2021-11-17 01:12:19 UTC
(In reply to Jan Hubicka from comment #21)
> Jakub: I see it is about error attributed call in the split out part of
> function. Then we really want to prevent the split. Keeping track of those
> should be possible in the recursive walk (where we keep track of SSA names
> used). Not sure how common it would be in practice though.

That is exactly what my patch does, we disallow splitting out the basic blocks that lead to the basic block that contains a function call that has an error/warning attribute on it.

Here is a testcase which shows that we still able to split out basic blocks that would not lead into the function and all (and we did before my patch too):
struct crypto_aes_ctx {
  char key_dec[128];
};
int rfc4106_set_hash_subkey_hash_subkey;
void __write_overflow(void)__attribute__((__error__("")));
void __write_overflow1(void);
void aes_encrypt(void*);
void fortify_panic(const char*) __attribute__((__noreturn__)) ;
char *rfc4106_set_hash_subkey(struct crypto_aes_ctx *ctx) {
  void *a = &ctx->key_dec[0];
  unsigned p_size =  __builtin_object_size(a, 0);
  if (p_size < 16) {
    __write_overflow1();
    fortify_panic(__func__);
  }
  if (p_size < 32) {
    __write_overflow();
    fortify_panic(__func__);
  }
  aes_encrypt(ctx);
  return ctx->key_dec;
}
char *(*gg)(struct crypto_aes_ctx *) = rfc4106_set_hash_subkey;
void a(void)
{
  struct crypto_aes_ctx ctx;
  rfc4106_set_hash_subkey(&ctx);
}
void b(void)
{
  struct crypto_aes_ctx ctx;
  ctx.key_dec[0] = 0;
  rfc4106_set_hash_subkey(&ctx);
}
Comment 23 Andrew Pinski 2021-11-17 03:10:57 UTC
Created attachment 51822 [details]
Full patch which I will bootstrap/test

This is the full patch and includes two testcases. They both fail before the patch and still pass after the patch.
Since the testcase that checks we still do the split is there too.
Comment 24 Andrew Pinski 2021-11-17 06:27:50 UTC
Patch sent https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584673.html .
Comment 25 David Malcolm 2022-01-12 14:17:06 UTC
Increasing importance from P3 to P1, in that presumably we don't want to ship without being able to compile the Linux kernel.
Comment 26 Martin Liška 2022-01-13 14:23:16 UTC
I've just rebuilt kernel-default package from openSUSE:Factory with the following config:
https://gist.githubusercontent.com/marxin/d5373a0dd6ab35233a47a25337e73dc5/raw/d2c810d2d32104619b57b6f1d118d052302c519f/.config

and there's one more compilation error for fs/lockd/svclock.c:

gcc fs_lockd_svclock.i -c -O2
In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:10,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/cpumask.h:5,
                 from ./arch/x86/include/asm/msr.h:11,
                 from ./arch/x86/include/asm/processor.h:22,
                 from ./arch/x86/include/asm/cpufeature.h:5,
                 from ./arch/x86/include/asm/thread_info.h:53,
                 from ./include/linux/thread_info.h:60,
                 from ./arch/x86/include/asm/preempt.h:7,
                 from ./include/linux/preempt.h:78,
                 from ./include/linux/spinlock.h:55,
                 from ./include/linux/mmzone.h:8,
                 from ./include/linux/gfp.h:6,
                 from ./include/linux/slab.h:15,
                 from fs/lockd/svclock.c:25:
In function ‘strcpy’,
    inlined from ‘nlmdbg_cookie2a’ at fs/lockd/svclock.c:74:4,
    inlined from ‘nlmsvc_lookup_block’ at fs/lockd/svclock.c:157:721:
./include/linux/fortify-string.h:319:17: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object (1st parameter)
  319 |                 __write_overflow();
      |                 ^~~~~~~~~~~~~~~~~~
In function ‘strcpy’,
    inlined from ‘nlmdbg_cookie2a’ at fs/lockd/svclock.c:74:4,
    inlined from ‘nlmsvc_find_block’ at fs/lockd/svclock.c:196:672,
    inlined from ‘nlmsvc_grant_reply’ at fs/lockd/svclock.c:963:16:
./include/linux/fortify-string.h:319:17: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object (1st parameter)
  319 |                 __write_overflow();
      |                 ^~~~~~~~~~~~~~~~~~

that started with r12-6030-g422f9eb7011b76c1.
Comment 27 Martin Liška 2022-01-13 14:24:01 UTC
Created attachment 52179 [details]
not reduced test-case
Comment 28 Andrew Pinski 2022-01-13 14:29:16 UTC
(In reply to Martin Liška from comment #26)
> that started with r12-6030-g422f9eb7011b76c1.

Please file that bug separately and it might be related to PR 103961 which was just fixed too.
Comment 29 Siddhesh Poyarekar 2022-01-13 15:01:58 UTC
(In reply to Andrew Pinski from comment #28)
> (In reply to Martin Liška from comment #26)
> > that started with r12-6030-g422f9eb7011b76c1.
> 
> Please file that bug separately and it might be related to PR 103961 which
> was just fixed too.

It's kinda like PR 103961, but not the same.  I'll file a new bug; I've got a reduced reproducer too.
Comment 30 GCC Commits 2022-01-18 10:36:10 UTC
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:76fe494230477a69f8fa8c8ca2d493acaf343eb1

commit r12-6666-g76fe494230477a69f8fa8c8ca2d493acaf343eb1
Author: Andrew Pinski <apinski@marvell.com>
Date:   Wed Nov 17 02:45:22 2021 +0000

    Fix tree-optimization/101941: IPA splitting out function with error attribute
    
    The Linux kernel started to fail compile when the jump threader was improved
    (r12-2591-g2e96b5f14e4025691). This failure was due to the IPA splitting code
    decided now to split off the basic block which contained two functions,
    one of those functions included the error attribute on them.  This patch fixes
    the problem by disallowing basic blocks from being split which contain functions
    that have either the error or warning attribute on them.
    
    The two new testcases are to make sure we still split the function for other
    places if we reject the one case.
    
    Committed as approved after Bootstrapped and tested on x86_64-linux-gnu with no regressions.
    
            PR tree-optimization/101941
    
    gcc/ChangeLog:
    
            * ipa-split.cc (visit_bb): Disallow function calls where
            the function has either error or warning attribute.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.c-torture/compile/pr101941-1.c: New test.
            * gcc.dg/tree-ssa/pr101941-1.c: New test.
Comment 31 Andrew Pinski 2022-01-18 10:36:36 UTC
Fixed.