Bug 105739 - [10 Regression] Miscompilation of Linux kernel update.c
Summary: [10 Regression] Miscompilation of Linux kernel update.c
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.3.1
: P3 normal
Target Milestone: 10.4
Assignee: Jan Hubicka
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-05-26 11:22 UTC by Jakub Jelinek
Modified: 2022-06-20 10:42 UTC (History)
5 users (show)

See Also:
Host:
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-05-26 00:00:00


Attachments
update.i.xz (263.18 KB, application/octet-stream)
2022-05-26 11:23 UTC, Jakub Jelinek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2022-05-26 11:22:31 UTC
The attached testcase with -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -std=gnu11 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none   -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -fno-jump-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -fstack-protector-strong -fomit-frame-pointer -fno-stack-clash-protection -fno-strict-overflow -fno-stack-check -fconserve-stack
options has a call to _printk replaced by __builtin_unreachable () and it isn't
obviously clear why.

One spot is in the rcu_tasks_trace_pertask function where in assembly one can see:
        .type   rcu_tasks_trace_pertask.cold, @function
rcu_tasks_trace_pertask.cold:
.L1662:
        movl    20(%rdi), %eax
        .text
        .size   rcu_tasks_trace_pertask, .-rcu_tasks_trace_pertask

The problem went away with r11-5188-g32934a4f45a7214 but it is unclear if
that was an actual fix or just made it latent, I don't see a compound literal on that line.
Comment 1 Jakub Jelinek 2022-05-26 11:23:25 UTC
Honza, could you please have a look?
Comment 2 Jakub Jelinek 2022-05-26 11:23:48 UTC
Created attachment 53036 [details]
update.i.xz
Comment 3 Jakub Jelinek 2022-05-26 11:28:32 UTC
The call is added at:
#0  gimple_set_code (g=<gimple_call 0x7fffea08a2a0>, code=GIMPLE_CALL) at ../../gcc/gimple.c:108
#1  0x0000000000bceae1 in gimple_alloc (code=GIMPLE_CALL, num_ops=7) at ../../gcc/gimple.c:140
#2  0x0000000000bd189a in gimple_copy (stmt=<gimple_call 0x7fffea09fe70>) at ../../gcc/gimple.c:1806
#3  0x000000000109ed79 in remap_gimple_stmt (stmt=<gimple_call 0x7fffea09fe70>, id=0x7fffffffd620) at ../../gcc/tree-inline.c:1796
#4  0x000000000109f39a in copy_bb (id=0x7fffffffd620, bb=<basic_block 0x7fffe797f548 (4)>, num=..., den=...) at ../../gcc/tree-inline.c:1950
#5  0x00000000010a223c in copy_cfg_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, 
    new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:2884
#6  0x00000000010a2d21 in copy_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, 
    new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:3126
#7  0x00000000010a6aa2 in expand_call_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, stmt=<gimple_call 0x7fffe70c4090>, id=0x7fffffffd620, to_purge=0x7fffffffd600)
    at ../../gcc/tree-inline.c:4867
#8  0x00000000010a76e7 in gimple_expand_calls_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, id=0x7fffffffd620, to_purge=0x7fffffffd600) at ../../gcc/tree-inline.c:5060
#9  0x00000000010a7da5 in optimize_inline_calls (fn=<function_decl 0x7fffe81d2100 rcu_tasks_trace_pertask>) at ../../gcc/tree-inline.c:5202
#10 0x0000000001cc06ec in inline_transform (node=<cgraph_node * 0x7fffe81d1438 "rcu_tasks_trace_pertask"/5350>) at ../../gcc/ipa-inline-transform.c:682

during inlining of trc_wait_for_one_reader into rcu_tasks_trace_pertask, when copying
_printk ("\x016%s(P%d/%d) IPI to task still in flight.\n", &__func__, _1, _8);
and later changed to __builtin_unreachable in:
#0  gimple_call_set_fndecl (gs=0x7fffea08a2a0, decl=<function_decl 0x7fffea2b6700 __builtin_unreachable>) at ../../gcc/gimple.h:3058
#1  0x00000000009f8bb2 in cgraph_edge::redirect_call_stmt_to_callee (
    this=<cgraph_edge* 0x7fffea0e5680 (<cgraph_node * 0x7fffe81d1438 "rcu_tasks_trace_pertask"/5350> -> <cgraph_node * 0x7fffea3c8b40 "__builtin_unreachable"/5848>)>)
    at ../../gcc/cgraph.c:1489
#2  0x00000000010a1f10 in redirect_all_calls (id=0x7fffffffd620, bb=<basic_block 0x7fffe70c6000 (13)>) at ../../gcc/tree-inline.c:2814
#3  0x00000000010a262b in copy_cfg_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, 
    new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:2950
#4  0x00000000010a2d21 in copy_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, 
    new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:3126
#5  0x00000000010a6aa2 in expand_call_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, stmt=<gimple_call 0x7fffe70c4090>, id=0x7fffffffd620, to_purge=0x7fffffffd600)
    at ../../gcc/tree-inline.c:4867
#6  0x00000000010a76e7 in gimple_expand_calls_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, id=0x7fffffffd620, to_purge=0x7fffffffd600) at ../../gcc/tree-inline.c:5060
#7  0x00000000010a7da5 in optimize_inline_calls (fn=<function_decl 0x7fffe81d2100 rcu_tasks_trace_pertask>) at ../../gcc/tree-inline.c:5202
Comment 4 Jan Hubicka 2022-05-26 13:51:26 UTC
mine. Looks like another case where ipa and local cprop gets out of sync...
Comment 5 Richard Biener 2022-05-27 09:48:14 UTC
GCC 9 branch is being closed
Comment 6 Jakub Jelinek 2022-06-02 10:11:08 UTC
Honza, any estimate how long this could take?  I'd prefer to wait with 10.4 for it  if it isn't going to take too long.
Comment 7 hubicka 2022-06-04 16:07:52 UTC
> Honza, any estimate how long this could take?  I'd prefer to wait with 10.4 for
> it  if it isn't going to take too long.
I am at a conference this week with talk at Wednesday.  I will try to
debug this during the event.

Honza
Comment 8 Jan Hubicka 2022-06-09 11:21:56 UTC
After inlning I see:
IPA function summary for rcu_tasks_trace_pertask/5350 inlinable                 
  global time:     13.535950                                                    
  self size:       11                                                           
  global size:     16                                                           
  min size:       11                                                            
  self stack:      0                                                            
  global stack:    0                                                            
  estimated growth:5                                                            
    size:8.000000, time:5.807250                                                
    size:3.000000, time:2.000000,  executed if:(not inlined)                    
    size:2.000000, time:2.000000,  nonconst if:(op0 changed)                    
  calls:                                                                        
    rcu_tasks_trace_pertask.part.0/5788 inlined                                 
      freq:0.62                                                                 
      Stack frame offset 0, callee self size 0                                  
      trc_wait_for_one_reader/5799 inlined                                      
        freq:0.62                                                               
        Stack frame offset 0, callee self size 0                                
        __builtin_unreachable/5800 unreachable                                  
          freq:0.00 loop depth: 0 size: 0 time:  0 predicate: (false)           
           op0 is compile time invariant                                        
           op1 is compile time invariant                                        
        trc_wait_for_one_reader.part.0/5784 --param max-inline-insns-auto limit reached
          freq:0.31 loop depth: 0 size: 3 time: 12 callee size:100 stack: 0     
    __builtin_expect/5421 function body not available                           
      freq:1.00 loop depth: 0 size: 0 time:  0                                  
       op1 is compile time invariant                                            

So it seems that we determine call in trc_wait_for_one_reader unreachable.
It originally calls printk:
IPA function summary for trc_wait_for_one_reader/5348 inlinable                 
  global time:     13.500000                                                    
  self size:       20                                                           
  global size:     20                                                           
  min size:       1                                                             
  self stack:      0                                                            
  global stack:    0                                                            
    size:1.000000, time:1.000000                                                
    size:3.000000, time:2.000000,  executed if:(not inlined)                    
    size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst if:(op0[ref offset: 8672] changed) && (not inlined)
    size:2.500000, time:2.500000,  nonconst if:(op0[ref offset: 8672] changed)  
    size:1.500000, time:0.250000,  executed if:(op0[ref offset: 8672] != -1) && (not inlined)
    size:3.500000, time:1.250000,  executed if:(op0[ref offset: 8672] != -1)    
  calls:                                                                        
    trc_wait_for_one_reader.part.0/5784 function not considered for inlining    
      freq:0.50 loop depth: 0 size: 3 time: 12 callee size:55 stack: 0 predicate: (op0[ref offset: 8672] == -1)
    _printk/5452 function body not available                                    
      freq:0.00 loop depth: 0 size: 5 time: 14 predicate: (op0[ref offset: 8672] != -1)
       op0 is compile time invariant                                            
       
So we somehow figure out (op0[ref offset: 8672] != -1) here:
Enqueueing calls in rcu_tasks_trace_pertask.part.0/5788.                        
   Estimating body: trc_wait_for_one_reader/5348                                
   Known to be false: not inlined, op0[ref offset: 8672] != -1, op0[ref offset: 8672] changed
   size:4 time:7.000000 nonspec time:12.000000                                  

This seems to be based on:
                                                                            
Jump functions:                                                                 
  Jump functions of caller  rcu_tasks_trace_pertask.part.0/5788:                
    callsite  rcu_tasks_trace_pertask.part.0/5788 -> trc_wait_for_one_reader/5348 :
       param 0: PASS THROUGH: 0, op nop_expr                                    
         Aggregate passed by reference:                                         
           offset: 8672, type: int, CONST: -1                                   
         value: 0x0, mask: 0xffffffffffffffff                                   
         Unknown VR                                                             
       param 1: PASS THROUGH: 1, op nop_expr                                    
         value: 0x0, mask: 0xffffffffffffffff                                   
         Unknown VR                                                             

Which seems correct:
rcu_tasks_trace_pertask.part.0 (struct task_struct * t, struct list_head * hop) 
{                                                                               
  struct task_struct * D.58527;                                                 
  u64 pfo_val__;                                                                
  _Bool _1;                                                                     
  long int _2;                                                                  
  long int _3;                                                                  
  struct task_struct * _14;                                                     
                                                                                
  <bb 4> [local count: 1073741824]:                                             
                                                                                
  <bb 2> [local count: 1073741824]:                                             
  __asm__ __volatile__("" :  :  : "memory");                                    
  MEM[(volatile u8 *)t_1(D) + 1089B] ={v} 0;                                    
  __asm__ __volatile__("lock; addl $0,-4(%%rsp)" :  :  : "cc", "memory");       
  t_1(D)->trc_ipi_to_cpu = -1;                                                  
  trc_wait_for_one_reader (t_1(D), hop_2(D));                                   
                                                                                
  <bb 3> [local count: 1073741824]:                                             
  return;                                                                       
                                                                                
}                                                                               

so there is really store to trc_ipi_to_cpu. The code uses it as:

  _60 ={v} t_59(D)->trc_ipi_to_cpu;                                             
  __asm__ __volatile__("" :  :  : "memory");                                    
  if (_60 != -1)                                                                
    goto <bb 3>; [50.00%]                                                       
  else                                                                          
    goto <bb 6>; [50.00%]                                                       

So bug may be that we ignore volatile flag when determining IPA predicates?

problem goes away with -fno-partial-inlining.
Comment 9 Jan Hubicka 2022-06-10 10:47:11 UTC
Indeed volatile checks seems to be missing across ipa-prop code. Here is smaller testcase:

__attribute__((noinline))
static int
test2(int a)
{
        if (__builtin_constant_p (a))
                __builtin_abort ();
        return a;
}
/*__attribute__((noinline))*/
static int
test(int *a)
{
        int val = *(volatile int *)a;
        if (__builtin_constant_p (val))
                __builtin_abort ();
        if (val)
          return test2(val);
        return 0;
}
int a;
int
main()
{
        a = 0;
        return test (&a);
}

is optimized to
main:
.LFB2:
        .cfi_startproc
        movl    $0, a(%rip)
        movl    a(%rip), %eax
        xorl    %eax, %eax
        ret
        .cfi_endproc

which I don't think is correct. The volatile load can be non-0 and thus we can return non-0.
It does not trigger unreachable as I hoped for since vrp and fab passes seems to jump to same conclusion as ipa-cp.
Comment 10 Jan Hubicka 2022-06-10 10:56:24 UTC
I am testing
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index afd9222b5a2..c037668e7d8 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -1112,6 +1112,10 @@ ipa_load_from_parm_agg (struct ipa_func_body_info *fbi,
   if (!base)
     return false;
 
+  /* We can not propagate across volatile loads.  */
+  if (TREE_THIS_VOLATILE (op))
+    return false;
+
   if (DECL_P (base))
     {
       int index = ipa_get_param_decl_index_1 (descriptors, base);
Comment 11 GCC Commits 2022-06-14 12:07:39 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:8f6c317b3a16350698f3c9e0accb43a9b4acb4ae

commit r13-1089-g8f6c317b3a16350698f3c9e0accb43a9b4acb4ae
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads
    
    Check for volatile flag to ipa_load_from_parm_agg.
    
    gcc/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/105739
            * ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.
    
    gcc/testsuite/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            * gcc.dg/ipa/pr105739.c: New test.
Comment 12 Jakub Jelinek 2022-06-14 12:23:59 UTC
Thanks, I have verified that on the #c0 testcase on 10 branch it makes both __builtin_unreachable calls go away.
Comment 13 GCC Commits 2022-06-19 10:08:53 UTC
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:bf4ba940673b80961c5979078f9d37a7bef2f5ff

commit r12-8493-gbf4ba940673b80961c5979078f9d37a7bef2f5ff
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads
    
    Check for volatile flag to ipa_load_from_parm_agg.
    
    gcc/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/105739
            * ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.
    
    gcc/testsuite/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            * gcc.dg/ipa/pr105739.c: New test.
    
    (cherry picked from commit 8f6c317b3a16350698f3c9e0accb43a9b4acb4ae)
Comment 14 GCC Commits 2022-06-20 06:37:43 UTC
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:49aa637488053223bc04e59aac411d4a92ebcf7b

commit r11-10080-g49aa637488053223bc04e59aac411d4a92ebcf7b
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads
    
    Check for volatile flag to ipa_load_from_parm_agg.
    
    gcc/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/105739
            * ipa-prop.c (ipa_load_from_parm_agg): Punt on volatile loads.
    
    gcc/testsuite/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            * gcc.dg/ipa/pr105739.c: New test.
    
    (cherry picked from commit 8f6c317b3a16350698f3c9e0accb43a9b4acb4ae)
Comment 15 GCC Commits 2022-06-20 06:38:12 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:62148f13c6427edd256072b3196a01b3d5ed2805

commit r10-10856-g62148f13c6427edd256072b3196a01b3d5ed2805
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads
    
    Check for volatile flag to ipa_load_from_parm_agg.
    
    gcc/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/105739
            * ipa-prop.c (ipa_load_from_parm_agg): Punt on volatile loads.
    
    gcc/testsuite/ChangeLog:
    
    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>
    
            * gcc.dg/ipa/pr105739.c: New test.
    
    (cherry picked from commit 8f6c317b3a16350698f3c9e0accb43a9b4acb4ae)
Comment 16 Jakub Jelinek 2022-06-20 09:56:23 UTC
Fixed (Honza, hope you don't mind the backports I've done, did that so that it is on time for 10.4).
Comment 17 hubicka 2022-06-20 10:42:15 UTC
> Fixed (Honza, hope you don't mind the backports I've done, did that so that it
> is on time for 10.4).
Thanks. I don't mind: I was planning to do them this week anyway and
also extra VOLATILE check shoud not be very risky.