The attached testcase, compiled with the options
-fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -std=gnu11 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -fno-jump-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -fstack-protector-strong -fomit-frame-pointer -fno-stack-clash-protection -fno-strict-overflow -fno-stack-check -fconserve-stack
has a call to _printk replaced by __builtin_unreachable (), and it isn't clear why.
One spot is in the rcu_tasks_trace_pertask function, where in the assembly one can see:
        .type   rcu_tasks_trace_pertask.cold, @function
rcu_tasks_trace_pertask.cold:
.L1662:
        movl    20(%rdi), %eax
        .text
        .size   rcu_tasks_trace_pertask, .-rcu_tasks_trace_pertask
The problem went away with r11-5188-g32934a4f45a7214, but it is unclear whether that was an actual fix or just made the bug latent; I don't see a compound literal on that line.
Honza, could you please have a look?
Created attachment 53036 [details] update.i.xz
The call is added at:
#0  gimple_set_code (g=<gimple_call 0x7fffea08a2a0>, code=GIMPLE_CALL) at ../../gcc/gimple.c:108
#1  0x0000000000bceae1 in gimple_alloc (code=GIMPLE_CALL, num_ops=7) at ../../gcc/gimple.c:140
#2  0x0000000000bd189a in gimple_copy (stmt=<gimple_call 0x7fffea09fe70>) at ../../gcc/gimple.c:1806
#3  0x000000000109ed79 in remap_gimple_stmt (stmt=<gimple_call 0x7fffea09fe70>, id=0x7fffffffd620) at ../../gcc/tree-inline.c:1796
#4  0x000000000109f39a in copy_bb (id=0x7fffffffd620, bb=<basic_block 0x7fffe797f548 (4)>, num=..., den=...) at ../../gcc/tree-inline.c:1950
#5  0x00000000010a223c in copy_cfg_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:2884
#6  0x00000000010a2d21 in copy_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:3126
#7  0x00000000010a6aa2 in expand_call_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, stmt=<gimple_call 0x7fffe70c4090>, id=0x7fffffffd620, to_purge=0x7fffffffd600) at ../../gcc/tree-inline.c:4867
#8  0x00000000010a76e7 in gimple_expand_calls_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, id=0x7fffffffd620, to_purge=0x7fffffffd600) at ../../gcc/tree-inline.c:5060
#9  0x00000000010a7da5 in optimize_inline_calls (fn=<function_decl 0x7fffe81d2100 rcu_tasks_trace_pertask>) at ../../gcc/tree-inline.c:5202
#10 0x0000000001cc06ec in inline_transform (node=<cgraph_node * 0x7fffe81d1438 "rcu_tasks_trace_pertask"/5350>) at ../../gcc/ipa-inline-transform.c:682
during inlining of trc_wait_for_one_reader into rcu_tasks_trace_pertask, when copying
  _printk ("\x016%s(P%d/%d) IPI to task still in flight.\n", &__func__, _1, _8);
and later changed to __builtin_unreachable in:
#0  gimple_call_set_fndecl (gs=0x7fffea08a2a0, decl=<function_decl 0x7fffea2b6700 __builtin_unreachable>) at ../../gcc/gimple.h:3058
#1  0x00000000009f8bb2 in cgraph_edge::redirect_call_stmt_to_callee (this=<cgraph_edge* 0x7fffea0e5680 (<cgraph_node * 0x7fffe81d1438 "rcu_tasks_trace_pertask"/5350> -> <cgraph_node * 0x7fffea3c8b40 "__builtin_unreachable"/5848>)>) at ../../gcc/cgraph.c:1489
#2  0x00000000010a1f10 in redirect_all_calls (id=0x7fffffffd620, bb=<basic_block 0x7fffe70c6000 (13)>) at ../../gcc/tree-inline.c:2814
#3  0x00000000010a262b in copy_cfg_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:2950
#4  0x00000000010a2d21 in copy_body (id=0x7fffffffd620, entry_block_map=<basic_block 0x7fffe70b1dd0 (8)>, exit_block_map=<basic_block 0x7fffe70b1ea0 (10)>, new_entry=<basic_block 0x0>) at ../../gcc/tree-inline.c:3126
#5  0x00000000010a6aa2 in expand_call_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, stmt=<gimple_call 0x7fffe70c4090>, id=0x7fffffffd620, to_purge=0x7fffffffd600) at ../../gcc/tree-inline.c:4867
#6  0x00000000010a76e7 in gimple_expand_calls_inline (bb=<basic_block 0x7fffe70b1dd0 (8)>, id=0x7fffffffd620, to_purge=0x7fffffffd600) at ../../gcc/tree-inline.c:5060
#7  0x00000000010a7da5 in optimize_inline_calls (fn=<function_decl 0x7fffe81d2100 rcu_tasks_trace_pertask>) at ../../gcc/tree-inline.c:5202
Mine. Looks like another case where IPA and local cprop get out of sync...
GCC 9 branch is being closed
Honza, any estimate how long this could take? I'd prefer to wait with 10.4 for it if it isn't going to take too long.
> Honza, any estimate how long this could take? I'd prefer to wait with 10.4 for
> it if it isn't going to take too long.
I am at a conference this week, with a talk on Wednesday. I will try to debug this during the event.
Honza
After inlining I see:

IPA function summary for rcu_tasks_trace_pertask/5350 inlinable
  global time: 13.535950
  self size: 11
  global size: 16
  min size: 11
  self stack: 0
  global stack: 0
  estimated growth:5
    size:8.000000, time:5.807250
    size:3.000000, time:2.000000, executed if:(not inlined)
    size:2.000000, time:2.000000, nonconst if:(op0 changed)
  calls:
    rcu_tasks_trace_pertask.part.0/5788 inlined
      freq:0.62
      Stack frame offset 0, callee self size 0
      trc_wait_for_one_reader/5799 inlined
        freq:0.62
        Stack frame offset 0, callee self size 0
        __builtin_unreachable/5800 unreachable
          freq:0.00 loop depth: 0 size: 0 time: 0 predicate: (false)
          op0 is compile time invariant
          op1 is compile time invariant
        trc_wait_for_one_reader.part.0/5784 --param max-inline-insns-auto limit reached
          freq:0.31 loop depth: 0 size: 3 time: 12 callee size:100 stack: 0
    __builtin_expect/5421 function body not available
      freq:1.00 loop depth: 0 size: 0 time: 0
      op1 is compile time invariant

So it seems that we determine the call in trc_wait_for_one_reader to be unreachable.
It originally calls printk:

IPA function summary for trc_wait_for_one_reader/5348 inlinable
  global time: 13.500000
  self size: 20
  global size: 20
  min size: 1
  self stack: 0
  global stack: 0
    size:1.000000, time:1.000000
    size:3.000000, time:2.000000, executed if:(not inlined)
    size:0.500000, time:0.500000, executed if:(not inlined), nonconst if:(op0[ref offset: 8672] changed) && (not inlined)
    size:2.500000, time:2.500000, nonconst if:(op0[ref offset: 8672] changed)
    size:1.500000, time:0.250000, executed if:(op0[ref offset: 8672] != -1) && (not inlined)
    size:3.500000, time:1.250000, executed if:(op0[ref offset: 8672] != -1)
  calls:
    trc_wait_for_one_reader.part.0/5784 function not considered for inlining
      freq:0.50 loop depth: 0 size: 3 time: 12 callee size:55 stack: 0 predicate: (op0[ref offset: 8672] == -1)
    _printk/5452 function body not available
      freq:0.00 loop depth: 0 size: 5 time: 14 predicate: (op0[ref offset: 8672] != -1)
      op0 is compile time invariant

So we somehow conclude that (op0[ref offset: 8672] != -1) is false here:

Enqueueing calls in rcu_tasks_trace_pertask.part.0/5788.
Estimating body: trc_wait_for_one_reader/5348
   Known to be false: not inlined, op0[ref offset: 8672] != -1, op0[ref offset: 8672] changed
   size:4 time:7.000000 nonspec time:12.000000

This seems to be based on:

Jump functions:
  Jump functions of caller rcu_tasks_trace_pertask.part.0/5788:
    callsite rcu_tasks_trace_pertask.part.0/5788 -> trc_wait_for_one_reader/5348 :
       param 0: PASS THROUGH: 0, op nop_expr
         Aggregate passed by reference:
           offset: 8672, type: int, CONST: -1
         value: 0x0, mask: 0xffffffffffffffff
         Unknown VR
       param 1: PASS THROUGH: 1, op nop_expr
         value: 0x0, mask: 0xffffffffffffffff
         Unknown VR

Which seems correct:

rcu_tasks_trace_pertask.part.0 (struct task_struct * t, struct list_head * hop)
{
  struct task_struct * D.58527;
  u64 pfo_val__;
  _Bool _1;
  long int _2;
  long int _3;
  struct task_struct * _14;

  <bb 4> [local count: 1073741824]:

  <bb 2> [local count: 1073741824]:
  __asm__ __volatile__("" : : : "memory");
  MEM[(volatile u8 *)t_1(D) + 1089B] ={v} 0;
  __asm__ __volatile__("lock; addl $0,-4(%%rsp)" : : : "cc", "memory");
  t_1(D)->trc_ipi_to_cpu = -1;
  trc_wait_for_one_reader (t_1(D), hop_2(D));

  <bb 3> [local count: 1073741824]:
  return;

}

so there really is a store to trc_ipi_to_cpu. The code uses it as:

  _60 ={v} t_59(D)->trc_ipi_to_cpu;
  __asm__ __volatile__("" : : : "memory");
  if (_60 != -1)
    goto <bb 3>; [50.00%]
  else
    goto <bb 6>; [50.00%]

So the bug may be that we ignore the volatile flag when determining IPA predicates?

The problem goes away with -fno-partial-inlining.
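For readers without the kernel source at hand, the store/volatile-load pattern seen in the dumps can be sketched standalone roughly as follows. This is only an illustration: struct task, READ_ONCE_INT, wait_for_one_reader and pertask are hypothetical stand-ins for the kernel's task_struct, READ_ONCE, trc_wait_for_one_reader and rcu_tasks_trace_pertask.part.0, and the real code is considerably larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's task_struct; only the one
   field that matters for this report is modelled.  */
struct task {
    int trc_ipi_to_cpu;
};

/* Simplified READ_ONCE for int: force a volatile access to x.  */
#define READ_ONCE_INT(x) (*(volatile int *)&(x))

/* Callee: the volatile load may observe a value written by another
   CPU, so the compiler must not assume it still equals the caller's
   last plain store.  */
static int wait_for_one_reader(struct task *t)
{
    int cpu = READ_ONCE_INT(t->trc_ipi_to_cpu);
    if (cpu != -1) {
        /* This is the branch whose call was turned into
           __builtin_unreachable () in the report.  */
        printf("IPI to task still in flight (cpu %d).\n", cpu);
        return 1;
    }
    return 0;
}

/* Caller: a plain store of -1 immediately before the call, which is
   what the jump function records as "offset: 8672, CONST: -1".  */
static int pertask(struct task *t)
{
    assert(t != NULL);
    t->trc_ipi_to_cpu = -1;
    return wait_for_one_reader(t);
}
```

In a single-threaded run pertask always returns 0, which is exactly why the miscompile is hard to notice outside of concurrent code: the printf branch only matters when something else writes the field between the store and the volatile load.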
Indeed, volatile checks seem to be missing across the ipa-prop code. Here is a smaller testcase:

__attribute__((noinline))
static int
test2(int a)
{
        if (__builtin_constant_p (a))
                __builtin_abort ();
        return a;
}
/*__attribute__((noinline))*/
static int
test(int *a)
{
        int val = *(volatile int *)a;
        if (__builtin_constant_p (val))
                __builtin_abort ();
        if (val)
          return test2(val);
        return 0;
}
int a;
int
main()
{
        a = 0;
        return test (&a);
}

is optimized to

main:
.LFB2:
        .cfi_startproc
        movl    $0, a(%rip)
        movl    a(%rip), %eax
        xorl    %eax, %eax
        ret
        .cfi_endproc

which I don't think is correct. The volatile load can be non-0 and thus we can return non-0. It does not trigger unreachable as I hoped for, since the vrp and fab passes seem to jump to the same conclusion as ipa-cp.
I am testing

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index afd9222b5a2..c037668e7d8 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -1112,6 +1112,10 @@ ipa_load_from_parm_agg (struct ipa_func_body_info *fbi,
   if (!base)
     return false;
 
+  /* We can not propagate across volatile loads.  */
+  if (TREE_THIS_VOLATILE (op))
+    return false;
+
   if (DECL_P (base))
     {
       int index = ipa_get_param_decl_index_1 (descriptors, base);
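The effect of the added check can be illustrated with a toy model. To be clear, this is not GCC code: struct mem_ref, struct known_agg_value and try_fold_load are hypothetical names invented for this sketch, with the is_volatile field standing in for TREE_THIS_VOLATILE (op) and the known-value table standing in for the aggregate jump function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified description of a load.  */
struct mem_ref {
    bool is_volatile;   /* models TREE_THIS_VOLATILE (op) */
    int param_index;    /* parameter the base pointer comes from, -1 if none */
    long offset;        /* offset of the load within the pointed-to aggregate */
};

/* Models what the caller knows: it stored `value` at `offset` of the
   aggregate passed through parameter `param_index`.  */
struct known_agg_value {
    int param_index;
    long offset;
    int value;
};

/* Return true and set *out if the load may be replaced by a constant
   known from the caller; return false to punt.  */
static bool try_fold_load(const struct mem_ref *op,
                          const struct known_agg_value *known, size_t n,
                          int *out)
{
    assert(op != NULL && out != NULL);

    if (op->param_index < 0)
        return false;

    /* The added check: never propagate across a volatile load.  */
    if (op->is_volatile)
        return false;

    for (size_t i = 0; i < n; i++)
        if (known[i].param_index == op->param_index
            && known[i].offset == op->offset) {
            *out = known[i].value;
            return true;
        }
    return false;
}
```

With a known value of -1 at offset 8672 of parameter 0, a plain load folds to -1 while a volatile load of the same location is left alone, so a predicate such as (op0[ref offset: 8672] != -1) can no longer be decided at compile time.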
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:8f6c317b3a16350698f3c9e0accb43a9b4acb4ae

commit r13-1089-g8f6c317b3a16350698f3c9e0accb43a9b4acb4ae
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads

    Check for volatile flag to ipa_load_from_parm_agg.

    gcc/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            PR ipa/105739
            * ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.

    gcc/testsuite/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            * gcc.dg/ipa/pr105739.c: New test.
Thanks, I have verified that on the #c0 testcase on 10 branch it makes both __builtin_unreachable calls go away.
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:bf4ba940673b80961c5979078f9d37a7bef2f5ff

commit r12-8493-gbf4ba940673b80961c5979078f9d37a7bef2f5ff
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads

    Check for volatile flag to ipa_load_from_parm_agg.

    gcc/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            PR ipa/105739
            * ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.

    gcc/testsuite/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            * gcc.dg/ipa/pr105739.c: New test.

    (cherry picked from commit 8f6c317b3a16350698f3c9e0accb43a9b4acb4ae)
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:49aa637488053223bc04e59aac411d4a92ebcf7b

commit r11-10080-g49aa637488053223bc04e59aac411d4a92ebcf7b
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads

    Check for volatile flag to ipa_load_from_parm_agg.

    gcc/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            PR ipa/105739
            * ipa-prop.c (ipa_load_from_parm_agg): Punt on volatile loads.

    gcc/testsuite/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            * gcc.dg/ipa/pr105739.c: New test.

    (cherry picked from commit 8f6c317b3a16350698f3c9e0accb43a9b4acb4ae)
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:62148f13c6427edd256072b3196a01b3d5ed2805

commit r10-10856-g62148f13c6427edd256072b3196a01b3d5ed2805
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Jun 14 14:05:53 2022 +0200

    Fix ipa-cp wrt volatile loads

    Check for volatile flag to ipa_load_from_parm_agg.

    gcc/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            PR ipa/105739
            * ipa-prop.c (ipa_load_from_parm_agg): Punt on volatile loads.

    gcc/testsuite/ChangeLog:

    2022-06-10  Jan Hubicka  <hubicka@ucw.cz>

            * gcc.dg/ipa/pr105739.c: New test.

    (cherry picked from commit 8f6c317b3a16350698f3c9e0accb43a9b4acb4ae)
Fixed (Honza, hope you don't mind the backports I've done, did that so that it is on time for 10.4).
> Fixed (Honza, hope you don't mind the backports I've done, did that so that it
> is on time for 10.4).
Thanks. I don't mind: I was planning to do them this week anyway, and the extra VOLATILE check should not be very risky.