the following code is miscompiled with gcc -O1 #define sysreg_read(regname) \ ({ \ unsigned long __sr_val; \ asm volatile( \ "mrs %0, " #regname "\n" \ : "=r" (__sr_val)); \ \ __sr_val; \ }) #define sysreg_write(regname, __sw_val) \ do { \ asm volatile( \ "msr " #regname ", %0\n" \ : \ : "r" (__sw_val)); \ } while (0) #define isb() \ do { \ asm volatile( \ "isb" \ : \ : \ : "memory"); \ } while (0) static unsigned long sctlr_read(void) { return sysreg_read(sctlr_el1); } static void sctlr_write(unsigned long val) { sysreg_write(sctlr_el1, val); } static void sctlr_rmw(void) { unsigned long val; val = sctlr_read(); val |= 1UL << 7; sctlr_write(val); } void sctlr_read_multiple(void) { sctlr_read(); sctlr_read(); sctlr_read(); sctlr_read(); } void sctlr_write_multiple(void) { sctlr_write(0); sctlr_write(0); sctlr_write(0); sctlr_write(0); sctlr_write(0); } void sctlr_rmw_multiple(void) { sctlr_rmw(); sctlr_rmw(); sctlr_rmw(); sctlr_rmw(); } void function(void) { sctlr_read_multiple(); sctlr_write_multiple(); sctlr_rmw_multiple(); isb(); } aarch64-linux-gnu-gcc -O1 compiles it to (note 'function' and 'sctlr_rmw_multiple'): sctlr_rmw: mrs x0, sctlr_el1 orr x0, x0, 128 msr sctlr_el1, x0 ret sctlr_read_multiple: mrs x0, sctlr_el1 mrs x0, sctlr_el1 mrs x0, sctlr_el1 mrs x0, sctlr_el1 ret sctlr_write_multiple: mov x0, 0 msr sctlr_el1, x0 msr sctlr_el1, x0 msr sctlr_el1, x0 msr sctlr_el1, x0 msr sctlr_el1, x0 ret sctlr_rmw_multiple: ret function: isb ret a similar issue in linux (but lager source file) got bisected to https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1b62cddcf091fb8cadf575246a7d3ff778650a6b commit 1b62cddcf091fb8cadf575246a7d3ff778650a6b Author: Jan Hubicka <jh@suse.cz> Date: Fri Nov 12 14:00:47 2021 +0100 Fix ipa-modref pure/const discovery PR ipa/103200 * ipa-modref.c (analyze_function, modref_propagate_in_scc): Do not mark pure/const function if there are side-effects. with -fdump-ipa-all $ grep found t.c.087i.modref Function found to be const: sctlr_rmw/2 Function found to be const: sctlr_read_multiple/3 Function found to be const: sctlr_write_multiple/4 Function found to be const: sctlr_rmw_multiple/5 even though t.c.086i.pure-const correctly identifies asm volatile as not const/pure.
Agreed, 'volatile' is documented as @item volatile The typical use of extended @code{asm} statements is to manipulate input values to produce output values. However, your @code{asm} statements may also produce side effects. If so, you may need to use the @code{volatile} qualifier to disable certain optimizations. @xref{Volatile}. so for the side-effects the asms should make the functions non-pure/const.
The odd thing is that during IPA propagation (modref_propagate_in_scc), the summary of sctlr_read is (gdb) p *cur_summary $2 = {loads = 0x7ffff665af00, stores = 0x7ffff665af10, kills = {<vec<modref_access_node, va_heap, vl_ptr>> = { m_vec = 0x0}, <No data fields>}, arg_flags = {<vec<unsigned short, va_heap, vl_ptr>> = { m_vec = 0x0}, <No data fields>}, retslot_flags = 0, static_chain_flags = 0, writes_errno = 0, side_effects = 0, nondeterministic = 0, calls_interposable = 0, load_accesses = 0, global_memory_read = 0, global_memory_written = 0, try_dse = 1} which is because we access the summary of cur->inlined_to (sctlr_rmw) here but that hasn't been processed. But the non-inlined node doesn't have a summary. So somehow we fail to merge the info from the inlined clones? But even using ->clone_of instead of ->inlined_to doesn't make a difference (no summary for that node either). Something is wrong. Honza?
Hmm, we ignore inlined edges during propagation!? for (cgraph_edge *callee_edge = cur->callees; callee_edge; callee_edge = callee_edge->next_callee) { int flags = flags_from_decl_or_type (callee_edge->callee->decl); modref_summary *callee_summary = NULL; modref_summary_lto *callee_summary_lto = NULL; struct cgraph_node *callee; if (!callee_edge->inline_failed || ((flags & (ECF_CONST | ECF_NOVOPS)) && !(flags & ECF_LOOPING_CONST_OR_PURE))) continue; but we also ignore calls during local analysis in IPA mode. Where are we supposed to factor in flags from inline clones? Is the IPA inline pass supposed to update summaries somehow?
Mine. The inliner pass is supposed to update the summary so it seems like a bug there. I will take a look tonight.
It is indeed missing logic to merge the side_effect and nondeterministic flag while updating after inlining. I am testing the following diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc index acfd7d80ff8..556816ab429 100644 --- a/gcc/ipa-modref.cc +++ b/gcc/ipa-modref.cc @@ -5281,6 +5281,29 @@ ipa_merge_modref_summary_after_inlining (cgraph_edge *edge) if (!ignore_stores) to_info_lto->stores->collapse (); } + /* Merge side effects and non-determinism. + PURE/CONST flags makes functions deterministic and if there is + no LOOPING_CONST_OR_PURE they also have no side effects. */ + if (!(flags & (ECF_CONST | ECF_NOVOPS | ECF_PURE)) + || (flags & ECF_LOOPING_CONST_OR_PURE)) + { + if (to_info) + { + if (!callee_info || callee_info->side_effects) + to_info->side_effects = true; + if ((!callee_info || callee_info->nondeterministic) + && !ignore_nondeterminism_p (edge->caller->decl, flags)) + to_info->nondeterministic = true; + } + if (to_info_lto) + { + if (!callee_info_lto || callee_info_lto->side_effects) + to_info_lto->side_effects = true; + if ((!callee_info_lto || callee_info_lto->nondeterministic) + && !ignore_nondeterminism_p (edge->caller->decl, flags)) + to_info_lto->nondeterministic = true; + } + } if (callee_info || callee_info_lto) { auto_vec <modref_parm_map, 32> parm_map;
Happy accident; I've been wanting to allow doing something like: static __always_inline __pure bool arch_static_branch(struct static_key * const key) { asm_volatile_goto("1:" ".byte " __stringify(BYTES_NOP5) "\n\t" ".pushsection __jump_table, \"aw\" \n\t" _ASM_ALIGN "\n\t" ".long 1b - ., %l[l_yes] - . \n\t" _ASM_PTR "%c0 - . \n\t" ".popsecion \n\t" : : "i" (key) : : l_yes); return false; l_yes: return true; } Now, since asm-goto is implicitly volatile, and asm volatile (per this thread) destroys __pure due to side-effects, this doesn't actually work. But I'd like to still have the explicit pure attribute override this. If the programmer gets it wrong like this, he/she gets to keep the pieces. Specifically in this case, he result of the function really only depends on the @key argument. Any call to this will have the exact same outcome and merging multiple instances is *good*. Also see this thread on linux-toolchains: https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass.net/
(In reply to peterz from comment #6) > Happy accident; I've been wanting to allow doing something like: > > static __always_inline __pure bool arch_static_branch(struct static_key * > const key) > { > asm_volatile_goto("1:" > ".byte " __stringify(BYTES_NOP5) "\n\t" > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > ".long 1b - ., %l[l_yes] - . \n\t" > _ASM_PTR "%c0 - . \n\t" > ".popsecion \n\t" > : : "i" (key) : : l_yes); > return false; > l_yes: > return true; > } > > > Now, since asm-goto is implicitly volatile, and asm volatile (per this > thread) destroys __pure due to side-effects, this doesn't actually work. But > I'd like to still have the explicit pure attribute override this. If the > programmer gets it wrong like this, he/she gets to keep the pieces. > > Specifically in this case, he result of the function really only depends on > the @key argument. Any call to this will have the exact same outcome and > merging multiple instances is *good*. > > Also see this thread on linux-toolchains: > > > https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass. > net/ Possible test case here: https://lore.kernel.org/all/YIb0vYQMEs9HRjPl@hirez.programming.kicks-ass.net/
(In reply to Jan Hubicka from comment #5) > It is indeed missing logic to merge the side_effect and nondeterministic > flag while updating after inlining. I am testing the following > I built a fedora gcc12 package with this patch, and then built a 4.17 kernel. Looking at the genet, it appears to be calling the enable_dma() routine now from the _open() function. It also boots on the seattle, which was also experiencing a gcc11/gcc12 related boot failure. So, at first glance it appears those two failures are corrected when this patch is applied.
> I built a fedora gcc12 package with this patch, and then built a 4.17 I mistyped that (again), it should be 5.17.
Fixed by r:aabb9a261ef060cf24fd626713f1d7d9df81aa57