Bug 105160 - [12 regression] ipa modref marks functions with asm volatile as const or pure
Summary: [12 regression] ipa modref marks functions with asm volatile as const or pure
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Jan Hubicka
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2022-04-05 11:50 UTC by nsz
Modified: 2022-04-09 19:11 UTC (History)
13 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-04-05 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description nsz 2022-04-05 11:50:37 UTC
the following code is miscompiled with gcc -O1

#define sysreg_read(regname)		\
({					\
	unsigned long __sr_val;		\
	asm volatile(			\
	"mrs %0, " #regname "\n"	\
	: "=r" (__sr_val));		\
					\
	__sr_val;			\
})

#define sysreg_write(regname, __sw_val)	\
do {					\
	asm volatile(			\
	"msr " #regname ", %0\n"	\
	:				\
	: "r" (__sw_val));		\
} while (0)

#define isb()				\
do {					\
	asm volatile(			\
	"isb"				\
	:				\
	:				\
	: "memory");			\
} while (0)

static unsigned long sctlr_read(void)
{
	return sysreg_read(sctlr_el1);
}

static void sctlr_write(unsigned long val)
{
	sysreg_write(sctlr_el1, val);
}

static void sctlr_rmw(void)
{
	unsigned long val;

	val = sctlr_read();
	val |= 1UL << 7;
	sctlr_write(val);
}

void sctlr_read_multiple(void)
{
	sctlr_read();
	sctlr_read();
	sctlr_read();
	sctlr_read();
}

void sctlr_write_multiple(void)
{
	sctlr_write(0);
	sctlr_write(0);
	sctlr_write(0);
	sctlr_write(0);
	sctlr_write(0);
}

void sctlr_rmw_multiple(void)
{
	sctlr_rmw();
	sctlr_rmw();
	sctlr_rmw();
	sctlr_rmw();
}

void function(void)
{
	sctlr_read_multiple();
	sctlr_write_multiple();
	sctlr_rmw_multiple();

	isb();
}



aarch64-linux-gnu-gcc -O1 compiles it to
(note 'function' and 'sctlr_rmw_multiple'):


sctlr_rmw:
        mrs x0, sctlr_el1

        orr     x0, x0, 128
        msr sctlr_el1, x0

        ret
sctlr_read_multiple:
        mrs x0, sctlr_el1

        mrs x0, sctlr_el1

        mrs x0, sctlr_el1

        mrs x0, sctlr_el1

        ret
sctlr_write_multiple:
        mov     x0, 0
        msr sctlr_el1, x0

        msr sctlr_el1, x0

        msr sctlr_el1, x0

        msr sctlr_el1, x0

        msr sctlr_el1, x0

        ret
sctlr_rmw_multiple:
        ret
function:
        isb
        ret


a similar issue in linux (but lager source file) got bisected to

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1b62cddcf091fb8cadf575246a7d3ff778650a6b

commit 1b62cddcf091fb8cadf575246a7d3ff778650a6b
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Nov 12 14:00:47 2021 +0100

    Fix ipa-modref pure/const discovery
    
            PR ipa/103200
            * ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
            not mark pure/const function if there are side-effects.


with -fdump-ipa-all

$ grep found t.c.087i.modref
Function found to be const: sctlr_rmw/2
Function found to be const: sctlr_read_multiple/3
Function found to be const: sctlr_write_multiple/4
Function found to be const: sctlr_rmw_multiple/5

even though t.c.086i.pure-const correctly identifies asm volatile
as not const/pure.
Comment 1 Richard Biener 2022-04-05 11:58:09 UTC
Agreed, 'volatile' is documented as

@item volatile
The typical use of extended @code{asm} statements is to manipulate input
values to produce output values. However, your @code{asm} statements may
also produce side effects. If so, you may need to use the @code{volatile}
qualifier to disable certain optimizations. @xref{Volatile}.

so for the side-effects the asms should make the functions non-pure/const.
Comment 2 Richard Biener 2022-04-05 12:19:07 UTC
The odd thing is that during IPA propagation (modref_propagate_in_scc), the
summary of sctlr_read is

(gdb) p *cur_summary
$2 = {loads = 0x7ffff665af00, stores = 0x7ffff665af10, 
  kills = {<vec<modref_access_node, va_heap, vl_ptr>> = {
      m_vec = 0x0}, <No data fields>}, 
  arg_flags = {<vec<unsigned short, va_heap, vl_ptr>> = {
      m_vec = 0x0}, <No data fields>}, retslot_flags = 0, 
  static_chain_flags = 0, writes_errno = 0, side_effects = 0, 
  nondeterministic = 0, calls_interposable = 0, load_accesses = 0, 
  global_memory_read = 0, global_memory_written = 0, try_dse = 1}

which is because we access the summary of cur->inlined_to (sctlr_rmw) here
but that hasn't been processed.  But the non-inlined node doesn't have a summary.

So somehow we fail to merge the info from the inlined clones?

But even using ->clone_of instead of ->inlined_to doesn't make a difference
(no summary for that node either).

Something is wrong.  Honza?
Comment 3 Richard Biener 2022-04-05 12:30:38 UTC
Hmm, we ignore inlined edges during propagation!?

          for (cgraph_edge *callee_edge = cur->callees; callee_edge;
               callee_edge = callee_edge->next_callee)
            {
              int flags = flags_from_decl_or_type (callee_edge->callee->decl);
              modref_summary *callee_summary = NULL;
              modref_summary_lto *callee_summary_lto = NULL;
              struct cgraph_node *callee;

              if (!callee_edge->inline_failed
                 || ((flags & (ECF_CONST | ECF_NOVOPS))
                     && !(flags & ECF_LOOPING_CONST_OR_PURE)))
                continue;

but we also ignore calls during local analysis in IPA mode.  Where are we supposed to factor in flags from inline clones?  Is the IPA inline pass
supposed to update summaries somehow?
Comment 4 Jan Hubicka 2022-04-05 14:54:13 UTC
Mine.
The inliner pass is supposed to update the summary so it seems like a bug there.
I will take a look tonight.
Comment 5 Jan Hubicka 2022-04-05 21:01:23 UTC
It is indeed missing logic to merge the side_effect and nondeterministic flag while updating after inlining.  I am testing the following

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index acfd7d80ff8..556816ab429 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -5281,6 +5281,29 @@ ipa_merge_modref_summary_after_inlining (cgraph_edge *edge)
       if (!ignore_stores)
        to_info_lto->stores->collapse ();
     }
+  /* Merge side effects and non-determinism.
+     PURE/CONST flags makes functions deterministic and if there is
+     no LOOPING_CONST_OR_PURE they also have no side effects.  */
+  if (!(flags & (ECF_CONST | ECF_NOVOPS | ECF_PURE))
+      || (flags & ECF_LOOPING_CONST_OR_PURE))
+    {
+      if (to_info)
+       {
+         if (!callee_info || callee_info->side_effects)
+           to_info->side_effects = true;
+         if ((!callee_info || callee_info->nondeterministic)
+             && !ignore_nondeterminism_p (edge->caller->decl, flags))
+           to_info->nondeterministic = true;
+       }
+      if (to_info_lto)
+       {
+         if (!callee_info_lto || callee_info_lto->side_effects)
+           to_info_lto->side_effects = true;
+         if ((!callee_info_lto || callee_info_lto->nondeterministic)
+             && !ignore_nondeterminism_p (edge->caller->decl, flags))
+           to_info_lto->nondeterministic = true;
+       }
+     }
   if (callee_info || callee_info_lto)
     {
       auto_vec <modref_parm_map, 32> parm_map;
Comment 6 peterz 2022-04-06 07:08:09 UTC
Happy accident; I've been wanting to allow doing something like:

static __always_inline __pure bool arch_static_branch(struct static_key * const key)
{
    asm_volatile_goto("1:"
                      ".byte " __stringify(BYTES_NOP5) "\n\t"
                      ".pushsection __jump_table, \"aw\" \n\t"
                      _ASM_ALIGN "\n\t"
                      ".long 1b - ., %l[l_yes] - . \n\t"
                      _ASM_PTR "%c0 - . \n\t"
                      ".popsecion \n\t"
                      : : "i" (key) : : l_yes);
    return false;
l_yes:
    return true;
}


Now, since asm-goto is implicitly volatile, and asm volatile (per this thread) destroys __pure due to side-effects, this doesn't actually work. But I'd like to still have the explicit pure attribute override this. If the programmer gets it wrong like this, he/she gets to keep the pieces.

Specifically in this case, he result of the function really only depends on the @key argument. Any call to this will have the exact same outcome and merging multiple instances is *good*.

Also see this thread on linux-toolchains:

  https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass.net/
Comment 7 peterz 2022-04-06 07:10:50 UTC
(In reply to peterz from comment #6)
> Happy accident; I've been wanting to allow doing something like:
> 
> static __always_inline __pure bool arch_static_branch(struct static_key *
> const key)
> {
>     asm_volatile_goto("1:"
>                       ".byte " __stringify(BYTES_NOP5) "\n\t"
>                       ".pushsection __jump_table, \"aw\" \n\t"
>                       _ASM_ALIGN "\n\t"
>                       ".long 1b - ., %l[l_yes] - . \n\t"
>                       _ASM_PTR "%c0 - . \n\t"
>                       ".popsecion \n\t"
>                       : : "i" (key) : : l_yes);
>     return false;
> l_yes:
>     return true;
> }
> 
> 
> Now, since asm-goto is implicitly volatile, and asm volatile (per this
> thread) destroys __pure due to side-effects, this doesn't actually work. But
> I'd like to still have the explicit pure attribute override this. If the
> programmer gets it wrong like this, he/she gets to keep the pieces.
> 
> Specifically in this case, he result of the function really only depends on
> the @key argument. Any call to this will have the exact same outcome and
> merging multiple instances is *good*.
> 
> Also see this thread on linux-toolchains:
> 
>  
> https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass.
> net/

Possible test case here:

  https://lore.kernel.org/all/YIb0vYQMEs9HRjPl@hirez.programming.kicks-ass.net/
Comment 8 jeremy.linton 2022-04-07 14:46:31 UTC
(In reply to Jan Hubicka from comment #5)
> It is indeed missing logic to merge the side_effect and nondeterministic
> flag while updating after inlining.  I am testing the following
> 

I built a fedora gcc12 package with this patch, and then built a 4.17 kernel. Looking at the genet, it appears to be calling the enable_dma() routine now from the _open() function. It also boots on the seattle, which was also experiencing a gcc11/gcc12 related boot failure.  

So, at first glance it appears those two failures are corrected when this patch is applied.
Comment 9 jeremy.linton 2022-04-07 14:48:07 UTC
> I built a fedora gcc12 package with this patch, and then built a 4.17

I mistyped that (again), it should be 5.17.
Comment 10 Jan Hubicka 2022-04-09 19:11:18 UTC
Fixed by r:aabb9a261ef060cf24fd626713f1d7d9df81aa57