Fixing ifcvt issue as exposed by BZ89430

JiangNing OS jiangning@os.amperecomputing.com
Thu May 9 06:06:00 GMT 2019



> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, May 8, 2019 3:35 PM
> To: JiangNing OS <jiangning@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Biener <rguenther@suse.de>;
> pinskia@gcc.gnu.org
> Subject: Re: Fixing ifcvt issue as exposed by BZ89430
> 
> On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
> <jiangning@os.amperecomputing.com> wrote:
> >
> > To solve BZ89430, the following changes are needed:
> >
> > (1) The code below in noce_try_cmove_arith needs to be fixed.
> >
> >   /* ??? We could handle this if we knew that a load from A or B could
> >      not trap or fault.  This is also true if we've already loaded
> >      from the address along the path from ENTRY.  */
> >   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> >     return FALSE;
> >
> > Finding a dominating memory access can help decide whether the memory
> > access following the conditional move is valid or not:
> > * If there is a dominating memory write in test_bb to the same memory
> >   address as the one from x=a, it is always safe.
> > * When the dominating memory access in test_bb to the same memory
> >   address is a read, the load out of x=a is always safe, and the store
> >   out of x=a is still safe if the memory is on the stack.
> >
> > Besides, there is a bug around rearranging the then_bb and else_bb
> > layout, which needs to be fixed.
> >
> > (2) The fix from https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
> > is overkill.  If the write target following the conditional move is a
> > memory access, if-conversion bails out immediately.
> >
> >   if (!set_b && MEM_P (orig_x))
> >     /* We want to avoid store speculation to avoid cases like
> >          if (pthread_mutex_trylock(mutex))
> >            ++global_variable;
> >        Rather than go to much effort here, we rely on the SSA optimizers,
> >        which do a good enough job these days.  */
> >     return FALSE;
> >
> > It looks hard for the compiler to know that the program semantics are
> > related to a mutex and so to avoid store speculation.  Without removing
> > this overly conservative check, the fix mentioned in (1) would not work.
> > Any ideas?  An alternative solution is to conservatively detect the
> > following pattern,
> 
> But it's important to not break this.  Note we do have
> --param allow-store-data-races which the user can use to override this.
> Note that accesses to the stack can obviously not cause store speculation
> if their addresses didn't escape.  But that's probably what is referred to
> by "rely on the SSA optimizers".

Yes! Two months ago I sent out a patch titled "[PATCH] improve ifcvt
optimization (PR rtl-optimization/89430)" to detect accesses to the stack.
The middle-end SSA optimization "tree-if-conv" can't really cover this
case. The key part of my change looks like this:

-    /* We want to avoid store speculation to avoid cases like
-	 if (pthread_mutex_trylock(mutex))
-	   ++global_variable;
-       Rather than go to much effort here, we rely on the SSA optimizers,
-       which do a good enough job these days.  */
-    return FALSE;
+    {
+      /* We want to avoid store speculation to avoid cases like
+           if (pthread_mutex_trylock(mutex))
+             ++global_variable;  
+         Tree if-conversion cannot handle this case well, and it is
+         intended to help vectorize loops only.  */
+      if (!noce_mem_is_on_stack (insn_a, orig_x))
+        return FALSE;

+      /* For a case like
+           if (pthread_mutex_trylock(mutex))
+             ++local_variable;
+         if any stack variable address is taken, this local variable
+         could potentially be modified by other threads, introducing
+         store speculation.  */
+      if (!cfun_no_stack_address_taken)
+        return FALSE;
+    }
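
To make the two guards concrete, here is a minimal source-level sketch of
both cases (my own hypothetical example, not taken from the PR):

  #include <pthread.h>

  extern pthread_mutex_t mutex;
  extern int global_variable;

  /* Must stay blocked: the store target is a global, so speculating the
     store past the trylock would race with the lock holder.  The
     noce_mem_is_on_stack check rejects this.  */
  void unsafe (void)
  {
    if (pthread_mutex_trylock (&mutex))
      ++global_variable;
  }

  /* What the patch intends to allow: local_variable is on the stack and
     its address is never taken, so no other thread can legitimately
     touch it, and an unconditional store cannot introduce a race.  */
  int safe (int cond)
  {
    int local_variable = 0;
    if (cond)
      ++local_variable;
    return local_variable;
  }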

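Similarly, a hypothetical example in the spirit of the BZ testcase (not
copied from it) for the dominating-access case from (1): the load of a[k]
in the condition dominates the conditional store to the same stack
address, so the store can be executed speculatively:

  unsigned test (unsigned k, unsigned b)
  {
    unsigned a[8] = { 0 };
    if (b < a[k])   /* dominating load of a[k] in test_bb */
      a[k] = b;     /* store to the same address; a[] is on the stack */
    return a[0] + a[1];
  }
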
Can you please take a look if you have time? Thank you!
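
(For completeness: as Richard notes, a user who is willing to accept racy
speculative stores can already opt in globally with
--param allow-store-data-races=1; the point of this patch is to prove the
store safe for non-escaping stack variables without that opt-in.)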

> 
> >          if (a_function_call(...))
> >            ++local_variable;
> >
> > If local_variable's address is never taken in the current function, it
> > cannot carry pthread mutex lock semantics, and then the following patch
> > would solve (1) and pass the last case described in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
> >
> > Index: gcc/cse.c
> > ===================================================================
> > --- gcc/cse.c   (revision 268256)
> > +++ gcc/cse.c   (working copy)
> > @@ -540,7 +540,6 @@
> >     already as part of an already processed extended basic block.  */
> > static sbitmap cse_visited_basic_blocks;
> >
> > -static bool fixed_base_plus_p (rtx x);
> >  static int notreg_cost (rtx, machine_mode, enum rtx_code, int);
> >  static int preferable (int, int, int, int);
> >  static void new_basic_block (void);
> > @@ -606,30 +605,7 @@
> >
> >  static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
> >
> >
> > -/* Nonzero if X has the form (PLUS frame-pointer integer).  */
> >
> > -static bool
> > -fixed_base_plus_p (rtx x)
> > -{
> > -  switch (GET_CODE (x))
> > -    {
> > -    case REG:
> > -      if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
> > -       return true;
> > -      if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
> > -       return true;
> > -      return false;
> > -
> > -    case PLUS:
> > -      if (!CONST_INT_P (XEXP (x, 1)))
> > -       return false;
> > -      return fixed_base_plus_p (XEXP (x, 0));
> > -
> > -    default:
> > -      return false;
> > -    }
> > -}
> > -
> >  /* Dump the expressions in the equivalence class indicated by CLASSP.
> >     This function is used only for debugging.  */
> >  DEBUG_FUNCTION void
> > Index: gcc/ifcvt.c
> > ===================================================================
> > --- gcc/ifcvt.c (revision 268256)
> > +++ gcc/ifcvt.c (working copy)
> > @@ -76,6 +76,9 @@
> >  /* Whether conditional execution changes were made.  */
> >  static int cond_exec_changed_p;
> >
> > +/* Bitmap for stack frame pointer definition insns.  */
> > +static bitmap bba_sets_sfp;
> > +
> >  /* Forward references.  */
> >  static int count_bb_insns (const_basic_block);
> >  static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, int);
> > @@ -99,6 +102,7 @@
> >                                edge, int);
> >  static void noce_emit_move_insn (rtx, rtx);
> >  static rtx_insn *block_has_only_trap (basic_block);
> > +static void collect_all_fp_insns (void);
> >
> >
> >  /* Count the number of non-jump active insns in BB.  */
> >
> > @@ -2029,6 +2033,110 @@
> >    return true;
> >  }
> >
> > +/* Return true if MEM X is on the stack.  A_INSN contains X, if it
> > +   exists.  */
> > +
> > +static bool
> > +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
> > +{
> > +  df_ref use;
> > +
> > +  gcc_assert (x);
> > +  gcc_assert (MEM_P (x));
> > +
> > +  /* Exit early if the base register is a stack register.  */
> > +  rtx a = XEXP (x, 0);
> > +  if (fixed_base_plus_p (a))
> > +    return true;
> > +
> > +  if (!a_insn)
> > +    return false;
> > +
> > +  /* Check if X is on the stack.  Assume a mem expression using
> > +     registers related to the stack register is always on the stack.  */
> > +  FOR_EACH_INSN_USE (use, a_insn)
> > +    {
> > +      if (bitmap_bit_p (bba_sets_sfp, DF_REF_REGNO (use)))
> > +        return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Always return true if there is a dominating write.
> > +
> > +   When there is a dominating read from memory on the stack,
> > +   1) if x = a is a memory read, return true.
> > +   2) if x = a is a memory write, return true if the memory is on the
> > +      stack.  This guarantees that the memory is *not* read-only.  */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > +                           const_rtx x, bool is_store)
> > +{
> > +  rtx_insn *insn;
> > +  rtx set;
> > +  bool find_load = false;
> > +
> > +  gcc_assert (MEM_P (x));
> > +
> > +  FOR_BB_INSNS (bb, insn)
> > +    {
> > +      set = single_set (insn);
> > +      if (!set)
> > +        continue;
> > +
> > +      /* Dominating store.  */
> > +      if (rtx_equal_p (x, SET_DEST (set)))
> > +        return true;
> > +
> > +      /* Dominating load.  */
> > +      if (rtx_equal_p (x, SET_SRC (set)))
> > +        find_load = true;
> > +    }
> > +
> > +  if (find_load)
> > +    {
> > +      if (is_store && noce_mem_is_on_stack (a_insn, x))
> > +        return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Return false if the memory access to A or B must be valid.
> > +   This function must be called before the potential swap of A and B.  */
> > +
> > +static bool
> > +noce_mem_maybe_invalid_p (struct noce_if_info *if_info)
> > +{
> > +  /* For a memory load.  */
> > +  if (if_info->a && MEM_P (if_info->a))
> > +    {
> > +      return !noce_valid_for_dominating (if_info->test_bb,
> > +                                         if_info->insn_a,
> > +                                         if_info->a, false);
> > +    }
> > +
> > +  /* For a memory store.  */
> > +  else if (if_info->b && MEM_P (if_info->b))
> > +    {
> > +      if (!if_info->else_bb)
> > +        return !noce_valid_for_dominating (if_info->test_bb,
> > +                                           if_info->insn_a,
> > +                                           if_info->b, true);
> > +      else
> > +        return !noce_valid_for_dominating (if_info->test_bb,
> > +                                           if_info->insn_b,
> > +                                           if_info->b, true);
> > +    }
> > +
> > +  /* ??? We could handle this if we knew that a load from A or B could
> > +     not trap or fault.  This is also true if we've already loaded
> > +     from the address along the path from ENTRY.  */
> > +  return (may_trap_or_fault_p (if_info->a)
> > +          || may_trap_or_fault_p (if_info->b));
> > +}
> > +
> >  /* Try more complex cases involving conditional_move.  */
> >
> >  static int
> > @@ -2065,10 +2173,7 @@
> >        is_mem = 1;
> >      }
> >
> > -  /* ??? We could handle this if we knew that a load from A or B could
> > -     not trap or fault.  This is also true if we've already loaded
> > -     from the address along the path from ENTRY.  */
> > -  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > +  else if (noce_mem_maybe_invalid_p (if_info))
> >      return FALSE;
> >
> >    /* if (test) x = a + b; else x = c - d;
> > @@ -2234,7 +2339,7 @@
> >    /* If insn to set up A clobbers any registers B depends on, try to
> >       swap insn that sets up A with the one that sets up B.  If even
> >       that doesn't help, punt.  */
> > -  if (modified_in_a && !modified_in_b)
> > +  if (!modified_in_a && modified_in_b)
> >      {
> >        if (!noce_emit_bb (emit_b, else_bb, b_simple))
> >         goto end_seq_and_fail;
> > @@ -2242,7 +2347,7 @@
> >        if (!noce_emit_bb (emit_a, then_bb, a_simple))
> >         goto end_seq_and_fail;
> >      }
> > -  else if (!modified_in_a)
> > +  else if (!modified_in_b)
> >      {
> >        if (!noce_emit_bb (emit_a, then_bb, a_simple))
> >         goto end_seq_and_fail;
> > @@ -5347,6 +5452,30 @@
> >
> >    return FALSE;
> >  }
> > +
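> > +/* Record in bba_sets_sfp the registers defined by insns whose source
> > +   is a frame-pointer-based expression.  */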
> > +static void
> > +collect_all_fp_insns (void)
> > +{
> > +  rtx_insn *a_insn;
> > +  bba_sets_sfp = BITMAP_ALLOC (&reg_obstack);
> > +  df_ref def;
> > +  basic_block bb;
> > +
> > +  FOR_ALL_BB_FN (bb, cfun)
> > +    FOR_BB_INSNS (bb, a_insn)
> > +      {
> > +        rtx sset_a = single_set (a_insn);
> > +        if (!sset_a)
> > +          continue;
> > +
> > +        if (fixed_base_plus_p (SET_SRC (sset_a)))
> > +          {
> > +            FOR_EACH_INSN_DEF (def, a_insn)
> > +              bitmap_set_bit (bba_sets_sfp, DF_REF_REGNO (def));
> > +          }
> > +      }
> > +}
> > +
> >
> >
> >  /* Main entry point for all if-conversion.  AFTER_COMBINE is true if
> >     we are after combine pass.  */
> > @@ -5381,6 +5510,8 @@
> >
> >    df_set_flags (DF_LR_RUN_DCE);
> >
> > +  collect_all_fp_insns ();
> > +
> >    /* Go through each of the basic blocks looking for things to convert.  If we
> >       have conditional execution, we make multiple passes to allow us to handle
> >       IF-THEN{-ELSE} blocks within other IF-THEN{-ELSE} blocks.  */
> > @@ -5413,6 +5544,8 @@
> >      }
> >    while (cond_exec_changed_p);
> >
> > +  BITMAP_FREE (bba_sets_sfp);
> > +
> >  #ifdef IFCVT_MULTIPLE_DUMPS
> >    if (dump_file)
> >      fprintf (dump_file, "\n\n========== no more changes\n");
> > Index: gcc/rtl.h
> > ===================================================================
> > --- gcc/rtl.h   (revision 268256)
> > +++ gcc/rtl.h   (working copy)
> > @@ -3751,6 +3751,8 @@
> >  #define hard_frame_pointer_rtx (global_rtl[GR_HARD_FRAME_POINTER])
> >  #define arg_pointer_rtx                (global_rtl[GR_ARG_POINTER])
> >
> > +extern bool fixed_base_plus_p (rtx x);
> > +
> >  #ifndef GENERATOR_FILE
> >  /* Return the attributes of a MEM rtx.  */
> >  static inline const struct mem_attrs *
> > Index: gcc/rtlanal.c
> > ===================================================================
> > --- gcc/rtlanal.c       (revision 268256)
> > +++ gcc/rtlanal.c       (working copy)
> > @@ -341,6 +341,30 @@
> >    return 0;
> >  }
> >
> > +/* Nonzero if X has the form (PLUS frame-pointer integer).  */
> > +
> > +bool
> > +fixed_base_plus_p (rtx x)
> > +{
> > +  switch (GET_CODE (x))
> > +    {
> > +    case REG:
> > +      if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
> > +        return true;
> > +      if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
> > +        return true;
> > +      return false;
> > +
> > +    case PLUS:
> > +      if (!CONST_INT_P (XEXP (x, 1)))
> > +        return false;
> > +      return fixed_base_plus_p (XEXP (x, 0));
> > +
> > +    default:
> > +      return false;
> > +    }
> > +}
> > +
> >  /* Compute an approximation for the offset between the register
> >     FROM and TO for the current function, as it was at the start
> >     of the routine.  */
> >
> > Thanks,
> > -Jiangning

