Fixing ifcvt issue as exposed by BZ89430
JiangNing OS
jiangning@os.amperecomputing.com
Thu May 9 06:06:00 GMT 2019
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, May 8, 2019 3:35 PM
> To: JiangNing OS <jiangning@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Biener <rguenther@suse.de>;
> pinskia@gcc.gnu.org
> Subject: Re: Fixing ifcvt issue as exposed by BZ89430
>
> On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
> <jiangning@os.amperecomputing.com> wrote:
> >
> > To solve BZ89430, the following changes are needed:
> >
> > (1) The code below in noce_try_cmove_arith needs to be fixed.
> >
> > /* ??? We could handle this if we knew that a load from A or B could
> > not trap or fault. This is also true if we've already loaded
> > from the address along the path from ENTRY. */
> > else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > return FALSE;
> >
> > Finding dominating memory access could help to decide whether the
> memory access following the conditional move is valid or not.
> > * If there is a dominating memory write to the same memory address in
> test_bb as the one from x=a, it is always safe.
> > * When the dominating memory access to the same memory address in
> test_bb is a read, for the load out of x=a, it is always safe. For the store out
> of x=a, if the memory is on stack, it is still safe.
> >
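To make the dominating-access rule concrete, here is a hypothetical C fragment (not taken from the PR testcase): because *p is unconditionally written before the branch, if-converting the guarded store cannot introduce a trap that was not already possible.

```c
/* Hypothetical illustration of the dominating-write rule: the
   unconditional store to *p in the test block proves that *p is a
   valid, writable address, so turning the guarded store into a
   conditional store cannot add a new trap or fault.  */
int dominated_store (int *p, int cond)
{
  *p = 0;        /* dominating write in test_bb */
  if (cond)
    *p = 1;      /* safe candidate for an if-converted store */
  return *p;
}
```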
> > Besides, there is a bug in how the then_bb and else_bb layout is
> > rearranged, which also needs to be fixed.
> >
> > (2) The fix at https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
> > is overkill. If the write target following the conditional move is a
> > memory access, it bails out immediately.
> >
> > if (!set_b && MEM_P (orig_x))
> > /* We want to avoid store speculation to avoid cases like
> > if (pthread_mutex_trylock(mutex))
> > ++global_variable;
> > Rather than go to much effort here, we rely on the SSA optimizers,
> > which do a good enough job these days. */
> > return FALSE;
> >
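For context, a hypothetical single-threaded sketch of the transformation this check prevents (a plain flag stands in for a real pthread_mutex_trylock call): both functions compute the same result in one thread, but the "speculated" version stores to global_variable even when the lock was not taken, which is a data race under concurrency.

```c
int global_variable;
int trylock_ok;   /* hypothetical stand-in for pthread_mutex_trylock */

/* Original shape: the store happens only while the lock is held.  */
void guarded (void)
{
  if (trylock_ok)
    ++global_variable;
}

/* Shape after (invalid) store speculation: global_variable is written
   unconditionally, so a thread that actually holds the lock can lose
   updates even when trylock failed here.  */
void speculated (void)
{
  int old = global_variable;                     /* unconditional load */
  global_variable = trylock_ok ? old + 1 : old;  /* unconditional store */
}
```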
> > It looks rather hard for the compiler to know that the program
> > semantics are related to a mutex and so avoid store speculation.
> > Without removing this overly conservative check, the fix mentioned in
> > (1) would not work. Any ideas? An alternative solution is to detect
> > the following pattern conservatively,
>
> But it's important to not break this. Note we do have --param allow-store-
> data-races which the user can use to override this.
> Note that accesses to the stack can obviously not cause store speculation if
> its address didn't escape. But that's probably what is referred to by "rely on
> the SSA optimizers".
Yes! So I sent out a patch titled "[PATCH] improve ifcvt optimization
(PR rtl-optimization/89430)" two months ago to detect accesses to the stack.
The SSA optimization pass called "tree-if-conv" in the middle end can't
really cover this case. The key part of my change looks like this:
- /* We want to avoid store speculation to avoid cases like
- if (pthread_mutex_trylock(mutex))
- ++global_variable;
- Rather than go to much effort here, we rely on the SSA optimizers,
- which do a good enough job these days. */
- return FALSE;
+ {
+ /* We want to avoid store speculation to avoid cases like
+ if (pthread_mutex_trylock(mutex))
+ ++global_variable;
+ Tree if conversion cannot handle this case well, and it intends to
+ help vectorization for loops only. */
+ if (!noce_mem_is_on_stack(insn_a, orig_x))
+ return FALSE;
+ /* For case like,
+ if (pthread_mutex_trylock(mutex))
+ ++local_variable;
+ If any stack variable address is taken, potentially this local
+ variable could be modified by other threads and introduce store
+ speculation. */
+ if (!cfun_no_stack_address_taken)
+ return FALSE;
+ }
Can you please take a look if you have time? Thank you!
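The second guard in the hunk above matters because of cases like the following hypothetical one, where a local's address escapes, so a speculative store to its stack slot could be observed by other threads:

```c
int *escaped_p;   /* hypothetical global through which the address escapes */

/* Once &local is stored somewhere visible, other code (or another
   thread) could access it through escaped_p, so an if-converted
   unconditional store to it would be store speculation again.  */
int escaping_local (int cond)
{
  int local = 0;
  escaped_p = &local;
  if (cond)
    ++local;
  return local;
}
```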
>
> > if (a_function_call(...))
> > ++local_variable;
> >
> > If local_variable never has its address taken in the current function, it
> > cannot carry pthread mutex lock semantics, and then the following patch
> > would work to solve (1) and pass the last case described in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
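The safe side of that pattern can be sketched as this hypothetical function (opaque_call stands in for the arbitrary a_function_call): since local_variable's address is never taken, no other thread can name its stack slot, and an unconditional store to it cannot race.

```c
/* Hypothetical stand-in for an arbitrary function call whose result
   guards the increment.  */
static int opaque_call (void) { return 1; }

int count_something (void)
{
  int local_variable = 0;   /* address never taken anywhere in cfun */
  if (opaque_call ())
    ++local_variable;       /* safe to if-convert: the slot is private */
  return local_variable;
}
```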
> >
> > Index: gcc/cse.c
> >
> > ===================================================================
> > --- gcc/cse.c (revision 268256)
> > +++ gcc/cse.c (working copy)
> > @@ -540,7 +540,6 @@
> > already as part of an already processed extended basic block. */
> > static sbitmap cse_visited_basic_blocks;
> >
> > -static bool fixed_base_plus_p (rtx x);
> >  static int notreg_cost (rtx, machine_mode, enum rtx_code, int);
> >  static int preferable (int, int, int, int);
> >  static void new_basic_block (void);
> > @@ -606,30 +605,7 @@
> >
> > static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
> >
> >
> > -/* Nonzero if X has the form (PLUS frame-pointer integer). */
> >
> > -static bool
> > -fixed_base_plus_p (rtx x)
> > -{
> > - switch (GET_CODE (x))
> > - {
> > - case REG:
> > - if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
> > - return true;
> > - if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
> > - return true;
> > - return false;
> > -
> > - case PLUS:
> > - if (!CONST_INT_P (XEXP (x, 1)))
> > - return false;
> > - return fixed_base_plus_p (XEXP (x, 0));
> > -
> > - default:
> > - return false;
> > - }
> > -}
> > -
> > /* Dump the expressions in the equivalence class indicated by CLASSP.
> >    This function is used only for debugging.  */
> > DEBUG_FUNCTION void
> > Index: gcc/ifcvt.c
> >
> > ===================================================================
> > --- gcc/ifcvt.c (revision 268256)
> > +++ gcc/ifcvt.c (working copy)
> > @@ -76,6 +76,9 @@
> > /* Whether conditional execution changes were made. */
> > static int cond_exec_changed_p;
> >
> > +/* bitmap for stack frame pointer definition insns. */
> > +static bitmap bba_sets_sfp;
> > +
> > /* Forward references. */
> > static int count_bb_insns (const_basic_block);
> > static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, int);
> > @@ -99,6 +102,7 @@
> >                                 edge, int);
> >  static void noce_emit_move_insn (rtx, rtx);
> >  static rtx_insn *block_has_only_trap (basic_block);
> > +static void collect_all_fp_insns (void);
> >
> >
> > /* Count the number of non-jump active insns in BB. */
> >
> > @@ -2029,6 +2033,110 @@
> > return true;
> > }
> >
> > +/* Return true if MEM x is on stack.  a_insn contains x, if it exists.  */
> > +
> > +static bool
> > +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
> > +{
> > + df_ref use;
> > +
> > + gcc_assert (x);
> > + gcc_assert (MEM_P (x));
> > +
> > +  /* Early exit if the base register is a stack register.  */
> > +  rtx a = XEXP (x, 0);
> > +  if (fixed_base_plus_p (a))
> > +    return true;
> > +
> > + if (!a_insn)
> > + return false;
> > +
> > + /* Check if x is on stack. Assume a mem expression using registers
> > + related to stack register is always on stack. */
> > + FOR_EACH_INSN_USE (use, a_insn)
> > + {
> > + if (bitmap_bit_p (bba_sets_sfp, DF_REF_REGNO (use)))
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +/* Always return true if there is a dominating write.
> > +
> > +   When there is a dominating read from memory on stack,
> > +   1) if x = a is a memory read, return true.
> > +   2) if x = a is a memory write, return true if the memory is on stack.
> > +   This guarantees the memory is *not* read-only.  */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > +                           const_rtx x, bool is_store)
> > +{
> > + rtx_insn *insn;
> > + rtx set;
> > + bool find_load = false;
> > +
> > + gcc_assert (MEM_P (x));
> > +
> > + FOR_BB_INSNS (bb, insn)
> > + {
> > + set = single_set (insn);
> > + if (!set)
> > + continue;
> > +
> > + /* Dominating store */
> > + if (rtx_equal_p (x, SET_DEST (set)))
> > + return true;
> > +
> > + /* Dominating load */
> > + if (rtx_equal_p (x, SET_SRC (set)))
> > + find_load = true;
> > + }
> > +
> > + if (find_load)
> > + {
> > + if (is_store && noce_mem_is_on_stack (a_insn, x))
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +/* Return false if the memory a or b must be valid.
> > +   This function must be called before any potential swap of a and b.  */
> > +
> > +static bool
> > +noce_mem_maybe_invalid_p (struct noce_if_info *if_info)
> > +{
> > + /* for memory load */
> > + if (if_info->a && MEM_P (if_info->a))
> > + {
> > + return !noce_valid_for_dominating (if_info->test_bb,
> > + if_info->insn_a,
> > + if_info->a, false);
> > + }
> > +
> > + /* for memory store */
> > + else if (if_info->b && MEM_P (if_info->b))
> > + {
> > + if (!if_info->else_bb)
> > + return !noce_valid_for_dominating (if_info->test_bb,
> > + if_info->insn_a,
> > + if_info->b, true);
> > + else
> > + return !noce_valid_for_dominating (if_info->test_bb,
> > + if_info->insn_b,
> > + if_info->b, true);
> > + }
> > +
> > + /* ??? We could handle this if we knew that a load from A or B could
> > + not trap or fault. This is also true if we've already loaded
> > + from the address along the path from ENTRY. */
> > +  return (may_trap_or_fault_p (if_info->a)
> > +          || may_trap_or_fault_p (if_info->b));
> > +}
> > +
> > /* Try more complex cases involving conditional_move. */
> >
> > static int
> > @@ -2065,10 +2173,7 @@
> > is_mem = 1;
> > }
> >
> > - /* ??? We could handle this if we knew that a load from A or B could
> > - not trap or fault. This is also true if we've already loaded
> > - from the address along the path from ENTRY. */
> > - else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > + else if (noce_mem_maybe_invalid_p (if_info))
> > return FALSE;
> >
> >      /* if (test) x = a + b; else x = c - d;
> > @@ -2234,7 +2339,7 @@
> > /* If insn to set up A clobbers any registers B depends on, try to
> > swap insn that sets up A with the one that sets up B. If even
> > that doesn't help, punt. */
> > - if (modified_in_a && !modified_in_b)
> > + if (!modified_in_a && modified_in_b)
> > {
> > if (!noce_emit_bb (emit_b, else_bb, b_simple))
> > goto end_seq_and_fail;
> > @@ -2242,7 +2347,7 @@
> > if (!noce_emit_bb (emit_a, then_bb, a_simple))
> > goto end_seq_and_fail;
> > }
> > - else if (!modified_in_a)
> > + else if (!modified_in_b)
> > {
> > if (!noce_emit_bb (emit_a, then_bb, a_simple))
> > goto end_seq_and_fail;
> > @@ -5347,6 +5452,30 @@
> >
> > return FALSE;
> > }
> > +
> > +/* Record in bba_sets_sfp every register that is set from a
> > +   stack- or frame-pointer-based value.  */
> > +
> > +static void
> > +collect_all_fp_insns (void)
> > +{
> > + rtx_insn *a_insn;
> > +  bba_sets_sfp = BITMAP_ALLOC (&reg_obstack);
> > + df_ref def;
> > + basic_block bb;
> > +
> > + FOR_ALL_BB_FN (bb, cfun)
> > + FOR_BB_INSNS (bb, a_insn)
> > + {
> > + rtx sset_a = single_set (a_insn);
> > + if (!sset_a)
> > + continue;
> > +
> > + if (fixed_base_plus_p (SET_SRC (sset_a)))
> > + {
> > + FOR_EACH_INSN_DEF (def, a_insn)
> > + bitmap_set_bit (bba_sets_sfp, DF_REF_REGNO (def));
> > + }
> > + }
> > +}
> > +
> >
> >
> > /* Main entry point for all if-conversion. AFTER_COMBINE is true if
> > we are after combine pass. */
> > @@ -5381,6 +5510,8 @@
> >
> > df_set_flags (DF_LR_RUN_DCE);
> >
> > + collect_all_fp_insns ();
> > +
> >    /* Go through each of the basic blocks looking for things to convert.  If we
> >       have conditional execution, we make multiple passes to allow us to handle
> >       IF-THEN{-ELSE} blocks within other IF-THEN{-ELSE} blocks.  */
> > @@ -5413,6 +5544,8 @@
> > }
> > while (cond_exec_changed_p);
> >
> > + BITMAP_FREE (bba_sets_sfp);
> > +
> > #ifdef IFCVT_MULTIPLE_DUMPS
> > if (dump_file)
> > fprintf (dump_file, "\n\n========== no more changes\n");
> > Index: gcc/rtl.h
> >
> > ===================================================================
> > --- gcc/rtl.h (revision 268256)
> > +++ gcc/rtl.h (working copy)
> > @@ -3751,6 +3751,8 @@
> > #define hard_frame_pointer_rtx (global_rtl[GR_HARD_FRAME_POINTER])
> > #define arg_pointer_rtx (global_rtl[GR_ARG_POINTER])
> >
> > +extern bool fixed_base_plus_p (rtx x);
> > +
> > #ifndef GENERATOR_FILE
> > /* Return the attributes of a MEM rtx.  */
> > static inline const struct mem_attrs *
> > Index: gcc/rtlanal.c
> >
> > ===================================================================
> > --- gcc/rtlanal.c (revision 268256)
> > +++ gcc/rtlanal.c (working copy)
> > @@ -341,6 +341,30 @@
> > return 0;
> > }
> >
> > +/* Nonzero if X has the form (PLUS frame-pointer integer). */
> > +
> > +bool
> > +fixed_base_plus_p (rtx x)
> > +{
> > + switch (GET_CODE (x))
> > + {
> > + case REG:
> > + if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
> > + return true;
> > + if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
> > + return true;
> > + return false;
> > +
> > + case PLUS:
> > + if (!CONST_INT_P (XEXP (x, 1)))
> > + return false;
> > + return fixed_base_plus_p (XEXP (x, 0));
> > +
> > + default:
> > + return false;
> > + }
> > +}
> > +
> > /* Compute an approximation for the offset between the register
> > FROM and TO for the current function, as it was at the start
> > of the routine. */
> >
> > Thanks,
> > -Jiangning