This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops


On 08/10/2017 03:14 PM, Michael Collison wrote:
> Hi all,
> 
> One issue that we keep encountering on aarch64 is GCC not making good use of the flag-setting arithmetic instructions
> like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the result against zero.
> They are represented in a fairly standard way in the backend as PARALLEL patterns:
> (parallel [(set (reg x1) (op (reg x2) (reg x3)))
>            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
> 
> GCC isn't forming these from separate arithmetic and comparison instructions as aggressively as it could.
> A particular pain point is when the result of the arithmetic insn is used before the comparison instruction.
> The testcase in this patch is one such example where we have:
> (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>         (plus:SI (reg:SI 0 x0 [ x ])
>             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
>      (nil))
> (insn 33 7 34 2 (set (reg:SI 1 x1 [77])
>         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
>      (nil))
> (insn 34 33 17 2 (set (reg:CC 66 cc)
>         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
>      (nil))
> 
> This scares combine away as x0 is used in insn 33 as well as the comparison in insn 34.
> I think the compare-elim pass can help us here.
Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.

> 
> This patch extends it by handling comparisons against zero, finding the defining instruction of the compare
> and merging the comparison with the defining instruction into a PARALLEL that will hopefully match the form
> described above. If between the comparison and the defining insn we find an instruction that uses the condition
> registers or any control flow we bail out, and we don't cross basic blocks.
> This simple technique allows us to catch cases such as this example.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 	    Michael Collison <michael.collison@arm.com>
> 
> 	* compare-elim.c: Include emit-rtl.h.
> 	(can_merge_compare_into_arith): New function.
> 	(try_validate_parallel): Likewise.
> 	(try_merge_compare): Likewise.
> 	(try_eliminate_compare): Call the above when no previous clobber
> 	is available.
> 	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
> 	dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 	    Michael Collison <michael.collison@arm.com>
> 	    
> 	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
> 
> 
> pr5198v1.patch
> 
> 
> diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
> index 7e557a2..c65d155 100644
> --- a/gcc/compare-elim.c
> +++ b/gcc/compare-elim.c
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "insn-config.h"
>  #include "recog.h"
> +#include "emit-rtl.h"
>  #include "cfgrtl.h"
>  #include "tree-pass.h"
>  #include "domwalk.h"
> @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start)
>    return reg;
>  }
>  
> +/* Return true if it is okay to merge the comparison CMP_INSN with
> +   the instruction ARITH_INSN.  Both instructions are assumed to be in the
> +   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
> +   that there are no uses or defs of the condition flags or control flow
> +   changes between the two instructions.  */
> +
> +static bool
> +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
> +{
> +  for (rtx_insn *insn = PREV_INSN (cmp_insn);
> +       insn && insn != arith_insn;
> +       insn = PREV_INSN (insn))
> +    {
> +      if (!NONDEBUG_INSN_P (insn))
> +	continue;
> +      /* Bail if there are jumps or calls in between.  */
> +      if (!NONJUMP_INSN_P (insn))
> +	return false;
> +
> +      df_ref ref;
> +      /* Find a USE of the flags register.  */
> +      FOR_EACH_INSN_USE (ref, insn)
> +	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +	  return false;
> +
> +      /* Find a DEF of the flags register.  */
> +      FOR_EACH_INSN_DEF (ref, insn)
> +	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +	  return false;
> +    }
> +  return true;
> +}
What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.

Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


> +
> +/* Given two SET expressions, SET_A and SET_B determine whether they form
> +   a recognizable pattern when emitted in parallel.  Return that parallel
> +   if so.  Otherwise return NULL.  This tries both permutations of SET_A
> +   and SET_B within the PARALLEL.  */
> +
> +static rtx
> +try_validate_parallel (rtx set_a, rtx set_b)
> +{
> +  rtx par
> +    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
> +  rtx_insn *seq;
> +  start_sequence ();
> +  seq = emit_insn (par);
> +  end_sequence ();
> +  if (recog_memoized (seq) > 0)
> +    return par;
> +
> +  par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a));
> +  start_sequence ();
> +  seq = emit_insn (par);
> +  end_sequence ();
> +  return recog_memoized (seq) > 0 ? par : NULL_RTX;
> +}
I don't think you need to build up a sequence here.   Can't you just
allocate the INSN container and set its PATTERN?

> +
> +/* For a comparison instruction described by CMP check if it compares a
> +   register with zero i.e. it is of the form CC := CMP R1, 0.
> +   If it is, find the instruction defining R1 (say I1) and try to create a
> +   PARALLEL consisting of I1 and the comparison, representing a flag-setting
> +   arithmetic instruction.  Example:
> +   I1: R1 := R2 + R3
> +   <instructions that don't read the condition register>
> +   I2: CC := CMP R1 0
> +   I2 can be merged with I1 into:
> +   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
> +   This catches cases where R1 is used between I1 and I2 and therefore
> +   combine and other RTL optimisations will not try to propagate it into
> +   I2.  Return true if we succeeded in merging CMP.  */
> +
> +static bool
> +try_merge_compare (struct comparison *cmp)
> +{
> +  rtx_insn *cmp_insn = cmp->insn;
> +
> +  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
> +    return false;
> +  rtx in_a = cmp->in_a;
> +  if (!in_a)
> +    return false;
Isn't the second IF redundant?  How can !in_a ever be true here given we
tested REG_P (cmp->in_a)?


> +  df_ref use;
> +
> +  FOR_EACH_INSN_USE (use, cmp_insn)
> +    if (DF_REF_REGNO (use) == REGNO (in_a))
> +      break;
> +  if (!use)
> +    return false;
> +
> +  struct df_link *ref_chain;
> +  ref_chain = DF_REF_CHAIN (use);
> +  if (!ref_chain || !ref_chain->ref
> +      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
> +    return false;
So what are you checking for here?  I've got a pretty good guess, but
let's just make it explicit in a comment.


Generally this looks pretty close.  THe biggest concern I see is I think
you need to verify that the inputs on the arthmetic insn don't change
between the arithmetic and the compare.

jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]