This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH 1/n] Add conditional compare support


Hi,

The patch is updated according to the comments. Changes include:
* Rewrite codes according to Richard Biener's comments.
* Change the algorithm to recursively combine compares. So it can handle any number of compares.
* Add a set of instruction patterns in ARM backend to match the conditional compares.

With this patch, the conditional compare sequence is like

     CC1 = CCMP (CMP (a, b), CMP (c, d));
     CC2 = CCMP (NE (CC1, 0), CMP (e, f));
     ...
     CCn/reg = CCMP (NE (CCn-1, 0), CMP (...));

To test more than two compares, you need the patch discussed in the thread.
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg63743.html

Bootstrap and no make check regression for both ARM and THUMB modes on ARM Chromebook.

ChangeLog:
2013-10-22  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

	* config/arm/arm.c (arm_fixed_condition_code_regs, arm_ccmode_to_code,
	arm_select_dominance_ccmp_mode): New functions.
	(arm_select_dominance_cc_mode_1): New function extracted from
	arm_select_dominance_cc_mode.
	(arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1.
	* config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior,
	ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc,
	ccmp_and_scc_scc_cmp): New.
	* config/arm/arm-protos.h (arm_select_dominance_ccmp_mode): New.
	* expr.c (ccmp_candidate_p, used_in_cond_stmt_p, expand_ccmp_expr_2,
	expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr): New.
	(expand_expr_real_1): Handle ccmp.
	* optabs.c: Include gimple.h.
	(expand_ccmp_op): New.
	(get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR.
	* optabs.def (ccmp): New.
	* optabs.h (expand_ccmp_op): New.
	* doc/md.texi (ccmp): New index.

Thanks!
-Zhenqiang

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Zhenqiang Chen
> Sent: Monday, September 23, 2013 2:50 PM
> To: Richard Earnshaw
> Cc: 'Richard Biener'; GCC Patches
> Subject: RE: [PATCH 1/n] Add conditional compare support
> 
> 
> 
> > -----Original Message-----
> > From: Richard Earnshaw
> > Sent: Thursday, September 19, 2013 5:13 PM
> > To: Zhenqiang Chen
> > Cc: 'Richard Biener'; GCC Patches
> > Subject: Re: [PATCH 1/n] Add conditional compare support
> >
> > On 18/09/13 10:45, Zhenqiang Chen wrote:
> > >
> > >> -----Original Message-----
> > >> From: Richard Biener [mailto:richard.guenther@gmail.com]
> > >> Sent: Tuesday, August 27, 2013 8:18 PM
> > >> To: Richard Earnshaw
> > >> Cc: Zhenqiang Chen; GCC Patches
> > >> Subject: Re: [PATCH 1/n] Add conditional compare support
> > >>
> > >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw
> > >> <rearnsha@arm.com>
> > >> wrote:
> > >>> On 27/08/13 12:10, Richard Biener wrote:
> > >>>> What's this for and what's the desired semantics? I don't like
> > >>>> having extra tree codes for this.  Is this for a specific
> > >>>> instruction set feature?
> > >>>
> > >>> The background is to support the conditional compare instructions
> > >>> in ARM (more effectively) and AArch64 at all.
> > >>>
> > >>> The current method used in ARM is to expand into a series of
> > >>> store-flag instructions and then hope that combine can optimize
> > >>> them away (though that fails far too often, particularly when the
> > >>> first instruction in the sequence is combined into another
> > >>> pattern).  To make it work at all the compiler has to lie about
> > >>> the costs of various store-flag type operations which overall
> > >>> risks producing worse code and means we also have to support many
> > >>> more complex multi-instruction patterns than is desirable.  I
> > >>> really don't want to go down the same
> > > route
> > >> for AArch64.
> > >>>
> > >>> The idea behind all this is to capture potential conditional
> > >>> compare operations early enough in the mid end that we can keep
> > >>> track of them until RTL expand time and then to emit the correct
> > >>> logic on all targets depending on what is the best thing for that
> > >>> target.  The current method of lowering into store-flag sequences
> > >>> doesn't really cut
> > > it.
> > >>
> > >> It seems to me that then the initial instruction selection process
> > >> (aka
> > > RTL
> > >> expansion) needs to be improved.  As we are expanding with having
> > >> the CFG around it should be easy enough to detect AND/ORIF cases
> > >> and do better here.  Yeah, I suppose this asks to turn existing
> > >> jump expansion
> > > optimizations
> > >> up-side-down to optimize with the GIMPLE CFG in mind.
> > >>
> > >> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly
> bogus
> > -
> > >> fold-const.c is way too early to decide this.  Similar to the
> > >> ongoing work
> > > of
> > >> expanding / building-up switch expressions in a GIMPLE pass, moving
> > >> expand complexity up the pipeline this asks for a GIMPLE phase that
> > >> moves this decision down closer to RTL expansion.
> > >> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform
> > >> pass)
> > >>
> > >
> > > The patch is updated according to your comments. It is a basic
> > > support, which does not touch ifcombine and jump related
> > > optimizations
> > yet.
> > >
> > > Current method is:
> > > 1) In fold-const, if HAVE_conditional_compare, set
> > > LOGICAL_OP_NON_SHORT_CIRCUIT
> > >    to optimize_function_for_speed_p. So we do not depend on
> > BRANCH_COST.
> > > 2) Identify CCMP during expanding. A CCMP candidate is a
> BIT_AND_EXPR
> > >    or BIT_IOR_EXPR, whose operators are compares.
> > > 3) Add a new op in optab to expand the CCMP to optimized RTL,
> > >     e.g. and_scc_scc/ior_scc_scc in ARM.
> > >
> > > Bootstrap on ARM Chrome book.
> > > No make check regression on Pandaboard.
> > >
> > > Thanks!
> > > -Zhenqiang
> > >
> > > ChangeLog:
> > > 2013-09-18  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
> > >
> > > 	* config/arm/arm.md (conditional_compare): New.
> > > 	* expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr):
> > > New.
> > > 	(expand_expr_real_1): Identify conditional compare.
> > > 	* fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update.
> > > 	* optabs.c (expand_ccmp_op): New.
> > > 	(get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR.
> > > 	* optabs.def (ccmp_optab): New.
> > > 	* optabs.h (expand_ccmp_op): New.
> > >
> > >
> > > basic-conditional-compare-support2.patch
> > >
> > >
> > > NÂnârÂÂÃÃ)emÃhÃyhiÃÂÂw^âÂÃ
> > >
> >
> > Some general comments.
> >
> > 1) How do we get to a conditional branch from this code.  It seems
> > your new pattern generates a store-flag value rather than a branch
> expansion.
> >  Are you expecting combine or some other later pass to clean that up?
> 
> The patch still depend on combine pass to optimize the cmp to generate a
> "cmp_ior/cmp_and".
> 
> It is not necessary a branch. e.g. "return a && b;" is not a branch when
> expanding.
> 
> The attached patch is updated to distinguish the two cases. If the CCMP is
> used in a GIMPLE_COND statement, it generates instruction to match
> "cmp_and/cmp_ior". An new expand "cbranchcc4" is added to match the
> branch instruction.  Then, it does not depend on later passes to optimize the
> CCMP.
> 
> > 2) Assuming we do end up with branches, why would we not want to do
> > this optimization when optimzing for space?
> 
> I mis-checked http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43920, which
> was only for THUMB mode. The attached patch removed it.
> 
> > 	cmp	r0, r1
> > 	cmpne	r2, r3
> > 	beq	L1
> >
> > is shorter than
> >
> > 	cmp	r0, r1
> > 	beq	L1
> > 	cmp	r2, r3
> > 	beq	L1
> 
> > 3) Is there any way to generalize this for more complex expressions?
> > Eg
> >
> > 	if (a == b || a == c || a == d)
> >
> > should become
> >
> > 	cmp	a, b
> > 	cmpne	a, c
> > 	cmpne	a, d
> > 	...
> >
> > Obviously, when optimizing for speed there will probably be a limit on
> > the number of compares that are desirable, but I don't see why we
> > should arbitrarily cap it at 2.
> 
> Based on current gcc, there are "no more than 2" cmps. Need update front-
> end or ifcombine to recover it. I planned to enhance the ifcombine to
> identify more CCMP opportunity. When we have more than 2 cmps, it is not
> hard to support it based on current patch. E.g.
> 
>  (a == b || a == c || a == d)
> 
> will create two ccmp:
> 
> CC_DEQ = CCMP (ior (EQ (a, b), EQ (a, c))) CC_DEQ = CCMP (ior (EQ (CC_DEQ,
> 0), EQ (a, d)))
> 
> The first CCMP will generate
> cmp    a, b
> cmpne a, c
> 
> The second CCMP will only generate
> cmpne a, d
> CC_DEQ is used for determine the condition (ne).
> 
> And from the MODE of CC_DEQ, we can distinguish the two CCMP.
> 
> BTW:
> How to determine the max number? It seams not necessary better with
> more conditional compare.
> 
> Thanks!
> -Zhenqiang

Attachment: ccmp.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]