This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 1/n] Add conditional compare support
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Zhenqiang Chen <zhenqiang dot chen at arm dot com>
- Cc: 'Richard Biener' <richard dot guenther at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 19 Sep 2013 10:12:38 +0100
- Subject: Re: [PATCH 1/n] Add conditional compare support
- Authentication-results: sourceware.org; auth=none
- References: <000001ce9e58$6dadf230$4909d690$ at arm dot com> <CAFiYyc25hLxrAnsOXbfqYo=Rsw_MDopxb66bygfac-CEsffOEw at mail dot gmail dot com> <521C9406 dot 6050901 at arm dot com> <CAFiYyc0zrBBUvRF0Vmv5PyvypfK75QYuc3D9AMTE+Fkh3nCq4Q at mail dot gmail dot com> <000001ceb453$ceaea3c0$6c0beb40$ at arm dot com>
On 18/09/13 10:45, Zhenqiang Chen wrote:
>> -----Original Message-----
>> From: Richard Biener [mailto:email@example.com]
>> Sent: Tuesday, August 27, 2013 8:18 PM
>> To: Richard Earnshaw
>> Cc: Zhenqiang Chen; GCC Patches
>> Subject: Re: [PATCH 1/n] Add conditional compare support
>> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw <firstname.lastname@example.org> wrote:
>>> On 27/08/13 12:10, Richard Biener wrote:
>>>> What's this for and what's the desired semantics? I don't like having
>>>> extra tree codes for this. Is this for a specific instruction set?
>>> The background is to support the conditional compare instructions in
>>> ARM (more effectively) and AArch64 (at all).
>>> The current method used in ARM is to expand into a series of
>>> store-flag instructions and then hope that combine can optimize them
>>> away (though that fails far too often, particularly when the first
>>> instruction in the sequence is combined into another pattern). To
>>> make it work at all the compiler has to lie about the costs of various
>>> store-flag type operations which overall risks producing worse code
>>> and means we also have to support many more complex multi-instruction
>>> patterns than is desirable. I really don't want to go down the same
>>> route for AArch64.
>>> The idea behind all this is to capture potential conditional compare
>>> operations early enough in the mid end that we can keep track of them
>>> until RTL expand time and then to emit the correct logic on all
>>> targets depending on what is the best thing for that target. The
>>> current method of lowering into store-flag sequences doesn't really cut it.
>> It seems to me that then the initial instruction selection process (aka
>> expansion) needs to be improved. As we are expanding with having the CFG
>> around it should be easy enough to detect AND/ORIF cases and do better
>> here. Yeah, I suppose this asks to turn existing jump expansion
>> upside-down to optimize with the GIMPLE CFG in mind.
>> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus -
>> fold-const.c is way too early to decide this. Similar to the ongoing work
>> expanding / building-up switch expressions in a GIMPLE pass, moving expand
>> complexity up the pipeline this asks for a GIMPLE phase that moves this
>> decision down closer to RTL expansion.
>> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform pass)
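(For concreteness, here is the kind of source-level condition this whole discussion targets. This is an illustrative sketch, not code from the patch; the function name and bounds are invented.)

```c
/* Illustrative only: a condition built from two comparisons joined by a
   logical AND.  With the store-flag approach each comparison first
   materializes a 0/1 value in a register and the two results are ANDed;
   with a conditional compare (ccmp) the second compare executes under the
   predicate of the first, leaving the combined result in the flags.  */
int
both_in_range (int a, int b)
{
  /* Written with '&' to stress that no short-circuit branch is required:
     both comparisons are side-effect free and can be evaluated
     unconditionally in one basic block.  */
  return (a > 0) & (b < 16);
}
```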
> The patch is updated according to your comments. It is a basic support,
> which does not touch ifcombine and jump related optimizations yet.
> Current method is:
> 1) In fold-const.c, if HAVE_conditional_compare, set
> LOGICAL_OP_NON_SHORT_CIRCUIT to optimize_function_for_speed_p. So we do
> not depend on BRANCH_COST.
> 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR
> or BIT_IOR_EXPR, whose operands are comparisons.
> 3) Add a new op in optab to expand the CCMP to optimized RTL,
> e.g. and_scc_scc/ior_scc_scc in ARM.
> Bootstrapped on an ARM Chromebook.
> No make check regressions on a Pandaboard.
> 2013-09-18 Zhenqiang Chen <email@example.com>
> * config/arm/arm.md (conditional_compare): New.
> * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr): New.
> (expand_expr_real_1): Identify conditional compare.
> * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update.
> * optabs.c (expand_ccmp_op): New.
> (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR.
> * optabs.def (ccmp_optab): New.
> * optabs.h (expand_ccmp_op): New.
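(As an aside for readers: the candidate test in step 2 of the description above can be sketched as below. This is a toy model, not GCC's real tree API; every name here — toy_expr, toy_code, toy_ccmp_candidate_p — is invented for illustration.)

```c
/* Toy model of the CCMP candidate test: a BIT_AND_EXPR or BIT_IOR_EXPR
   both of whose operands are comparisons is a conditional-compare
   candidate.  Not GCC's actual implementation.  */
#include <stdbool.h>

enum toy_code { LT_EXPR, GT_EXPR, EQ_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR,
		PLUS_EXPR };

struct toy_expr
{
  enum toy_code code;
  struct toy_expr *op0, *op1;	/* Null for leaves in this toy model.  */
};

static bool
toy_is_comparison (const struct toy_expr *e)
{
  return e && (e->code == LT_EXPR || e->code == GT_EXPR
	       || e->code == EQ_EXPR);
}

/* Hypothetical analogue of is_conditional_compare_candidate_p.  */
bool
toy_ccmp_candidate_p (const struct toy_expr *e)
{
  if (!e || (e->code != BIT_AND_EXPR && e->code != BIT_IOR_EXPR))
    return false;
  return toy_is_comparison (e->op0) && toy_is_comparison (e->op1);
}
```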
Some general comments.
1) How do we get to a conditional branch from this code? It seems your
new pattern generates a store-flag value rather than a branch expansion.
Are you expecting combine or some other later pass to clean that up?
2) Assuming we do end up with branches, why would we not want to do this
optimization when optimizing for space?
cmp r0, r1
cmpne r2, r3
is shorter than
cmp r0, r1
beq .L1
cmp r2, r3
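(For reference, and assuming the flags from the cmp/cmpne pair feed a subsequent beq, the C condition such a sequence implements is roughly the following; the function name is invented:)

```c
/* The cmp/cmpne idiom: compare r0 with r1, and only if they are unequal
   compare r2 with r3; a following beq then branches when either pair is
   equal.  As C source that is an OR of two equality tests.  */
int
either_equal (int r0, int r1, int r2, int r3)
{
  return r0 == r1 || r2 == r3;
}
```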
3) Is there any way to generalize this for more complex expressions? E.g.
if (a == b || a == c || a == d)
cmp a, b
cmpne a, c
cmpne a, d
Obviously, when optimizing for speed there will probably be a limit on
the number of compares that are desirable, but I don't see why we should
arbitrarily cap it at 2.
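(Point 3 as C source, for concreteness; the function name is invented:)

```c
/* Three chained equality tests that could, in principle, map onto the
   sequence cmp a,b; cmpne a,c; cmpne a,d followed by a single branch.  */
int
matches_any (int a, int b, int c, int d)
{
  return a == b || a == c || a == d;
}
```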