[PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

Kyrill Tkachov kyrylo.tkachov@arm.com
Fri Jul 10 12:31:00 GMT 2015


Hi all,

This patch makes if-conversion more aggressive when handling code of the form:
if (test)
   x := a  //THEN
else
   x := b  //ELSE

Currently, we can handle this case only if x:=a and x:=b are each a simple single-set instruction.
With this patch we will be able to handle the cases where x:=a and x:=b each take multiple instructions.
This can be done under the condition that all the instructions in the THEN and ELSE basic blocks are
only used to compute a value for x.  I suppose we could generalise even further (perhaps to handle
cases where multiple x's are being set) but that's out of the scope of this patch.

This was sparked by some cases in aarch64 where the THEN or ELSE branches contained an extra
zero_extend operation after an arithmetic instruction which prevented if-conversion.
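
To illustrate the kind of source I have in mind, here is a hypothetical C sketch (it is not one of the testcases in the patch, and the function names are made up). Each arm narrows an arithmetic result to 16 bits, which I'd expect to show up as an arithmetic instruction followed by a zero-extension. The second function is a hand-written branchless form, i.e. roughly the shape if-conversion aims for: compute both values into temporaries, then select.

```c
#include <stdint.h>

/* Hypothetical example: each arm needs more than one operation
   (an add or subtract followed by a narrowing to 16 bits).  */
uint32_t
pick_branchy (int test, uint32_t a, uint32_t b)
{
  uint32_t x;
  if (test)
    x = (uint16_t) (a + b);   /* THEN: add, then zero-extend.  */
  else
    x = (uint16_t) (a - b);   /* ELSE: subtract, then zero-extend.  */
  return x;
}

/* Hand-written equivalent: both arms are computed unconditionally into
   temporaries, and a single select picks the result.  */
uint32_t
pick_selected (int test, uint32_t a, uint32_t b)
{
  uint32_t then_val = (uint16_t) (a + b);
  uint32_t else_val = (uint16_t) (a - b);
  return test ? then_val : else_val;
}
```

Whether a given compiler and target actually emit a conditional select for either function depends on the backend costs, so treat this only as a picture of the source shape.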

To implement this approach, noce_process_if_block in ifcvt.c is relaxed to allow multi-instruction
basic blocks when the intermediate values produced in them don't escape the basic block except
through x.  noce_process_if_block then calls a number of other functions to detect various
patterns and if-convert. Most of them don't actually make sense for multi-instruction basic blocks
so they are updated to reject them and operate only on the existing single-instruction case.
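
The "doesn't escape except through x" condition can be pictured with a toy model. This is only a sketch under heavy simplifying assumptions and is not the code in the patch: an instruction is reduced to the single register it writes, liveness on block exit is a 64-bit mask, and the names (toy_insn, toy_block_only_computes_x) are invented for illustration. The real checks work on RTL and have to consider more than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: an "insn" is reduced to the register it writes.  */
struct toy_insn
{
  unsigned dest;   /* Register number written; must be in 0..63.  */
};

/* Return true if the last insn in the block writes X and every other
   register written in the block is dead on exit, i.e. not set in
   LIVE_OUT.  Bit N of LIVE_OUT means register N is live after the
   block.  */
bool
toy_block_only_computes_x (const struct toy_insn *insns, size_t n,
                           unsigned x, uint64_t live_out)
{
  if (n == 0 || insns[n - 1].dest != x)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      unsigned d = insns[i].dest;
      /* A temporary that is still live after the block escapes.  */
      if (d != x && (live_out & (UINT64_C (1) << d)))
        return false;
    }
  return true;
}
```

In this model a block that writes a temporary r5 and then x = r1 is accepted when only r1 is live on exit, and rejected if r5 is also live on exit.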

However, noce_try_cmove_arith can take advantage of multi-instruction basic blocks and is thus
updated to emit the whole basic blocks rather than just one instruction.

The transformation is, of course, guarded by a cost calculation.
The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't
exceed the branch cost. I don't think that's quite the right calculation:
we're going to be executing at least one of the basic blocks anyway.
With this patch we instead check the *maximum* of the two block costs against the branch cost.
This should still catch cases where a high latency instruction appears in one or both of
the paths.
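
The difference between the two checks can be sketched as follows. This is a simplified illustration, not the code in ifcvt.c: the function names are made up, and the costs are plain unsigned numbers in some common unit that I'm assuming for the example.

```c
#include <stdbool.h>

/* Sum-based check, as described for the current code: proceed only if
   the combined cost of both blocks does not exceed the branch cost.  */
bool
cost_ok_sum (unsigned then_cost, unsigned else_cost, unsigned branch_cost)
{
  return then_cost + else_cost <= branch_cost;
}

/* Max-based check, as proposed: proceed if the more expensive of the
   two blocks does not exceed the branch cost.  */
bool
cost_ok_max (unsigned then_cost, unsigned else_cost, unsigned branch_cost)
{
  unsigned worst = then_cost > else_cost ? then_cost : else_cost;
  return worst <= branch_cost;
}
```

With then_cost = 3, else_cost = 3 and branch_cost = 4, the sum-based check rejects the conversion (6 > 4) while the max-based check accepts it (3 <= 4). With then_cost = 5 and else_cost = 1, both reject it, so a single expensive path still blocks the transformation.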


This transformation applies to targets with conditional move operations but no conditional
execution. Thus, it applies to aarch64 and x86_64, but not arm.

The effect of this patch is more noticeable if the backend branch cost is higher, as you'd expect.


Even without increasing the branch cost, we still get more aggressive if-conversion.
Across the whole of SPEC2006 I saw a 5.8% increase in the number of csel instructions generated
(from 41242 to 43637).

Bootstrapped and tested on aarch64, x86_64, arm.
I've made the testcases aarch64-specific since they depend on backend branch costs that are hard
to predict across all platforms (we don't have a -mbranch-cost= option ;))
No performance regressions on SPEC2006 on aarch64 and x86_64.
On aarch64 I've seen 482.sphinx3 improve by 2.3% and 459.GemsFDTD by 2.1%.

Some of the testcases in aarch64.exp now fail their scan-assembler patterns due to if-conversion.
I've updated those testcases to properly generate the pattern they expect. The changes are mostly
due to add+compare-style instructions now appearing in the same basic blocks as their result uses,
which, I think, scares combine away from combining them into one.

Does this approach look reasonable?
If so, ok for trunk?

Thanks,
Kyrill


2015-07-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
     then_cost, else_cost fields.
     (end_ifcvt_sequence): Call set_used_flags on each insn in the
     sequence.
     (noce_simple_bbs): New function.
     (noce_try_move): Bail if basic blocks are not simple.
     (noce_try_store_flag): Likewise.
     (noce_try_store_flag_constants): Likewise.
     (noce_try_addcc): Likewise.
     (noce_try_store_flag_mask): Likewise.
     (noce_try_cmove): Likewise.
     (noce_try_minmax): Likewise.
     (noce_try_abs): Likewise.
     (noce_try_sign_mask): Likewise.
     (noce_try_bitop): Likewise.
     (bbs_ok_for_cmove_arith): New function.
     (noce_emit_all_but_last): Likewise.
     (noce_emit_insn): Likewise.
     (noce_emit_bb): Likewise.
     (noce_try_cmove_arith): Handle non-simple basic blocks.
     (insn_valid_noce_process_p): New function.
     (bb_valid_for_noce_process_p): Likewise.
     (noce_process_if_block): Allow non-simple basic blocks
     where appropriate.


2015-07-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/ifcvt_csel_1.c: New test.
     * gcc.target/aarch64/ifcvt_csel_2.c: New test.
     * gcc.target/aarch64/ifcvt_csel_3.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ifcvt.patch
Type: text/x-patch
Size: 19680 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150710/395bbc72/attachment.bin>