This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch tree-optimization 1/2]: Branch-cost optimizations


On Mon, Nov 7, 2011 at 4:53 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2011/11/7 Richard Guenther <richard.guenther@gmail.com>:
>> On Sun, Nov 6, 2011 at 11:12 PM, Kai Tietz <ktietz@redhat.com> wrote:
>>> Hello,
>>>
>>> By this patch branch-cost optimization is moved from tree AST to cfgexpand from gimple to RTL. ?By this we are able to do better optimization on conditionals simliar for all targets and do the final transition for branch-cost that late it shows best effect.
>>>
>>> This patch is splitted up into two pieces. ?First adds feature of BC optimization to cfgexpand and scratches out BC-optimization code from fold-const.
>>>
>>> The second patch adds to tree-ssa-ifcombine pass the feature to merge simple if-and/or-if patterns into associative form.
>>>
>>> Two tests are failing due this patch in C's testsuite. ?This are unit-pred-6_b and uniit-pred-6_c testcases. ?Those failures are caused by jump-threading optimization in vrp, as vrp-pass. ?Those failures could be fixed by the second patch, if we would move the ifcombine pass before the first vrp pass.
>>
>> The idea of doing target cost specific conversion of straight-line conditional
>> code to branchy code and vice versa is a good one. ?Though technically
>> the spot you chose isn't very suitable - we already performed SSA name
>> coalescing so you do not have the freedom that you seem to exercise
>> with using SSA name definition statements.
>
> Well, not that I noticed that I missed here any freedom. ?In what
> cases you mean I would need here freedom to create new ssa-statements?

You lookup SSA_NAME_DEF_STMTs of SSA names - you cannot do
that for SSA names that have been coalesced with others.  Instead
there is get_gimple_for_ssa_name which may return NULL.

> ?I admit that to move this branching code into a late pass before
> cfgexpand would be preferable, but due current way cfgexpand already
> interacts here with dojump, it seems to me for now the better place.

How is it for now the better place when it's the wrong place?

> So for the upcoming 4.8 I will happily work on a more improved
> version, which can handle the switch-lowering, too.
> The current approach I've posted here shows already (first patch only)
> an overall lowering of instructions executed for first condition, and
> a pretty good lowering of executed instruction for extra dynamic
> conditions. ?As the choosen algorithm here isn't pretty aggressive in
> finding matches, there should be indeed much chance to improve it
> more.
>
> The real pain is here already in AST, as for doing associative logical
> bitwise combining we need to know pretty well that condition-elements
> don't trap and have no side-effects. ?This is right now only possible
> by walking full tree, which is (as my tests have shown) even faster
> then I expected.

Well, trapping statements are properly marked and operands never
have side-effects in gimple.  So the situation is very easy - never
collect these kinds of operations in such "affine combination" - you
couldn't do anything with them anyway.

>
>> Rather than hooking the logic into the place where we expand conditions
>> this should be a "lowering" pass right before expansion (that would,
>> hopefully at some point also do switch statement expansion/lowering
>> according to target costs). ?Initially such patch should try to exactly
>> copy what we do during expansion right now.
>
> I agree, this should be the place it should go to for upcoming version.
>
>> The cond_chain stuff should be as we have discussed quite some time
>> ago on IRC - modeled after tree-affine.c - a "vector" of predicates
>> of the form [~] A op B, combined using logical AND or OR
>> (thus, able to represent a disjunctive or conjunctive normal form).
>> All suitably abstracted so if-conversion, the loop optimizer and
>> loop number of iteration analysis can use this code (they all collect
>> a controlling predicate (chain) and try to simplify against it).
>
> Yes, I remember this discussion. ?And I agree that an affine-tree for
> this could help at some places to share same facility. ?Issue here is,
> as we also were discussing, that a affine tree of not and ops for
> bitwise-binaries is for truth-op folding not enough. ?We need to
> handle here comparsions and have to detect trapping, and side-effects.
> As I said above for upcoming 4.8 I will work on that.

See above.  you'd have a vector { [~] A op B, [~] C op D, ... } with
a common combining operation, AND or OR.  A op B would be
for example a comparison.  For a full DNF/CNF you'd have a vector
of those vectors as you need both AND and OR - but I believe that
in most cases a flat combination is what we want to work with
(there is no reason A or B could not be arbirtrarily complex trees,
other than "trees are ugly").

>> A patch adding a pre-expand pass should not at the same time
>> change what we feed it - what we have before such pass in the IL
>> should be driven by choice what canonical form is more optimal
>> for optimization passes (and unfortunately we have conflicting
>> requirements here - think of profile-feedback and the fact that
>> we do not have SSA names for each predicate we compute).
>
> Well, this logic is in the new code for cfgexpand done. It assumes
> that each instruction has cost of 1. ?So it collects the amount of
> required instruction required for the conditional check and sees if it
> might be profitable to join it with second operand (if there is one).

Huh, well, if you are in expansion you should better deal with real
target costs, not an arbitrary 1.

> This logic is much more sensible in terms of executed instruction that
> what we have now in fold-const, where we are blindly joining
> associative bitwise-operations, even if it is more costy then the
> separate condition.

We surely should stop doing this kind of complex transforms in
fold-const.c, especially as that is applied to the frontends AST.

> So by this patch we already have a code-reduction of about 0.3% and
> for each extra-dynamical branch a much higher reduction of executed
> instruction as current approach shows.

On which targets and for which benchmark?

>> I note that there are still more patches from you pending related
>> to predicate optimizations - but I lack an idea where you are
>> heading to, from a global point of view. ?Take the time that
>> stage3 and pre-stage1 offers to lay ground for a more structured
>> approach to this area. ?You may, for example, want to pick up
>> my (suspended) work on separating predicate computation from
>> predicate use (or at least think about whether that's a good idea
>> for the purpose of predicate optimizations).
>
> Well, the idea behind this is, that we use for general bitwise-binary
> optimizations (logical and non-logical) operation one pass which
> already handles it.
> We might should have a reassoc pass much earlier then now.

So, what do you want to canonicalize to?  Your patch seems to
move towards straight-line code which pessimizes passes that
perform jump-threading and conditional value-numbering.  So, why
is that a good idea?

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]