This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [RFC, ivopts] fix bugs in ivopts address cost computation

From: Jiangning Liu <liujiangning1 at gmail dot com>
To: Sandra Loosemore <sandra at codesourcery dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Fri, 6 Jul 2012 10:43:54 +0800
Subject: Re: [RFC, ivopts] fix bugs in ivopts address cost computation
References: <4FCE3512.7070607@codesourcery.com>
Hi,

For the following code change,

@@ -4212,11 +4064,6 @@ get_computation_cost_at (struct ivopts_d
     cost.cost += adjust_setup_cost (data,
 				    add_cost (TYPE_MODE (ctype), speed));

-  /* Having offset does not affect runtime cost in case it is added to
-     symbol, but it increases complexity.  */
-  if (offset)
-    cost.complexity++;
-
   cost.cost += add_cost (TYPE_MODE (ctype), speed);

   aratio = ratio > 0 ? ratio : -ratio;

I think this shouldn't be removed. The offset may be affected by the
position of inserting reduction variable accumulation statement. There
will be different cases between before and after reduction variable
accumulation. The cost of replacing use point with reduction variable
should be different accordingly.

BTW, I personally think the current ivopt cost modelling basically
works fine, although there might be some tunings needed. The most
difficult part is the choice of reduction variable candidates has
something to do with register pressure cost, while the register cost
estimate is not accurate enough at this stage because we don't have
back-end live range interference graph at all. we are always able to
find holes on some particular cases or benchmarks, but we can't only
want to find a optimal result for them, and the tuning needs to be
backed by more comprehensive result.

Thanks,
-Jiangning

2012/6/6 Sandra Loosemore <sandra@codesourcery.com>:
> My colleagues and I have been working on the GCC port for the Qualcomm
> Hexagon.  Along the way I noticed that we were getting poor results
> from the ivopts pass no matter how we adjusted the target-specific RTX
> costs.  In many cases ivopts was coming up with candidate use costs
> that seemed completely inconsistent with the target cost model.  On
> further inspection, I found what appears to be a whole bunch of bugs
> in the way ivopts is computing address costs:
>
> (1) While the address cost computation is assuming in some situations
> that pre/post increment/decrement addressing will be used if
> supported by the target, it isn't actually using the target's address
> cost for such forms -- instead, just the cost of the form that would
> be used if autoinc weren't available/applicable.
>
> (2) The computation to determine which multiplier values are supported
> by target addressing modes is constructing an address rtx of the form
> (reg * ratio) to do the tests.  This isn't a valid address RTX on
> Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
> are.  Because it's choosing the wrong address form to probe with, it
> thinks that the target doesn't support multipliers at all and is
> incorrectly tacking on an extra cost for them.  I also note that it's
> assuming that the same set of ratios are supported by all three
> address forms that can potentially include them, and that all valid
> ratios have the same cost.
>
> (3) The computation to determine the range of valid constant offsets
> for address forms that can include them is probing the upper end of
> the range using constants of the form ((1<<n) - 1).  On Hexagon, the
> offsets have to be aligned appropriately for the mode, so it's
> incorrectly rejecting all positive offsets for non-char modes.  And
> again, it's assuming that the same range of offsets are supported by
> all address forms that can legitimately include them, and that all
> valid offsets have the same cost.  The latter isn't true on Hexagon.
>
> (4) The cost adjustment for converting the symbol_present address to a
> var_present address seems overly optimistic in assuming that the
> symbol load will be hoisted outside the loop.  I looked at a lot of
> code where this was not happening no matter how expensive I made the
> absolute addressing forms in the target-specific costs.  Also, if
> subsequent passes actually do hoist the symbol load, this requires an
> additional register to be available over the entire loop, which ivopts
> isn't accounting for in any way.  It seems to me that this adjustment
> shouldn't be attempted when the symbol_present form is a legitimate
> address (because subsequent passes are not doing the anticipated
> optimization in that case).  There's also a bug present in the cost
> accounting: it's adding the full cost of the symbol + var addition,
> whereas it should be pro-rated across iterations (see the way this is
> handled for the non-address cases in get_computation_cost_at).
>
> (5) The documentation of TARGET_ADDRESS_COST says that when two
> (legitimate) address expressions have the same lowest cost, the one
> with the higher complexity is used.  But, the ivopts code is doing the
> opposite of this and choosing the lower complexity as the tie-breaker.
> (Actually, it's also computing the complexity without regard to
> whether the address rtx is even legitimate on the target.)
>
> (6) The way get_address_costs is precomputing and caching a complete
> set of cost data for each addressing mode seems incorrect to me, in
> the general case.  For example, consider MIPS where MIPS16 and MIPS32
> functions can be freely intermixed in the same compilation unit.  On
> Hexagon, as I've noted, the assumption that all valid multiplier and
> offset values have the same cost is also invalid.  The current code is
> already precomputing 16 sets of address costs for each mode, plus
> probing 128 addresses with multipliers for validity, and similarly
> probing up to 32 or so constant offsets for validity, so I'm pretty
> skeptical that precomputing and caching even more permutations would
> be worthwhile.
>
> (7) If the computed address cost turns out to be 0, the current code
> (for some unknown reason) is turning that into 1, which can screw up
> the relative costs of address computations vs other operations like
> addition.
>
> I've come up with the attached patch to try to fix these things.  The
> biggest change is that I have discarded the code for precomputing and
> caching costs and instead go straight to querying the target back end
> for the cost for the specific address computation we're handed; this
> makes the code a lot simpler.  I would kind of like to get rid of
> multiplier_allowed_in_address_p too, but it's being used in a couple
> places other than the address computation and it seemed better not to
> mess with that for now.  The other fixes are pretty straightforward.
>
> I bootstrapped and regression-tested the patch on x86_64.  I haven't
> tried to benchmark the performance effect of the patch on anything
> other than Hexagon; there I found that, once ivopts actually started
> paying attention to the target address costs function, it needed to be
> re-tuned.  So, it wouldn't surprise me if other back ends could
> benefit from some tweaking as well, depending on the extent to which
> they're affected by the bugs I listed above.
>
> Comments, complaints, proposals for alternate fixes, etc?  Or OK to
> commit?
>
> -Sandra
>
>
> 2012-06-05  Sandra Loosemore <sandra@codesourcery.com>
>
>         gcc/
>         * tree-ssa-loop-ivopts.c (comp_cost): Make complexity field signed.
>         Update comments to indicate this is for addressing mode complexity.
>         (new_cost): Make signedness of parameters match comp_cost fields.
>         (compare_costs): Prefer higher complexity, not lower, per documentation
>         of TARGET_ADDRESS_COST.
>         (multiplier_allowed_in_address_p): Use (+ (* reg1 ratio) reg2) to
>         probe for valid ratios, rather than just (* reg1 ratio).
>         (get_address_cost): Rewrite to eliminate precomputation and caching.
>         Use target's address cost for autoinc forms if possible.  Only attempt
>         sym_present -> var_present cost conversion if the sym_present form
>         is not legitimate; amortize setup cost over loop iterations.
>         Adjust complexity computation.
>         (get_computation_cost_at): Adjust call to get_address_cost.  Do not
>         mess with complexity for non-address expressions.
>         (determine_use_iv_cost_address): Initialize can_autoinc.
>         (autoinc_possible_for_pair): Likewise.
>



-- 
Thanks,
-Jiangning
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]