[RFC] How to get a more accurate cost for pre-loop calculations in the ivopts pass

Fri May 31 08:51:00 GMT 2013

Hi,
While studying the ivopts pass, I found that the cost of pre-loop
calculations is computed inaccurately in many scenarios.

There are two kinds of pre-loop calculations: the base of iv candidates
and the loop-invariant part of an iv use's representation.
For the base of an iv candidate, the cost is calculated as follows:

  cost_base = force_var_cost (data, base, NULL);
  /* It will be exceptional that the iv register happens to be initialized with
     the proper value at no cost.  In general, there will at least be a regcopy
     or a const set.  */
  if (cost_base.cost == 0)
    cost_base.cost = COSTS_N_INSNS (1);
  cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));

  cost = cost_step + adjust_setup_cost (data, cost_base.cost);
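
To see how the setup cost can disappear, here is a simplified standalone model of the computation above. It assumes, as in GCC, that COSTS_N_INSNS (N) is N * 4 and that, when optimizing for speed, the setup cost is divided (integer division) by the expected iteration count. The function names are hypothetical, and the step cost is passed in as a plain number rather than queried from the target.

```c
#include <assert.h>

/* Mirrors GCC's definition: one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of adjust_setup_cost when optimizing for speed:
   the one-off setup cost is spread over the expected iteration count
   using integer division, so small setup costs can round down to 0.  */
static unsigned
model_adjust_setup_cost (unsigned setup_cost, unsigned avg_niter)
{
  assert (avg_niter > 0);
  return setup_cost / avg_niter;
}

/* Hypothetical model of the candidate cost quoted above.  */
static unsigned
model_iv_cost (unsigned base_cost, unsigned step_cost, unsigned avg_niter)
{
  /* Assume at least a regcopy or constant load to initialize the iv.  */
  if (base_cost == 0)
    base_cost = COSTS_N_INSNS (1);
  return step_cost + model_adjust_setup_cost (base_cost, avg_niter);
}
```

Assuming an iteration estimate of 23 for the ARM loop below, model_iv_cost (COSTS_N_INSNS (1), COSTS_N_INSNS (1), 23) and model_iv_cost (COSTS_N_INSNS (5), COSTS_N_INSNS (1), 23) both return 4: any setup cost below 23 units, i.e. up to five instructions, vanishes entirely.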

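Amortizing cost_base into the per-iteration cost leads to bad candidate
choices: when optimizing for speed, adjust_setup_cost divides the setup
cost by the expected number of loop iterations, so the difference
between two candidates' base costs can shrink to nothing. Consider the
code below, generated for ARM: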

    mov    r2, #0
    sub    ip, r1, #4
    mov    lr, r2
.L48:
    add    r2, r2, #1
    str    lr, [ip, #4]!
    cmp    r2, #23
    bne    .L48

The sub instruction in the pre-header could be saved if ivopts chose the
post-increment addressing mode. That did not happen because the
pre-increment and post-increment candidates end up with the same cost
(and cost_base) after amortization.
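
One plausible reading of the tie, sketched under simplifying assumptions: forcing the SSA name held in r1 into a register costs 0, and computing r1 - 4 costs one add. The struct and function names are hypothetical; only the cost rules (the one-instruction floor and the division by the iteration estimate) come from the code quoted above.

```c
#include <assert.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical description of an iv candidate's base computation.  */
struct model_cand
{
  unsigned base_cost;   /* Unadjusted cost of computing the base.  */
};

/* Apply the same rules as the quoted code: a zero base cost is raised
   to one instruction, then the setup cost is divided by the iteration
   estimate and added to the per-iteration step cost.  */
static unsigned
model_cand_cost (const struct model_cand *c, unsigned step_cost,
                 unsigned avg_niter)
{
  unsigned base_cost = c->base_cost ? c->base_cost : COSTS_N_INSNS (1);
  return step_cost + base_cost / avg_niter;
}

/* Pre-increment: the base is r1 - 4, which needs the "sub".  */
static const struct model_cand pre_inc = { COSTS_N_INSNS (1) };
/* Post-increment: the base is r1 itself, assumed free to compute.  */
static const struct model_cand post_inc = { 0 };
```

Under these assumptions the floor already makes both base costs COSTS_N_INSNS (1) before any division, so the saved sub is invisible, and the amortized totals are equal as well.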

I experimented with keeping the cost_base information and comparing it
when choosing the iv set, but saw no obvious change. It also conflicts
with the assumption that there is always at least one regcopy or
constant load. The truth is that it is hard to know at this stage
whether such an instruction will be needed at all.

The same issue occurs for the invariant part of an iv use's
representation in get_computation_cost_at, where ivopts simply ignores
the possible regcopy or constant load instruction.
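
To make the inconsistency concrete: a zero-cost invariant (for example, a value already in a register) is charged differently depending on where it appears. This is a hypothetical two-function model of the two rules, not GCC code; whether a copy is really needed is exactly what cannot be known at this stage.

```c
#include <assert.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical: setup cost charged for an invariant used as a candidate
   base.  A zero cost is raised to one instruction (regcopy/const set).  */
static unsigned
model_cand_base_setup (unsigned inv_cost)
{
  return inv_cost ? inv_cost : COSTS_N_INSNS (1);
}

/* Hypothetical: setup cost charged for the same invariant when it is part
   of a use's representation.  No regcopy or constant load is assumed, so
   a zero cost stays zero.  */
static unsigned
model_use_inv_setup (unsigned inv_cost)
{
  return inv_cost;
}
```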

I understand it is difficult to calculate accurate costs at the GIMPLE
IR level, but I have observed many such poor iv set choices, especially
after enabling the auto-increment and multiplied_address modes on ARM.
So I am sending this message to ask for help.
Thanks in advance.

Best Regards.