[PATCH] Disable loop2_invariant for -Os

Zhenqiang Chen zhenqiang.chen@arm.com
Mon Jul 9 08:40:00 GMT 2012


>>
>> 1) If -fira-loop-pressure is enabled, it reduces invariant motions by ~24% in my
>> tests. But it does not help total code size. There seems to be an issue with updating
>> "regs_needed" after moving an invariant out of the loop (my benchmark logs
>> show that in ~73% of cases more than one invariant is moved).
>>
>> During tracing, I found that moving an integer constant out of the loop does not
>> increase regs_needed. Function "get_pressure_class_and_nregs (rtx insn, int
>> *nregs)" computes the "regs_needed".
>>
>>    *nregs
>>       = ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
>>
>> In ARM, the insn to set an integer is like
>>      (set (reg:SI 183)
>>         (const_int 32 [0x20])) inv1.c:64 182 {*thumb1_movsi_insn}
>>      (nil))
>> GET_MODE (SET_SRC (set)) is VOIDmode and
>> ira_reg_class_max_nregs[pressure_class][VOIDmode] is 0. In one of my test
>> cases, it moves 4 integer constants out of the loop, which leads to spilling.
>>
>> According to the algorithm in "calculate_loop_reg_pressure", moving an
>> invariant out of the loop should affect the register pressure. So I
>> tried adding the following code:
>>
>>   if (! (*nregs))
>>     *nregs = ira_reg_class_max_nregs[pressure_class][GET_MODE (reg)];
>>
>> Logs show it reduces invariant motions by another 32%. But the code size is still
>> far from what disabling the pass gives. Logs also show that -fira-loop-pressure impacts
>> other passes in addition to loop2_invariant (the result of "-fira-loop-pressure
>> -fno-move-loop-invariants" is different from the result of
>> "-fno-move-loop-invariants").
>>
>> 2) By default -fira-loop-pressure is not enabled for -Os, and the logic to compute
>> "regs_used" does not seem sound. The following code is from function
>> "find_invariants_to_move":
>>     {
>>       unsigned int n_regs = DF_REG_SIZE (df);
>>
>>       regs_used = 2;
>>
>>       for (i = 0; i < n_regs; i++)
>>         {
>>           if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
>>             {
>>               /* This is a value that is used but not changed inside loop.  */
>>               regs_used++;
>>             }
>>         }
>>     }
>> * There is no loop-related information in the code.
>> * Benchmark logs show the condition (!DF_REGNO_FIRST_DEF (i) &&
>> DF_REGNO_LAST_USE (i)) is never true.
>
>Still there is code that tries to deal with -Os.  Simply disabling the pass makes
>that logic pointless.

If -fira-loop-pressure is not enabled, the function estimate_reg_pressure_cost (cfgloopanal.c) is used to estimate the cost. At the beginning of the function, it checks:

  /* If we have enough registers, we should use them and not restrict
     the transformations unnecessarily.  */
  if (regs_needed + target_res_regs <= available_regs)
    return 0;

Here are the CSiBE benchmark logs before "if (...)" for ARM/MIPS/PPC/X86.

     available_regs target_res_regs regs_needed
ARM : 9                 3              2
MIPS: 10/26             3              2
PPC : 18/29             3              2
X86 : 6/15              3              2

regs_needed is incremented after each invariant motion, so the size_cost of the first (available_regs - target_res_regs - regs_needed) invariant motions is always 0, where target_res_regs is 3 and regs_needed starts at 2. So I prefer to disable the pass if -fira-loop-pressure is not enabled.
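
A minimal sketch of the arithmetic above, assuming the early-exit check is the only thing that decides whether a motion is free. The helpers pressure_cost_is_zero and zero_cost_headroom are hypothetical names for illustration and are not GCC functions; only the comparison itself comes from the quoted code.

```c
#include <stdbool.h>

/* Mirrors the early-exit test quoted from estimate_reg_pressure_cost.
   Hypothetical helper, not a GCC function.  */
bool
pressure_cost_is_zero (int regs_needed, int target_res_regs,
                       int available_regs)
{
  return regs_needed + target_res_regs <= available_regs;
}

/* How far regs_needed can still grow before the test above stops
   reporting zero cost.  Hypothetical helper, not a GCC function.  */
int
zero_cost_headroom (int regs_needed, int target_res_regs,
                    int available_regs)
{
  int headroom = available_regs - target_res_regs - regs_needed;
  return headroom > 0 ? headroom : 0;
}
```

With the ARM row of the table (available_regs = 9, target_res_regs = 3, regs_needed = 2) the headroom is 4: the check still passes when regs_needed has grown from 2 to 6 and first fails at 7.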

>Thus, please try to fix the code that is there to deal with -Os (a target may opt to
>enable -fira-loop-pressure by default for -Os).

Yes. Targets need tuning to enable -fira-loop-pressure.

With -fira-loop-pressure, CSiBE logs show that MIPS and PPC improve a little and X86 regresses a little compared with not enabling it.
If -fira-loop-pressure is enabled, the cost check is based on

  if ((int) new_regs[pressure_class]
      + (int) regs_needed[pressure_class]
      + LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class]
      + IRA_LOOP_RESERVED_REGS
      > ira_available_class_regs[pressure_class])

But the fact that a register is available does not mean it can be used in any instruction. For example, on ARM Cortex-M0, only a few instructions can use r8-r15 (r8-r11 and r13-r15 are already excluded from available_regs). Logs show the result is much better if r12 is also excluded.
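
To illustrate why removing a single register from the available set can flip the decision, here is a minimal sketch of the comparison above reduced to one pressure class. Everything in it is an assumption for illustration: move_exceeds_pressure is a hypothetical helper, not a GCC function, and the values in the note after it are made up rather than taken from my logs.

```c
#include <stdbool.h>

/* Same shape as the comparison quoted above, for a single pressure
   class.  Returns true when the move would be rejected.
   Hypothetical helper, not a GCC function.  */
bool
move_exceeds_pressure (int new_regs, int regs_needed, int max_reg_pressure,
                       int reserved_regs, int available_class_regs)
{
  return new_regs + regs_needed + max_reg_pressure + reserved_regs
         > available_class_regs;
}
```

With new_regs = 1, regs_needed = 1, max_reg_pressure = 5 and reserved_regs = 2 (all made-up values), the sum is 9. The move is accepted when 9 registers are counted as available (r0-r7 plus r12) and rejected when r12 is excluded and only 8 remain.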

Thanks!
-Zhenqiang




