This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Why out-of-ssa does var coalescing based on name?


Wei Mi <wmi@google.com> wrote:
>For the following case:
>
>float total = 0.2;
>
>int main() {
> int i;
>
> for (i = 0; i < 1000000000; i++) {
>   total += i;
> }
>
> return total == 0.3;
>}
>
>The gcc assembly of its kernel loop is:
>
>.L3:
>       movaps  %xmm0, %xmm1
>.L2:
>       cvtsi2ss        %eax, %xmm0
>       addl    $1, %eax
>       cmpl    $1000000000, %eax
>       addss   %xmm1, %xmm0
>       jne     .L3
>
>The movaps is redundant; the loop could be changed to:
>
>.L3:
>       cvtsi2ss        %eax, %xmm1
>       addl    $1, %eax
>       cmpl    $1000000000, %eax
>       addss   %xmm1, %xmm0
>       jne     .L3
>
>Manually removing the extra movaps improves performance from 1.26s to
>0.95s on Sandy Bridge using trunk (r201859).
>
>Load PRE tries to promote the memory access to total out of the loop,
>and it generates a new PHI at the start of the loop body:
>
> <bb 2>:
> pretmp_22 = total;
> goto <bb 4>;
>
> <bb 3>:
>
> <bb 4>:
> # i_15 = PHI <i_8(3), 0(2)>
> # prephitmp_23 = PHI <total.1_6(3), pretmp_22(2)>       ==> PHI generated.
> _4 = (float) i_15;
> total.0_5 = prephitmp_23;
> total.1_6 = _4 + total.0_5;
> total = total.1_6;
> i_8 = i_15 + 1;
> if (i_8 != 1000000000)
>   goto <bb 3>;
> else
>   goto <bb 5>;
>
>The out-of-ssa phase should have coalesced prephitmp_23 and total.1_6(3)
>into the same temp var, but the existing out-of-ssa has a limitation: it
>will not coalesce SSA variables with different base var names, even if
>they are in the same PHI and their live ranges don't conflict. So
>out-of-ssa inserts the redundant mov pretmp = total.1_6 in bb3.
>
> <bb 2>:
> pretmp = total;
> goto <bb 4>;
>
> <bb 3>:
> pretmp = total.1_6;        ==> inserted by out-of-ssa.
>
> <bb 4>:
> _4 = (float) i_15;
> total.1_6 = _4 + pretmp;
> i_8 = i_15 + 1;
> if (i_8 != 1000000000)
>   goto <bb 3>;
> else
>   goto <bb 5>;
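
As a rough illustration of the rule described above, here is a minimal sketch in C of how a same-base-name restriction changes the number of copies needed for one PHI. This is not GCC code: struct ssa_name, phi_copies_needed and the integer base ids are hypothetical names made up for this example, and the conflict flags are supplied by hand rather than computed from liveness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an SSA name: only the base variable
   it was derived from and its version number. */
struct ssa_name {
    int base_var;   /* illustrative id, not a GCC data structure */
    int version;
};

/* Count the copies that would have to be inserted for one PHI.
   conflicts[i] says whether args[i]'s live range overlaps the result's.
   An argument is coalesced with the result (no copy) only if it does not
   conflict and, when require_same_base is set, shares the base variable. */
size_t phi_copies_needed(const struct ssa_name *result,
                         const struct ssa_name *args,
                         const bool *conflicts, size_t nargs,
                         bool require_same_base)
{
    size_t copies = 0;

    assert(result != NULL && args != NULL && conflicts != NULL);
    for (size_t i = 0; i < nargs; i++) {
        bool same_base = args[i].base_var == result->base_var;
        bool coalesce = !conflicts[i] && (same_base || !require_same_base);

        if (!coalesce)
            copies++;
    }
    return copies;
}
```

To model the PHI for prephitmp_23, assume base id 0 for the pretmp family (pretmp_22 and prephitmp_23 both end up as "pretmp" in the dump above) and base id 1 for total.1_6, with no live range conflicts. With require_same_base set, one copy is needed (the mov in bb3); with it cleared, none is.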
>
>The IRA phase has the potential to allocate pretmp and total.1_6 to the
>same hardreg and remove the extra mov, but in the above case the regmove
>phase happens to block IRA from doing the cleanup. regmove guesses the
>register constraints of an insn and tries to change the insn to satisfy
>them before the IRA phase. Usually this helps IRA make a better
>decision, but here regmove decides to merge _4 and total.1_6 into
>total.1_6 in order to satisfy the constraint of the two-operand plus on
>x86 (addss xmm1, xmm2). After _4 and total.1_6 are merged, the live
>range of total.1_6 conflicts with that of pretmp in IRA, so they cannot
>be allocated to the same hardreg, and the redundant mov (pretmp =
>total.1_6) cannot be deleted. However, it is not trivial to make regmove
>choose to merge total.1_6 and pretmp instead, because that requires
>regmove to have global live range analysis (the existing regmove has
>only a simple correctness check limited to a single bb).
>
>If we use -mtune=corei7-avx, the redundant mov disappears. That is
>because, with AVX support, regmove knows that AVX provides a
>three-operand plus: vaddss xmm1, xmm2, xmm3/m32, so it will not merge
>total.1_6 and _4, and IRA can then allocate total.1_6 and pretmp to the
>same hardreg.
>
>If we change the type of total from float to int, the redundant mov
>also disappears, for a similar reason: x86 provides the LEA insn, which
>can be used as a three-operand plus, so regmove chooses not to merge
>total.1_6 and _4.
>
>My question is: why can't out-of-ssa do the cleanup by coalescing all
>the vars without conflicts in the same PHI stmt, instead of only
>coalescing the vars with the same base name?

The restriction exists to keep conflict bitmaps small. Otherwise you'll have quadratic memory usage for them.
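
To make the quadratic growth concrete, here is a back-of-the-envelope sketch in C. It is only a counting model, not how GCC stores its conflict information: the function names are hypothetical, and it assumes one bit per unordered pair of names that may be tested for a conflict.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of unordered pairs among n names: one potential conflict bit each. */
static uint64_t pair_count(uint64_t n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Worst-case bits if any name may be tested against any other name.
   group_sizes[i] is the number of SSA names sharing base variable i. */
uint64_t conflict_bits_unrestricted(const size_t *group_sizes, size_t ngroups)
{
    uint64_t total = 0;

    assert(group_sizes != NULL || ngroups == 0);
    for (size_t i = 0; i < ngroups; i++)
        total += group_sizes[i];
    return pair_count(total);
}

/* Worst-case bits if only names sharing a base variable are tested. */
uint64_t conflict_bits_per_base(const size_t *group_sizes, size_t ngroups)
{
    uint64_t bits = 0;

    assert(group_sizes != NULL || ngroups == 0);
    for (size_t i = 0; i < ngroups; i++)
        bits += pair_count(group_sizes[i]);
    return bits;
}
```

Under this model, 10,000 names split into 1,000 base variables of 10 names each need at most 45,000 bits with the restriction, against 49,995,000 bits without it.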

Richard.

>Thanks,
>Wei Mi.


