[SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

Richard Sandiford richard.sandiford@arm.com
Tue Jun 25 14:35:00 GMT 2019


Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> >
>> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
>> > >    if (!def_set)
>> > >      return false;
>> > >
>> > > +  if (reg_prop_only
>> > > +      && !REG_P (SET_SRC (def_set))
>> > > +      && !REG_P (SET_DEST (def_set)))
>> > > +    return false;
>> >
>> > This should be:
>> >
>> >   if (reg_prop_only
>> >       && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
>> >     return false;
>> >
>> > so that we return false if either operand isn't a register.
>> Oops, sorry about that  -:(
>> >
>> > > +
>> > > +  /* Allow propagations into a loop only for reg-to-reg copies, since
>> > > +     replacing one register by another shouldn't increase the cost.  */
>> > > +
>> > > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
>> > > +      && !REG_P (SET_SRC (def_set))
>> > > +      && !REG_P (SET_DEST (def_set)))
>> > > +    return false;
>> >
>> > Same here.
>> >
>> > OK with that change, thanks.
>> Thanks for the review, will make the changes and commit the patch
>> after re-testing.
> Hi,
> Testing the patch showed following failures on 32-bit x86:
>
>   Executed from: g++.target/i386/i386.exp
>     g++:g++.target/i386/pr88152.C   scan-assembler-not vpcmpgt|vpcmpeq|vpsra
>   Executed from: gcc.target/i386/i386.exp
>     gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs:
>     gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t
> ]*\\%eax,[\\t ]*%eax 1
>
> The failure of pr88152.C is also seen on x86_64.
>
> For pr66768.c, and pr90178.c, forwprop replaces register which is
> volatile and frame related respectively.
> To avoid that, the attached patch, makes a stronger constraint that
> src and dest should be a register
> and not have frame_related or volatil flags set, which is checked in
> usable_reg_p().
> Which avoids the failures for both the cases.
> Does it look OK ?

That's not the reason it's a bad transform.  In both cases we're
propagating r2 <- r1 even though

(a) r1 dies in the copy and
(b) fwprop can't replace all uses of r2, because some have multiple
    definitions

This has the effect of making both values live in cases where only one
was previously.

In the case of pr66768.c, fwprop2 is undoing the effect of
cse.c:canon_reg, which tries to pick the best register to use
(see cse.c:make_regs_eqv).  fwprop1 makes the same change,
and made it even before the patch, but the cse.c choice should win.

A (hopefully) conservative fix would be to propagate the copy only if
both registers have a single definition, which you can test with:

  (DF_REG_DEF_COUNT (regno) == 1
   && !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno))

In that case, fwprop should see all uses of the destination, and should
be able to replace it in all cases with the source.

> For g++.target/i386/pr88152.C, the issue is that after the patch,
> forwprop1 does following propagation (in f10) which wasn't done
> before:
>
> In insn 10, replacing
>  (unspec:SI [
>             (reg:V2DF 91)
>         ] UNSPEC_MOVMSK)
>  with (unspec:SI [
>             (subreg:V2DF (reg:V2DI 90) 0)
>         ] UNSPEC_MOVMSK)
>
> This later defeats combine because insn 9 gets deleted.
> Without patch, the following combination takes place:
>
> Trying 7 -> 9:
>     7: r90:V2DI=r89:V2DI>r93:V2DI
>       REG_DEAD r93:V2DI
>       REG_DEAD r89:V2DI
>     9: r91:V2DF=r90:V2DI#0
>       REG_DEAD r90:V2DI
> Successfully matched this instruction:
> (set (subreg:V2DI (reg:V2DF 91) 0)
>     (gt:V2DI (reg:V2DI 89)
>         (reg:V2DI 93)))
> allowing combination of insns 7 and 9
>
> and then:
> Trying 6, 9 -> 10:
>     6: r89:V2DI=const_vector
>     9: r91:V2DF#0=r89:V2DI>r93:V2DI
>       REG_DEAD r89:V2DI
>       REG_DEAD r93:V2DI
>    10: r87:SI=unspec[r91:V2DF] 43
>       REG_DEAD r91:V2DF
> Successfully matched this instruction:
> (set (reg:SI 87)
>     (unspec:SI [
>             (lt:V2DF (reg:V2DI 93)
>                 (const_vector:V2DI [
>                         (const_int 0 [0]) repeated x2
>                     ]))
>         ] UNSPEC_MOVMSK))

Eh?  lt:*V2DF*?  Does that mean that it's 0 for false and an all-1 NaN
for true?

Looks like a bug that we manage to fold to that, and manage to match it.

Thanks,
Richard

> allowing combination of insns 6, 9 and 10
> original costs 4 + 8 + 4 = 16
> replacement cost 12
> deferring deletion of insn with uid = 9.
> deferring deletion of insn with uid = 6.
> which deletes insns 2, 3, 6, 7, 9.
>
> With patch, it fails to combine 7->10:
> Trying 7 -> 10:
>     7: r90:V2DI=r89:V2DI>r93:V2DI
>       REG_DEAD r93:V2DI
>       REG_DEAD r89:V2DI
>    10: r87:SI=unspec[r90:V2DI#0] 43
>       REG_DEAD r90:V2DI
> Failed to match this instruction:
> (set (reg:SI 87)
>     (unspec:SI [
>             (subreg:V2DF (gt:V2DI (reg:V2DI 89)
>                     (reg:V2DI 93)) 0)
>         ] UNSPEC_MOVMSK))
>
> and subsequently 6, 7 -> 10
> (attached combine dumps before and after patch).
>
> So IIUC, the issue is that the target does not have a pattern that can
> match the above insn ?
> I tried a simple workaround to "pessimize" the else condition in
> propagate_rtx_1 in patch, to require old_rtx and new_rtx have same
> rtx_code, which at least
> works for this test-case, but not sure if that's the correct approach.
> Could you suggest how to proceed ?
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>> >
>> > Richard



More information about the Gcc-patches mailing list