[RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

Hongtao Liu crazylht@gmail.com
Mon Jun 28 07:20:27 GMT 2021


On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >
> > Hi!
> >
> > on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> > > Hi,
> > >
> > > PR100328 has the details of this issue; I will try to summarize
> > > it here.  In the hottest function LBM_performStreamCollideTRT
> > > of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
> > > (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
> > > insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
> > > class has 64 registers, the first 32 of which make up the
> > > whole of FLOAT_REG.  The two flavors differ in some ways;
> > > take "*fma<mode>4_fpr" as an example:
> > >
> > > (define_insn "*fma<mode>4_fpr"
> > >   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
> > >       (fma:SFDF
> > >         (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
> > >         (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
> > >         (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
> > >
> > > // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> > > // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
> > >
> > > So for VSX_REG we only have the destructive form: when a VSX_REG
> > > alternative is used, operand 2 or operand 3 is required
> > > to be the same as operand 0.  Reload has to take care of this
> > > constraint and create some non-free register copies if required.
> > >
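As a rough illustration of the tie described above, here is a minimal C sketch of which alternative of the quoted pattern a given register assignment could use without an extra move.  The register numbering (0..63 for VSX_REG, with 0..31 doubling as FLOAT_REG) and all helper names are assumptions made for this example; none of this is GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy numbering (an assumption): VSX registers are 0..63 and the first
   32 of them double as the floating point registers (FLOAT_REG).  */
static bool is_vsx (int r) { return r >= 0 && r < 64; }
static bool is_fpr (int r) { return r >= 0 && r < 32; }

/* Hypothetical helper: return the 1-based alternative of the fma pattern
   that the assignment satisfies without any extra register move, or 0
   if no alternative fits as-is.  */
static int
fma_matching_alternative (int op0, int op1, int op2, int op3)
{
  assert (is_vsx (op0) && is_vsx (op1) && is_vsx (op2) && is_vsx (op3));

  /* Alternative 1: "=<Ff>", "%<Ff>", "<Ff>", "<Ff>": no tied operand.  */
  if (is_fpr (op0) && is_fpr (op1) && is_fpr (op2) && is_fpr (op3))
    return 1;

  /* Alternative 2: "=wa", "%wa", "wa", "0": operand 3 tied to operand 0.  */
  if (op3 == op0)
    return 2;

  /* Alternative 3: "=wa", "%wa", "0", "wa": operand 2 tied to operand 0.
     The "%" on operand 1 marks operands 1 and 2 as commutative, so a tie
     on operand 1 also works after swapping them.  */
  if (op2 == op0 || op1 == op0)
    return 3;

  return 0;
}
```

In this model, an assignment such as (40, 41, 42, 43) fits no alternative, which corresponds to the case where a register copy has to be added to satisfy the tie.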
> > > Assuming one fma insn looks like:
> > >   op0 = FMA (op1, op2, op3)
> > >
> > > The best regclass for all of them is VSX_REG.  When op1, op2 and op3
> > > are all dead, IRA simply creates three shuffle copies for them (here
> > > the operand order matters, since with the same freq, the one with the
> > > smaller operand number takes preference), but IMO both op2 and op3
> > > should take higher priority in the copy queue due to the matching
> > > constraint.
> > >
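To make the ordering argument concrete, below is a small C sketch of a copy queue sorted by frequency, with ties broken by operand number.  The struct, its fields and the function names are hypothetical stand-ins chosen for this example and do not mirror the real IRA data structures; the frequencies are whatever the caller passes in.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a copy between the output allocno and one
   input operand; not the real IRA copy structure.  */
struct toy_copy
{
  int op_num;        /* input operand number (1, 2 or 3) */
  int freq;          /* frequency attached to the copy */
  int constraint_p;  /* nonzero for a matching-constraint copy */
};

/* Higher freq first; with the same freq the smaller operand number
   takes preference.  */
static int
toy_copy_cmp (const void *a, const void *b)
{
  const struct toy_copy *x = a;
  const struct toy_copy *y = b;

  if (x->freq != y->freq)
    return x->freq < y->freq ? 1 : -1;
  return x->op_num - y->op_num;
}

/* Sort the queue so that the most preferred copy comes first.  */
static void
toy_order_copies (struct toy_copy *copies, size_t n)
{
  assert (copies != NULL || n == 0);
  qsort (copies, n, sizeof copies[0], toy_copy_cmp);
}
```

With three shuffle copies of equal freq the queue comes out as op1, op2, op3, so op1 is handled first.  If op2 and op3 instead carry a higher freq as constraint copies, they move ahead of op1, which is the priority argued for above.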
> > > I noticed that there is one function ira_get_dup_out_num, which is
> > > meant to create this kind of constraint copy, but the code below
> > > appears to refuse to create one if there is an alternative with a
> > > valid regclass that is not likely to be spilled.
> > >
> > >       default:
> > >       {
> > >         enum constraint_num cn = lookup_constraint (str);
> > >         enum reg_class cl = reg_class_for_constraint (cn);
> > >         if (cl != NO_REGS
> > >             && !targetm.class_likely_spilled_p (cl))
> > >           goto fail;
> > >
> > >        ...
> > >
> > > I cooked up the attached patch to make IRA respect this kind of
> > > matching constraint, guarded by one parameter.  As I stated in the
> > > PR, I am not sure this is on the right track.  The RFC patch checks
> > > the matching constraint in all alternatives: if there is one
> > > alternative with a matching constraint that matches the current
> > > preferred regclass (or best of allocno?), it records the output
> > > operand number and further creates one constraint copy for it.
> > > Normally that copy gets priority over shuffle copies and the matching
> > > constraint is more likely to be satisfied, so reload doesn't have to
> > > create extra copies to meet the matching constraint or the desirable
> > > register class.
> > >
> > > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can first stay
> > > as shuffle copies.  If later any of A,B,C,D gets assigned a
> > > hardware register which is a VSX register (VSX_REG) but not an FP
> > > register (FLOAT_REG), we have to pay costs once we can NOT
> > > go with VSX alternatives, so at that time it's important to respect
> > > the matching constraint, and we can increase the freq for the
> > > remaining copies related to this (A/B, A/C, A/D).  This idea requires
> > > some side tables to record some information and seems a bit
> > > complicated in the current framework, so the proposed patch
> > > aggressively emphasizes the matching constraint at the time of
> > > creating copies.
> > >
> >
> > Compared with the original patch (v1), this patch v3 considers
> > the following (this should be v2 for this mailing list, but I
> > bumped it to be consistent with the PR's):
> >
> >   - Excluding the case where, for one preferred register class,
> >     there are two or more alternatives, one of which has the
> >     matching constraint while another doesn't.  For
> >     the given operand, even if it's assigned a hardware reg
> >     which doesn't meet the matching constraint, it can simply
> >     use the alternative which doesn't have the matching constraint,
> >     so no register move is needed.  One typical case is
> >     define_insn *mov<mode>_internal2 on rs6000.  So we
> >     shouldn't create a constraint copy for it.
> >
> >   - A possibly free register move within the same register class:
> >     disable this in that case, since the register move to meet the
> >     constraint is considered free.
> >
> >   - Making it on by default, suggested by Segher & Vladimir, we
> >     hope to get rid of the parameter if the benchmarking result
> >     looks good on major targets.
> >
> >   - Tweaking the cost when either side of the matching constraint
> >     is a hardware register.  Before this patch, the constraint
> >     copy is simply taken as a real move insn for pref and
> >     conflict cost with one hardware register; after this patch,
> >     several input operands are allowed to respect the same
> >     matching constraint (but in different alternatives), so we
> >     should treat it like a shuffle copy in some cases to avoid
> >     over preferring/disparaging.
> >
> > Please check the PR comments for more details.
> >
> > This patch has been bootstrapped & regtested on
> > powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but has some
> > "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
> > is attached in the PR, and I think the new assembly looks
> > improved (expected).
> >
> > With options Ofast unroll, this patch improves SPEC2017
> > bmk 508.namd_r by 2.42% and 519.lbm_r by 2.43% on Power8, and
> > 508.namd_r by 3.02% and 519.lbm_r by 3.85% on Power9, without any
> > notable degradations.
> >
> > This patch likely benefits x86_64 and aarch64 too, but I don't
> > have performance machines with these arches at hand; could
> > someone kindly help to benchmark it if possible?
> I can help test it on Intel Cascade Lake and AMD Milan.
And could you rebase your patch on the latest trunk? I got several
failures when applying the patch:
~ git apply ira-v3.diff
error: patch failed: gcc/doc/invoke.texi:13845
error: gcc/doc/invoke.texi: patch does not apply
error: patch failed: gcc/ira-conflicts.c:233
error: gcc/ira-conflicts.c: patch does not apply
error: patch failed: gcc/ira-int.h:971
error: gcc/ira-int.h: patch does not apply
error: patch failed: gcc/ira.c:1922
error: gcc/ira.c: patch does not apply
error: patch failed: gcc/params.opt:330
error: gcc/params.opt: patch does not apply

> >
> > Many thanks in advance!
> >
> > btw, you can simply ignore the part about parameter
> > ira-consider-dup-in-all-alts (its name/description); it's sort of
> > stale.  I left it as is for now since we will likely get rid of it.
> >
> > BR,
> > Kewen
> > -----
> > gcc/ChangeLog:
> >
> >         * doc/invoke.texi (ira-consider-dup-in-all-alts): Document new
> >         parameter.
> >         * ira.c (ira_get_dup_out_num): Adjust as parameter
> >         param_ira_consider_dup_in_all_alts.
> >         * params.opt (ira-consider-dup-in-all-alts): New.
> >         * ira-conflicts.c (process_regs_for_copy): Add one parameter
> >         single_input_op_has_cstr_p.
> >         (get_freq_for_shuffle_copy): New function.
> >         (add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p.
> >         * ira-int.h (ira_get_dup_out_num): Add one bool parameter.
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

