[PATCH 0/2][RFC][PR88836][AARCH64] Fix redundant ptest instruction

Kugan Vivekanandarajah kugan.vivekanandarajah@linaro.org
Thu Jun 20 04:55:00 GMT 2019


Hi Richard,

Thanks for your comments.

On Thu, 16 May 2019 at 18:13, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> kugan.vivekanandarajah@linaro.org writes:
> > From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
> >
> > Inorder to fix this PR.
> >  * We need to change the whilelo pattern in backend
> >  * Change RTL CSE such that:
> >    - Add support for VEC_DUPLICATE
> >    - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the
> >      parallel rtx at the end.
> >
> > For example, with patch1, we now have rtl insn as follows:
> >
> > (insn 19 18 20 3 (parallel [
> >             (set (reg:VNx4BI 93 [ next_mask_18 ])
> >                 (unspec:VNx4BI [
> >                         (const_int 0 [0])
> >                         (reg:DI 95 [ _33 ])
> >                     ] UNSPEC_WHILE_LO))
> >             (set (reg:CC 66 cc)
> >                 (compare:CC (unspec:SI [
> >                             (vec_duplicate:VNx4BI (const_int 1 [0x1]))
> >                             (reg:VNx4BI 93 [ next_mask_18 ])
> >                         ] UNSPEC_PTEST_PTRUE)
> >                     (const_int 0 [0])))
> >         ]) 4244 {while_ultdivnx4bi}
> >
> > When cse_insn process the first, it records the CSE set in reg 93.  Then after
> > processing both the instruction in the parallel rtx, we invalidate all
> > expression with reg 93 which means expression in the second instruction is
> > invalidated for CSE. Attached patch relaxes this by invalidating before processing the
> > second.
>
> As far as patch 1 goes: the traditional reason for using clobbers
> to start with is that:
>
> - setting CC places a requirement on what CC must be after that instruction.
>   We then have to rely on REG_UNUSED notes to tell whether that value
>   actually matters or not.
>
>   This was a bigger deal before df though.  It might not matter as much now.
>
> - many passes find it harder to deal with multiple sets rather than
>   single sets, so it's better to keep a single_set unless we know
>   that both results are needed.
>
> It's currently combine's job to create a multiple set in cases
> where both results are useful.  The pattern for that already exists
> (*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something
> like patch 1, we should simply expand to that insn rather than adding a
> new one.  Note that:
>
>   (vec_duplicate:PRED_ALL (const_int 1))
>
> shouldn't appear in the insn stream.  It should always be a const_vector
> instead.
>
> From a quick check, I haven't yet found a case where setting CC up-front
> hurts though, so maybe the above is out-of-date best practice and we
> should set the register up-front after all, if only to reduce the number
> of patterns.
>
> However, if possible, I think we should fix the PR in a way that works
> for instructions that only optionally set the flags (which for AArch64
> is the common case).  So it would be good if we could fix the PR without
> needing patch 1.

Do you think that combine should be able to set this. Sorry, I don't
understand how we can let other passes know that this instruction will
set the flags needed.

Thanks,
Kugan

>
> Thanks,
> Richard
>
> >
> > Bootstrap and regression testing for the current version is ongoing.
> >
> > Thanks,
> > Kugan
> >
> > Kugan Vivekanandarajah (2):
> >   [PR88836][aarch64] Set CC_REGNUM instead of clobber
> >   [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one
> >
> >  gcc/config/aarch64/aarch64-sve.md          |  9 +++-
> >  gcc/cse.c                                  | 67 ++++++++++++++++++++++++++----
> >  gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++
> >  3 files changed, 80 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c



More information about the Gcc-patches mailing list