[PATCH 0/2][RFC][PR88836][AARCH64] Fix redundant ptest instruction
Kugan Vivekanandarajah
kugan.vivekanandarajah@linaro.org
Thu Jun 20 04:55:00 GMT 2019
Hi Richard,
Thanks for your comments.
On Thu, 16 May 2019 at 18:13, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> kugan.vivekanandarajah@linaro.org writes:
> > From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
> >
> > Inorder to fix this PR.
> > * We need to change the whilelo pattern in backend
> > * Change RTL CSE such that:
> > - Add support for VEC_DUPLICATE
> > - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the
> > parallel rtx at the end.
> >
> > For example, with patch1, we now have rtl insn as follows:
> >
> > (insn 19 18 20 3 (parallel [
> > (set (reg:VNx4BI 93 [ next_mask_18 ])
> > (unspec:VNx4BI [
> > (const_int 0 [0])
> > (reg:DI 95 [ _33 ])
> > ] UNSPEC_WHILE_LO))
> > (set (reg:CC 66 cc)
> > (compare:CC (unspec:SI [
> > (vec_duplicate:VNx4BI (const_int 1 [0x1]))
> > (reg:VNx4BI 93 [ next_mask_18 ])
> > ] UNSPEC_PTEST_PTRUE)
> > (const_int 0 [0])))
> > ]) 4244 {while_ultdivnx4bi}
> >
> > When cse_insn process the first, it records the CSE set in reg 93. Then after
> > processing both the instruction in the parallel rtx, we invalidate all
> > expression with reg 93 which means expression in the second instruction is
> > invalidated for CSE. Attached patch relaxes this by invalidating before processing the
> > second.
>
> As far as patch 1 goes: the traditional reason for using clobbers
> to start with is that:
>
> - setting CC places a requirement on what CC must be after that instruction.
> We then have to rely on REG_UNUSED notes to tell whether that value
> actually matters or not.
>
> This was a bigger deal before df though. It might not matter as much now.
>
> - many passes find it harder to deal with multiple sets rather than
> single sets, so it's better to keep a single_set unless we know
> that both results are needed.
>
> It's currently combine's job to create a multiple set in cases
> where both results are useful. The pattern for that already exists
> (*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something
> like patch 1, we should simply expand to that insn rather than adding a
> new one. Note that:
>
> (vec_duplicate:PRED_ALL (const_int 1))
>
> shouldn't appear in the insn stream. It should always be a const_vector
> instead.
>
> From a quick check, I haven't yet found a case where setting CC up-front
> hurts though, so maybe the above is out-of-date best practice and we
> should set the register up-front after all, if only to reduce the number
> of patterns.
>
> However, if possible, I think we should fix the PR in a way that works
> for instructions that only optionally set the flags (which for AArch64
> is the common case). So it would be good if we could fix the PR without
> needing patch 1.
Do you think that combine should be able to set this. Sorry, I don't
understand how we can let other passes know that this instruction will
set the flags needed.
Thanks,
Kugan
>
> Thanks,
> Richard
>
> >
> > Bootstrap and regression testing for the current version is ongoing.
> >
> > Thanks,
> > Kugan
> >
> > Kugan Vivekanandarajah (2):
> > [PR88836][aarch64] Set CC_REGNUM instead of clobber
> > [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one
> >
> > gcc/config/aarch64/aarch64-sve.md | 9 +++-
> > gcc/cse.c | 67 ++++++++++++++++++++++++++----
> > gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++
> > 3 files changed, 80 insertions(+), 10 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c
More information about the Gcc-patches
mailing list