[PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
Richard Biener
richard.guenther@gmail.com
Wed Sep 9 07:30:29 GMT 2020
On Wed, Sep 9, 2020 at 3:47 AM luoxhu <luoxhu@linux.ibm.com> wrote:
>
>
>
> On 2020/9/8 16:26, Richard Biener wrote:
> >> Seems not only pseudo, for example "v = vec_insert (i, v, n);"
> >> the vector variable will be store to stack first, then [r112:DI] is a
> >> memory here to be processed. So the patch loads it from stack(insn #10) to
> >> temp vector register first, and store to stack again(insn #24) after
> >> rs6000_vector_set_var.
> > Hmm, yeah - I guess that's what should be addressed first then.
> > I'm quite sure that in case 'v' is not on the stack but in memory like
> > in my case a SImode store is better than what we get from
> > vec_insert - in fact vec_insert will likely introduce a RMW cycle
> > which is prone to inserting store-data-races?
>
> Yes, for your case, there is no stack operation and to_rtx is expanded
> with BLKmode instead of V4SImode. Add the to_rtx mode check could workaround
> it. ASM doesn't show store hit load issue.
>
> optimized:
>
> _1 = i_2(D) % 4;
> VIEW_CONVERT_EXPR<int[4]>(x.u)[_1] = a_4(D);
>
> expand:
> 2: r118:DI=%3:DI
> 3: r119:DI=%4:DI
> 4: NOTE_INSN_FUNCTION_BEG
> 7: r120:DI=unspec[`*.LANCHOR0',%2:DI] 47
> REG_EQUAL `*.LANCHOR0'
> 8: r122:SI=r118:DI#0
> 9: {r124:SI=r122:SI/0x4;clobber ca:SI;}
> 10: r125:SI=r124:SI<<0x2
> 11: r123:SI=r122:SI-r125:SI
> REG_EQUAL r122:SI%0x4
> 12: r126:DI=sign_extend(r123:SI)
> 13: r127:DI=r126:DI+0x4
> 14: r128:DI=r127:DI<<0x2
> 15: r129:DI=r120:DI+r128:DI
> 16: [r129:DI]=r119:DI#0
>
> p to_rtx
> $319 = (rtx_def *) (mem/c:BLK (reg/f:DI 120) [2 x+0 S32 A128])
>
> asm:
> addis 2,12,.TOC.-.LCF0@ha
> addi 2,2,.TOC.-.LCF0@l
> .localentry test,.-test
> srawi 9,3,2
> addze 9,9
> addis 10,2,.LANCHOR0@toc@ha
> addi 10,10,.LANCHOR0@toc@l
> slwi 9,9,2
> subf 9,9,3
> extsw 9,9
> addi 9,9,4
> sldi 9,9,2
> stwx 4,10,9
> blr
>
>
> >
> > So - what we need to "fix" is cfgexpand.c marking variably-indexed
> > decls as not to be expanded as registers (see
> > discover_nonconstant_array_refs).
> >
> > I guess one way forward would be to perform instruction
> > selection on GIMPLE here and transform
> >
> > VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D)
> >
> > to a (direct) internal function based on the vec_set optab.
>
> I don't quite understand what you mean here. Do you mean:
> ALTIVEC_BUILTIN_VEC_INSERT -> VIEW_CONVERT_EXPR -> internal function -> vec_set
You're writing VIEW_CONVERT_EXPR here but the outermost component
is an ARRAY_REF. But yes, this is what I meant.
> or ALTIVEC_BUILTIN_VEC_INSERT -> internal function -> vec_set?
> And which pass to put the selection and transform is acceptable?
Close to RTL expansion. There's gimple-isel.cc which does instruction selection
for VEC_COND_EXPRs.
> Why call it *based on* vec_set optab? The VIEW_CONVERT_EXPR or internal function
> is expanded to vec_set optab.
Based on because we have the convenient capability to represent optabs to be
used for RTL expansion as internal function calls on GIMPLE, called
"direct internal function".
> I guess you suggest adding internal function for VIEW_CONVERT_EXPR in gimple,
> and do the transform from internal function to vec_set optab in expander?
No, I suggest to "add" an internal function for the vec_set optab, see
DEF_INTERNAL_OPTAB_FN in internal-fn.def
> I doubt my understanding as this looks really over complicated since we
> transform from VIEW_CONVERT_EXPR to vec_set optab directly so far...
> IIUC, Internal function seems doesn't help much here as Segher said before.
The advantage would be to circumvent GIMPLEs forcing of memory here.
But as I said here:
> > But then in GIMPLE D.3185 is also still memory (we don't have a variable
> > index partial register set operation - BIT_INSERT_EXPR is
> > currently specified to receive a constant bit position only).
it might not work out so easy. Going down the rathole to avoid forcing
memory during RTL expansion for select cases (vector type bases
with a supported vector mode) might be something to try.
That at least would make the approach of dealing with this
in expand_assignment or siblings sensible.
> > At which point after your patch is the stack storage elided?
> >
>
> Stack storage is elided by register reload pass in RTL.
>
>
> Thanks,
> Xionghu
More information about the Gcc-patches
mailing list