[PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
luoxhu
luoxhu@linux.ibm.com
Wed Sep 9 01:47:35 GMT 2020
On 2020/9/8 16:26, Richard Biener wrote:
>> Seems not only pseudo, for example "v = vec_insert (i, v, n);"
>> the vector variable will be store to stack first, then [r112:DI] is a
>> memory here to be processed. So the patch loads it from stack(insn #10) to
>> temp vector register first, and store to stack again(insn #24) after
>> rs6000_vector_set_var.
> Hmm, yeah - I guess that's what should be addressed first then.
> I'm quite sure that in case 'v' is not on the stack but in memory like
> in my case a SImode store is better than what we get from
> vec_insert - in fact vec_insert will likely introduce a RMW cycle
> which is prone to inserting store-data-races?
Yes, for your case, there is no stack operation and to_rtx is expanded
with BLKmode instead of V4SImode. Add the to_rtx mode check could workaround
it. ASM doesn't show store hit load issue.
optimized:
_1 = i_2(D) % 4;
VIEW_CONVERT_EXPR<int[4]>(x.u)[_1] = a_4(D);
expand:
2: r118:DI=%3:DI
3: r119:DI=%4:DI
4: NOTE_INSN_FUNCTION_BEG
7: r120:DI=unspec[`*.LANCHOR0',%2:DI] 47
REG_EQUAL `*.LANCHOR0'
8: r122:SI=r118:DI#0
9: {r124:SI=r122:SI/0x4;clobber ca:SI;}
10: r125:SI=r124:SI<<0x2
11: r123:SI=r122:SI-r125:SI
REG_EQUAL r122:SI%0x4
12: r126:DI=sign_extend(r123:SI)
13: r127:DI=r126:DI+0x4
14: r128:DI=r127:DI<<0x2
15: r129:DI=r120:DI+r128:DI
16: [r129:DI]=r119:DI#0
p to_rtx
$319 = (rtx_def *) (mem/c:BLK (reg/f:DI 120) [2 x+0 S32 A128])
asm:
addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
.localentry test,.-test
srawi 9,3,2
addze 9,9
addis 10,2,.LANCHOR0@toc@ha
addi 10,10,.LANCHOR0@toc@l
slwi 9,9,2
subf 9,9,3
extsw 9,9
addi 9,9,4
sldi 9,9,2
stwx 4,10,9
blr
>
> So - what we need to "fix" is cfgexpand.c marking variably-indexed
> decls as not to be expanded as registers (see
> discover_nonconstant_array_refs).
>
> I guess one way forward would be to perform instruction
> selection on GIMPLE here and transform
>
> VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D)
>
> to a (direct) internal function based on the vec_set optab.
I don't quite understand what you mean here. Do you mean:
ALTIVEC_BUILTIN_VEC_INSERT -> VIEW_CONVERT_EXPR -> internal function -> vec_set
or ALTIVEC_BUILTIN_VEC_INSERT -> internal function -> vec_set?
And which pass to put the selection and transform is acceptable?
Why call it *based on* vec_set optab? The VIEW_CONVERT_EXPR or internal function
is expanded to vec_set optab.
I guess you suggest adding internal function for VIEW_CONVERT_EXPR in gimple,
and do the transform from internal function to vec_set optab in expander?
I doubt my understanding as this looks really over complicated since we
transform from VIEW_CONVERT_EXPR to vec_set optab directly so far...
IIUC, Internal function seems doesn't help much here as Segher said before.
> But then in GIMPLE D.3185 is also still memory (we don't have a variable
> index partial register set operation - BIT_INSERT_EXPR is
> currently specified to receive a constant bit position only).
>
> At which point after your patch is the stack storage elided?
>
Stack storage is elided by register reload pass in RTL.
Thanks,
Xionghu
More information about the Gcc-patches
mailing list