[PATCH] x86 V[24]TImode vec_{init,extract} (PR target/80846)
H.J. Lu
hjl.tools@gmail.com
Wed Aug 29 14:35:00 GMT 2018
On Thu, Jul 20, 2017 at 12:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> Richard has asked me recently to look at V[24]TI vector extraction
> and initialization, which he wants to use from the vectorizer.
>
> The following is an attempt to implement that.
>
> On the testcases included in the patch we usually get better or
> significantly better code generated; the exception is f1,
> where the change is:
> - movq %rdi, -32(%rsp)
> - movq %rsi, -24(%rsp)
> - movq %rdx, -16(%rsp)
> - movq %rcx, -8(%rsp)
> - vmovdqa -32(%rsp), %ymm0
> + movq %rdi, -16(%rsp)
> + movq %rsi, -8(%rsp)
> + movq %rdx, -32(%rsp)
> + movq %rcx, -24(%rsp)
> + vmovdqa -32(%rsp), %xmm0
> + vmovdqa -16(%rsp), %xmm1
> + vinserti128 $0x1, %xmm0, %ymm1, %ymm0
> which is something that is hard to handle before RA. If the RA
> would spill it the other way around, perhaps it would be solvable by
> transforming
> vmovdqa -32(%rsp), %xmm1
> vmovdqa -16(%rsp), %xmm0
> vinserti128 $0x01, %xmm0, %ymm1, %ymm0
> into
> vmovdqa -32(%rsp), %ymm0
> using peephole2, but I have no idea how to force it that way. f11 also
> has a similar problem, this time with 3 extra insns. But if the TImode
> variable is allocated in a %?mm* register, we get better code even in those
> cases.
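
For readers without the testcases at hand, here is a minimal sketch of the
kind of source that exercises V2TImode vec_init and vec_extract. The type and
function names are hypothetical and are not taken from the PR's tests; the
sketch only assumes GCC's generic vector extension with __int128 elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name: a 32-byte vector of two 128-bit integers,
   i.e. what GCC calls V2TImode on x86_64.  */
typedef __int128 v2ti __attribute__ ((vector_size (32)));

/* Build a V2TI vector from two TImode scalars (a vec_init) and copy
   its bytes to OUT.  The vector stays local so that no vector value
   crosses a function boundary.  */
void
init_v2ti (__int128 lo, __int128 hi, uint64_t out[4])
{
  v2ti v = { lo, hi };
  memcpy (out, &v, sizeof v);
}

/* Reload the vector from IN and return element IDX (a vec_extract).  */
__int128
extract_v2ti (const uint64_t in[4], int idx)
{
  v2ti v;
  assert (idx == 0 || idx == 1);
  memcpy (&v, in, sizeof v);
  return v[idx];
}
```

Compiling something like this with -O2 -mavx2 and inspecting the assembly is
one way to compare the sequences discussed above; the exact output depends on
the compiler version and on how the function is written.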
>
> For V4TImode perhaps we could improve some special cases of vec_initv4ti,
> like a broadcast, or only one variable element with everything else
> constant, but at least for the broadcast I'm not really sure what the
> optimal sequence is.
> vbroadcasti32x4 is only able to broadcast from memory, which is good if the
> TImode input lives in memory, but if it doesn't? __builtin_shuffle right
> now generates vpermq with the indices loaded from memory, but that needs to
> wait for memory load...
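
At the source level, the broadcast case mentioned above could look like the
following sketch. The names are hypothetical and it assumes only GCC's generic
vector extension; it illustrates the input form, not any particular
instruction sequence.

```c
#include <string.h>

/* Hypothetical type name: a 64-byte vector of four 128-bit integers,
   i.e. V4TImode on x86_64.  */
typedef __int128 v4ti __attribute__ ((vector_size (64)));

/* Broadcast one TImode scalar into all four lanes of a V4TI vector
   and copy the lanes to OUT.  */
void
broadcast_v4ti (__int128 x, __int128 out[4])
{
  v4ti v = { x, x, x, x };
  memcpy (out, &v, sizeof v);
}
```

Whether the compiler emits a memory broadcast, a permute, or something else
for the initializer depends on where x lives and on the target flags.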
>
> Another thing is that we actually don't permit a normal move instruction
> for V4TImode unless AVX512BW, so we used to generate terrible code (spill it
> into memory using GPRs and then load back). Any reason for that?
> I've found:
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01465.html
>> > > - (V2TI "TARGET_AVX") V1TI
>> > > + (V4TI "TARGET_AVX") (V2TI "TARGET_AVX") V1TI
>> >
>> > Are you sure TARGET_AVX is the correct condition for V4TI?
>> Right! This should be TARGET_AVX512BW (because corresponding shifts
>> belong to AVX-512BW).
> but it isn't at all clear what shifts this is talking about. This is VMOVE,
> which is used just in mov<mode>, mov<mode>_internal and movmisalign<mode>
> patterns; I fail to see what kind of shifts those would produce.
> Those should only produce vmovdqa64, vmovdqu64, vpxord or vpternlogd insns
> with %zmm* operands, those are all AVX512F already.
>
> Anyway, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Maybe it would be nice to also improve bitwise logical operations on
> V2TI/V4TImode - probably just expanders like {and,ior,xor}v[24]ti
> and maybe __builtin_shuffle.
>
> Richard also talked about V2OImode support, but I'm afraid that is going to
> be way too hard; we don't really have OImode support in most places.
>
> 2017-07-20 Jakub Jelinek <jakub@redhat.com>
>
> PR target/80846
> * config/i386/i386.c (ix86_expand_vector_init_general): Handle
> V2TImode and V4TImode.
> (ix86_expand_vector_extract): Likewise.
> * config/i386/sse.md (VMOVE): Enable V4TImode even for just
> TARGET_AVX512F, instead of only for TARGET_AVX512BW.
> (ssescalarmode): Handle V4TImode and V2TImode.
> (VEC_EXTRACT_MODE): Add V4TImode and V2TImode.
> (*vec_extractv2ti, *vec_extractv4ti): New insns.
> (VEXTRACTI128_MODE): New mode iterator.
> (splitter for *vec_extractv?ti first element): New.
> (VEC_INIT_MODE): New mode iterator.
> (vec_init<mode>): Consolidate 3 expanders into one using
> VEC_INIT_MODE mode iterator.
>
> * gcc.target/i386/avx-pr80846.c: New test.
> * gcc.target/i386/avx2-pr80846.c: New test.
> * gcc.target/i386/avx512f-pr80846.c: New test.
>
This caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87138
H.J.