[PATCH, i386]: Add SSE4.2 support - pcmpstr part
H. J. Lu
hjl@lucon.org
Sat Jun 2 00:41:00 GMT 2007
On Sat, Jun 02, 2007 at 01:03:48AM +0200, Uros Bizjak wrote:
> H. J. Lu wrote:
>
> >>I meant to say "I don't think you can reverse a CC mode set by
> >>pcmp[ei]str[im] instructions."
> >>
> Sure you can, and you must leave gcc a way to reverse a CC mode.
> Consider this example:
>
> --cut here--
> void test()
> {
>   if (_mm_cmpistrc (src1, src2, IMM_VAL1)
>       || _mm_cmpistro (src1, src2, IMM_VAL1))
>     fff1();
>   else
>     fff2();
> }
> --cut here--
>
> test:
>         movdqa  src1, %xmm1
>         movdqa  src2, %xmm0
>         pcmpistri       $35, %xmm0, %xmm1
>         jc      .L2
>         jno     .L3
> .L2:
>         jmp     fff1
>         .p2align 4,,7
> .L3:
>         .p2align 4,,8
>         jmp     fff2
>
> With the attached patch, the overflow flag check is reversed without problems.
>
> >With my implementation, gcc can optimize
> >
> > res = _mm_cmpistri (src1.x[i], src2.x[i], IMM_VAL1);
> > cf = _mm_cmpistrc (src1.x[i], src2.x[i], IMM_VAL1);
> > zf = _mm_cmpistrz (src1.x[i], src2.x[i], IMM_VAL1);
> > sf = _mm_cmpistrs (src1.x[i], src2.x[i], IMM_VAL1);
> > of = _mm_cmpistro (src1.x[i], src2.x[i], IMM_VAL1);
> > af = _mm_cmpistra (src1.x[i], src2.x[i], IMM_VAL1);
> >
>
> Yes. Please check attached patch that achieves this with generic gcc
> machinery.
>
> The attached patch introduces four new CCmodes that are used for
> (nonstandard) flags bit tests. We only need two RTX codes (EQ and NE),
> so we don't need any hacking inside predicates for comparisons. Also,
> these are easily reversed as confirmed by the testcase above.
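
As an aside, here is a minimal sketch of why a single-flag test that only
ever uses EQ or NE is trivial to reverse. It is a hypothetical C model,
not code from the patch: the names flag_test, holds, reverse and
check_all_reversals are made up for illustration, and the choice that NE
means "flag set" and EQ means "flag clear" is an assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: one bit per flag of interest. */
enum flag { FLAG_C = 1u << 0, FLAG_Z = 1u << 1, FLAG_S = 1u << 2, FLAG_O = 1u << 3 };

/* The only two comparison codes a single-flag test needs. */
enum code { CODE_EQ, CODE_NE };

struct flag_test {
    enum flag flag;   /* which flag is examined */
    enum code code;   /* assumed: CODE_NE holds when set, CODE_EQ when clear */
};

/* Evaluate a flag test against a given flags value. */
bool holds(struct flag_test t, unsigned flags)
{
    bool set = (flags & t.flag) != 0;
    return t.code == CODE_NE ? set : !set;
}

/* Reversing a test only swaps EQ and NE; the flag stays the same. */
struct flag_test reverse(struct flag_test t)
{
    t.code = (t.code == CODE_NE) ? CODE_EQ : CODE_NE;
    return t;
}

/* Exhaustively check all 16 flag states; returns the number of mismatches. */
int check_all_reversals(void)
{
    static const enum flag all_flags[] = { FLAG_C, FLAG_Z, FLAG_S, FLAG_O };
    int mismatches = 0;

    for (unsigned flags = 0; flags < 16; flags++) {
        /* Each reversed test must be the exact negation of the original. */
        for (int i = 0; i < 4; i++) {
            for (int c = CODE_EQ; c <= CODE_NE; c++) {
                struct flag_test t = { all_flags[i], (enum code) c };
                if (holds(reverse(t), flags) == holds(t, flags))
                    mismatches++;
            }
        }

        /* Shape of the testcase: "if (c || o) fff1 (); else fff2 ();".
           The else arm is reached exactly when both reversed tests hold. */
        struct flag_test c_set = { FLAG_C, CODE_NE };
        struct flag_test o_set = { FLAG_O, CODE_NE };
        bool then_arm = holds(c_set, flags) || holds(o_set, flags);
        bool else_arm = holds(reverse(c_set), flags) && holds(reverse(o_set), flags);
        if (then_arm == else_arm)
            mismatches++;
    }
    return mismatches;
}
```

In this model the "jno .L3" in the assembly above corresponds to the
reversed overflow test: it branches to fff2 only when the overflow flag
is clear, after "jc .L2" has already handled the carry case.
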
>
> The CSE functionality is achieved by expanding the pattern into a
> three-way RTvec, where the pattern is split just before register
> allocation into simpler patterns: intreg+CC, mmxreg+CC, CConly or a
> two-insn "pcm?stri"/"pcm?strm" sequence. So this indeed eliminates
> redundant instructions even in the most insane cases (if a mask setting
> insn is added in the middle of the above example). When a CConly insn is
> present, reload is free to choose between two instructions according to
> the allocated scratch register.
>
> To achieve all this functionality, we need to introduce a new register
> class whose only member is %xmm0. This is needed for reload to
> select the most appropriate CConly instruction, according to the allocated
> (free) scratch register. Surprisingly, the register allocator and reload
> handle all these mega-instructions without any problems, always
> allocating the correct (requested) registers.
>
> H.J. - I don't have SSE4.2 testcases here, could you spare a few cycles
> and check this patch with the SSE4.2 testsuite? The patch was
> bootstrapped on x86_64-pc-linux-gnu and regtested with all default
> languages. -m32 testsuite is in progress, and it will finish during the
> night.
>
I checked this version on both Linux/ia32 and Linux/Intel64. It passes
all SSE4.2 runtime tests. I also compared the assembly outputs against
my original version. They are equivalent.
It looks great to me. Thanks.
H.J.