This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Add SSE4.2 support - pcmpstr part


On Sat, Jun 02, 2007 at 01:03:48AM +0200, Uros Bizjak wrote:
> H. J. Lu wrote:
> 
> >>I meant to say "I don't think you can reverse a CC mode set by
> >>pcmp[ei]str[im] instructions."
> >>    
> Sure you can, and you must leave gcc a way to reverse a CC mode. 
> Consider this example:
> 
> --cut here--
> void test()
> {
>  if (_mm_cmpistrc (src1, src2, IMM_VAL1)
>      || _mm_cmpistro (src1, src2, IMM_VAL1))
>    fff1();
>  else
>    fff2();
> }
> --cut here--
> 
> test:
>        movdqa  src1, %xmm1
>        movdqa  src2, %xmm0
>        pcmpistri       $35, %xmm0, %xmm1
>        jc      .L2
>        jno     .L3
> .L2:
>        jmp     fff1
>        .p2align 4,,7
> .L3:
>        .p2align 4,,8
>        jmp     fff2
> 
> With the attached patch, the overflow flag check is reversed without
> problems (note the jno above).
> 
> >With my implementation, gcc can optimize
> >
> >          res = _mm_cmpistri (src1.x[i], src2.x[i], IMM_VAL1);
> >          cf = _mm_cmpistrc (src1.x[i], src2.x[i], IMM_VAL1);
> >          zf = _mm_cmpistrz (src1.x[i], src2.x[i], IMM_VAL1);
> >          sf = _mm_cmpistrs (src1.x[i], src2.x[i], IMM_VAL1);
> >          of = _mm_cmpistro (src1.x[i], src2.x[i], IMM_VAL1);
> >          af = _mm_cmpistra (src1.x[i], src2.x[i], IMM_VAL1);
> >  
> 
> Yes. Please check the attached patch, which achieves this with generic
> gcc machinery.
> 
> The attached patch introduces four new CCmodes that are used for
> (nonstandard) flags bit tests. We only need two RTX codes (EQ and NE),
> so we don't need any hacking inside predicates for comparisons. Also,
> these are easily reversed, as confirmed by the testcase above.
> 
> The CSE functionality is achieved by expanding the pattern into a
> three-way RTvec, which is split just before register allocation into
> simpler patterns: intreg+CC, mmxreg+CC, CConly or a two-insn
> "pcm?stri"/"pcm?strm" sequence. So this indeed eliminates redundant
> instructions even in the most insane cases (if a mask setting insn is
> added in the middle of the above example). When a CConly insn is
> present, reload is free to choose between the two instructions
> according to the allocated scratch register.
> 
> To achieve all this functionality, we need to introduce a new register
> class whose only member is %xmm0. This is needed for reload to
> select the most appropriate CConly instruction, according to the
> allocated (free) scratch register. Surprisingly, the register allocator
> and reload handle all these mega-instructions without any problems,
> always allocating the correct (requested) registers.
> 
> H.J. - I don't have SSE4.2 testcases here, could you spare a few cycles 
> and check this patch with the SSE4.2 testsuite? The patch was 
> bootstrapped on x86_64-pc-linux-gnu and regtested with all default 
> languages. -m32 testsuite is in progress, and it will finish during the 
> night.
> 

I checked this version on both Linux/ia32 and Linux/Intel64. It passes
all SSE4.2 runtime tests. I also compared the assembly outputs against
my original version. They are equivalent.

It looks great to me. Thanks.


H.J.

