This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Add SSE4.2 support - pcmpstr part


H. J. Lu wrote:

You may need to add 2 new SSE register classes, one for xmm0 and one
for other xmmN, like Y0 and Yn I added in my original SSE4.1 patch:

http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01167.html

Otherwise, you will run into

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32189

No, one class is enough. The failure is clearly a register allocator failure. (BTW: I have reverted the patch, so we are back to hard regs for variable blends; safe_vector_operand() was still needed, though, so we won't break on const0_rtx here.)


Regarding the allocator failure:


This is how the pattern _should_ look:

(define_insn "sse4_1_blendvpd"
 [(set (match_operand:V2DF 0 "register_operand" "=x")
       (unspec:V2DF [(match_operand:V2DF 1 "register_operand"  "0")
                     (match_operand:V2DF 2 "nonimmediate_operand" "xm")
                     (match_operand:V2DF 3 "register_operand" "z")]
                    UNSPEC_BLENDV))]
 "TARGET_SSE4_1"
 "blendvpd\t{%3, %2, %0|%0, %2, %3}"
 [(set_attr "type" "ssemov")
  (set_attr "prefix_extra" "1")
  (set_attr "mode" "V2DF")])

So it is evident that we don't need two classes, since op3 can't be matched with either op1 or op2. However, RA fails to resolve this situation, because it assigns %xmm0 (the first free class "x" register) to op0. It doesn't bother to check the most constrained class, "z", whose allocation then fails because %xmm0 is already taken.

A kind of fixup for this situation would be fairly trivial: look into the constraints and let the most constrained class get its hard register first, as the compilation lives and dies by this. There are plenty of "x" class registers to allocate from for the other operands. (BTW: This is the same situation as the infamous ICE where other mega-instructions (string insns like "rep stos" & co.) can't get an AREG class (%eax) register allocated for their input operand.)
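
To illustrate the ordering idea only, here is a toy model in Python. It is not GCC's allocator or reload; the CLASSES table, the allocate() helper and the 8-register file are hypothetical, and the model assumes every operand needs its own distinct register. op1 is left out because the "0" constraint ties it to op0.

```python
# Toy model only: this is NOT how GCC's register allocator or reload works.
# A register class is modelled as an ordered list of hard register names.
XMM_REGS = ["xmm%d" % i for i in range(8)]
CLASSES = {
    "x": XMM_REGS,   # any SSE register
    "z": ["xmm0"],   # the single-register class used for op3 in the pattern
}


def allocate(operands, most_constrained_first):
    """Give each operand the first free register of its class.

    operands is a list of (name, constraint) pairs. Returns a dict mapping
    operand name to register, or None if some operand cannot be satisfied.
    """
    order = list(operands)
    if most_constrained_first:
        # Smallest class first; sorted() is stable, so ties keep operand order.
        order = sorted(order, key=lambda op: len(CLASSES[op[1]]))
    assigned = {}
    used = set()
    for name, constraint in order:
        free = [reg for reg in CLASSES[constraint] if reg not in used]
        if not free:
            return None  # allocation failure, the analogue of the ICE
        assigned[name] = free[0]
        used.add(free[0])
    return assigned


# Operands of the blendvpd pattern (op1 is tied to op0, so it is omitted).
BLENDV_OPERANDS = [("op0", "x"), ("op2", "x"), ("op3", "z")]

naive = allocate(BLENDV_OPERANDS, most_constrained_first=False)
fixed = allocate(BLENDV_OPERANDS, most_constrained_first=True)
print(naive)
print(fixed)
```

In operand order, op0 grabs xmm0 and nothing is left for op3, so the naive run fails. With the most constrained class served first, op3 gets xmm0 and the "x" operands fall through to xmm1 and xmm2.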

BTW: I have tried to use clobbers of %xmm0, but that had no effect.

I'm kind of disappointed that reload can't solve this situation despite its complexity and inherent magic. However, it can allocate all regs when the insn pattern is described using a hard reg:

(define_insn "sse4_1_blendvpd"
 [(set (match_operand:V2DF 0 "register_operand" "=x")
       (unspec:V2DF [(match_operand:V2DF 1 "register_operand"  "0")
                     (match_operand:V2DF 2 "nonimmediate_operand" "xm")
                     (reg:V2DF XMM0_REG)]
                    UNSPEC_BLENDV))]
 "TARGET_SSE4_1"
 "blendvpd\t{%%xmm0, %2, %0|%0, %2, %%xmm0}"
 [(set_attr "type" "ssemov")
  (set_attr "prefix_extra" "1")
  (set_attr "mode" "V2DF")])

This just isn't what the pattern should look like. *sigh*

Uros.

