This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, i386]: Add SSE4.2 support - pcmpstr part
H. J. Lu wrote:
> You may need to add 2 new SSE register classes, one for xmm0 and one
> for other xmmN, like Y0 and Yn I added in my original SSE4.1 patch:
> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01167.html
> Otherwise, you will run into
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32189
No, one class is enough. The failure is clearly a register allocator
failure (BTW: I have reverted the patch, so we are back to hard regs for
variable blends; but safe_vector_operand() was still needed, so we won't
break for const0_rtx here).
Regarding the allocator failure:
This is how the pattern _should_ look:
(define_insn "sse4_1_blendvpd"
  [(set (match_operand:V2DF 0 "register_operand" "=x")
        (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "0")
                      (match_operand:V2DF 2 "nonimmediate_operand" "xm")
                      (match_operand:V2DF 3 "register_operand" "z")]
                     UNSPEC_BLENDV))]
  "TARGET_SSE4_1"
  "blendvpd\t{%3, %2, %0|%0, %2, %3}"
  [(set_attr "type" "ssemov")
   (set_attr "prefix_extra" "1")
   (set_attr "mode" "V2DF")])
So it is evident that we don't need two classes, since op3 can't be
matched with either op2 or op1. However, the RA fails to resolve this
situation: it assigns %xmm0 (the first free class "x" register) to op0
without checking the most constrained class, "z", and the allocation
for "z" then fails.
A fixup for this situation would be fairly trivial: look at the
constraints and let the most constrained class get its hard register
first, as the compilation lives and dies by this. There are plenty of
"x" class registers to allocate from for the other operands. (BTW: This
is the same situation as the infamous ICE when other mega-instructions
(string insns like "rep stos" & co.) can't get an AREG class (%eax)
register allocated for their input operand.)
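
To illustrate the ordering problem, here is a toy model in Python. It is
only a sketch and has nothing to do with the actual reload or allocator
code in GCC. The names CLASSES, allocate() and most_constrained_first()
are made up for this example. It assumes that class "z" contains only
%xmm0 and that there are eight SSE registers, and it treats op2 as a
register operand although the pattern also allows memory. op1 is tied to
op0 and therefore has no entry of its own.

```python
# Toy model only: NOT GCC's reload or register allocator.
# Assumption: class "z" holds just xmm0, class "x" holds xmm0..xmm7.
CLASSES = {
    "x": [f"xmm{i}" for i in range(8)],
    "z": ["xmm0"],
}


def allocate(operands, order):
    """Give each operand the first free register of its class.

    operands maps an operand name to its class letter; order is the
    sequence in which operands are visited. Returns a dict mapping
    operand name to register, or None if some class has no free
    register left when its operand is reached.
    """
    taken = set()
    assignment = {}
    for op in order:
        free = [r for r in CLASSES[operands[op]] if r not in taken]
        if not free:
            return None  # allocation failure for this operand
        assignment[op] = free[0]
        taken.add(free[0])
    return assignment


def most_constrained_first(operands):
    """Order operands so that the smallest register class comes first."""
    return sorted(operands, key=lambda op: len(CLASSES[operands[op]]))


# blendvpd operands as in the pattern: op0 "=x", op2 "x", op3 "z".
operands = {"op0": "x", "op2": "x", "op3": "z"}

# Pattern order: op0 grabs xmm0, so nothing is left for "z".
naive = allocate(operands, ["op0", "op2", "op3"])

# Most constrained class first: op3 gets xmm0, the rest follow.
fixed = allocate(operands, most_constrained_first(operands))

print("pattern order:         ", naive)
print("most constrained first:", fixed)
```

In this model the pattern-order run fails for op3, while the reordered
run succeeds, because "x" still has seven registers left after "z" has
taken its only one.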
BTW: I have tried to use clobbers of %xmm0, but it had no effect.
I'm kind of disappointed that reload can't solve this situation despite
its complexity and inherent magic. However, it can allocate all regs
when the insn pattern is described using a hard reg:
(define_insn "sse4_1_blendvpd"
  [(set (match_operand:V2DF 0 "register_operand" "=x")
        (unspec:V2DF [(match_operand:V2DF 1 "register_operand" "0")
                      (match_operand:V2DF 2 "nonimmediate_operand" "xm")
                      (reg:V2DF XMM0_REG)]
                     UNSPEC_BLENDV))]
  "TARGET_SSE4_1"
  "blendvpd\t{%%xmm0, %2, %0|%0, %2, %%xmm0}"
  [(set_attr "type" "ssemov")
   (set_attr "prefix_extra" "1")
   (set_attr "mode" "V2DF")])
This just isn't what the pattern should look like. *sigh*
Uros.