This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[commit, spu] Fix vec_perm pattern (Re: [rs6000, spu] Add vec_perm named pattern)
- From: "Ulrich Weigand" <uweigand at de dot ibm dot com>
- To: rth at redhat dot com (Richard Henderson)
- Cc: dje dot gcc at gmail dot com, gcc-patches at gcc dot gnu dot org (GCC Patches)
- Date: Fri, 21 Oct 2011 03:31:12 +0200 (CEST)
Richard Henderson wrote:
> The generic support for vector permutation will allow for automatic
> lowering to V*QImode, so all we need to add to support for these targets
> is the single V16QI pattern that represents the base permutation insn.
>
> I'm not touching any of the other ways that the permutation insn
> could be generated. After the generic support is added, I'll leave
> it to the port maintainers to determine what they want to keep. I
> suspect in many cases using the generic __builtin_shuffle plus some
> casting in the target-specific header files would be sufficient,
> eliminating several dozen builtins.
Sorry I didn't get to this earlier; I got side-tracked by a number
of independent regressions on SPU ...
Unfortunately, the semantics of vec_perm do not exactly match those of the
SPU Shuffle Bytes (shufb) instruction. vec_perm assumes the selector elements
apply modulo 32, but shufb treats selector values >= 128 specially and
produces a constant byte for them. See the ISA:
  Value in Register RC
  (Expressed in Binary)    Result Byte

  10xxxxxx                 0x00
  110xxxxx                 0xFF
  111xxxxx                 0x80
  Otherwise                The byte of the concatenated register addressed by
                           the rightmost 5 bits of register RC
To implement the vec_perm semantics fully, we therefore need to reduce the
selector modulo 32 explicitly (i.e. AND each selector byte with 31) before
using shufb.
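
To illustrate the mismatch, here is a small host-side C model (a sketch only,
not code from the patch). It assumes byte 0 of each array is the leftmost
byte of the register, and the helper names shufb_byte, shufb, vec_perm_ref,
vec_perm_via_shufb and count_mismatches are made up for this example. It
emulates the ISA table above and checks that masking the selector with 31
first makes shufb agree with the modulo-32 vec_perm semantics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One result byte of shufb, following the ISA table quoted above.
   'ab' is the 32-byte concatenation of the two source registers.  */
static uint8_t shufb_byte(const uint8_t ab[32], uint8_t sel)
{
    if ((sel & 0xC0) == 0x80) return 0x00;   /* 10xxxxxx */
    if ((sel & 0xE0) == 0xC0) return 0xFF;   /* 110xxxxx */
    if ((sel & 0xE0) == 0xE0) return 0x80;   /* 111xxxxx */
    return ab[sel & 0x1F];                   /* otherwise: low 5 bits */
}

/* Model of the raw shufb instruction on 16-byte vectors.  */
static void shufb(uint8_t out[16], const uint8_t a[16],
                  const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t ab[32];
    memcpy(ab, a, 16);
    memcpy(ab + 16, b, 16);
    for (int i = 0; i < 16; i++)
        out[i] = shufb_byte(ab, sel[i]);
}

/* Reference vec_perm semantics: every selector byte applies modulo 32.  */
static void vec_perm_ref(uint8_t out[16], const uint8_t a[16],
                         const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t ab[32];
    memcpy(ab, a, 16);
    memcpy(ab + 16, b, 16);
    for (int i = 0; i < 16; i++)
        out[i] = ab[sel[i] % 32];
}

/* What the fixed expander does in spirit: AND the selector with 31,
   then use shufb.  */
static void vec_perm_via_shufb(uint8_t out[16], const uint8_t a[16],
                               const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t masked[16];
    for (int i = 0; i < 16; i++)
        masked[i] = sel[i] & 31;
    shufb(out, a, b, masked);
}

/* Count the selector values 0..255 for which the shufb-based result
   differs from the vec_perm reference.  use_mask selects between the
   raw shufb model (0) and the masked variant (nonzero).  */
static int count_mismatches(int use_mask)
{
    uint8_t a[16], b[16], sel[16], ref[16], got[16];
    int mismatches = 0;

    /* Source bytes 1..32, so none collides with 0x00, 0xFF or 0x80.  */
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)(i + 1);
        b[i] = (uint8_t)(i + 17);
    }
    for (int s = 0; s < 256; s++) {
        memset(sel, s, sizeof sel);
        vec_perm_ref(ref, a, b, sel);
        if (use_mask)
            vec_perm_via_shufb(got, a, b, sel);
        else
            shufb(got, a, b, sel);
        if (memcmp(ref, got, 16) != 0)
            mismatches++;
    }
    return mismatches;
}
```

In this model, selector values below 128 already behave the same with or
without the mask; only values with the top bit set need the extra AND.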
Tested on spu-elf, fixes various vshuf test cases.
Committed to mainline.
Bye,
Ulrich
ChangeLog:
* config/spu/spu.md ("vec_permv16qi"): Reduce selector modulo 32
before using the shufb instruction.
Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md (revision 180240)
--- gcc/config/spu/spu.md (working copy)
*************** selb\t%0,%4,%0,%3"
*** 4395,4410 ****
    "shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])

  (define_expand "vec_permv16qi"
!   [(set (match_operand:V16QI 0 "spu_reg_operand" "")
          (unspec:V16QI
            [(match_operand:V16QI 1 "spu_reg_operand" "")
             (match_operand:V16QI 2 "spu_reg_operand" "")
!            (match_operand:V16QI 3 "spu_reg_operand" "")]
            UNSPEC_SHUFB))]
    ""
    {
!     operands[3] = gen_lowpart (TImode, operands[3]);
    })

  (define_insn "nop"
--- 4395,4416 ----
    "shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])

+ ; The semantics of vec_permv16qi are nearly identical to those of the SPU
+ ; shufb instruction, except that we need to reduce the selector modulo 32.
  (define_expand "vec_permv16qi"
!   [(set (match_dup 4) (and:V16QI (match_operand:V16QI 3 "spu_reg_operand" "")
!                                  (match_dup 6)))
!    (set (match_operand:V16QI 0 "spu_reg_operand" "")
          (unspec:V16QI
            [(match_operand:V16QI 1 "spu_reg_operand" "")
             (match_operand:V16QI 2 "spu_reg_operand" "")
!            (match_dup 5)]
            UNSPEC_SHUFB))]
    ""
    {
!     operands[4] = gen_reg_rtx (V16QImode);
!     operands[5] = gen_lowpart (TImode, operands[4]);
!     operands[6] = spu_const (V16QImode, 31);
    })

  (define_insn "nop"
--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
Ulrich.Weigand@de.ibm.com