Vector permutation support for x86
Uros Bizjak
ubizjak@gmail.com
Sat Dec 5 17:07:00 GMT 2009
On 12/03/2009 08:27 PM, Sebastian Pop wrote:
> On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
>>> @@ -1724,8 +1723,8 @@
>>> (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
>>> (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
>>> "TARGET_FMA4
>>> - && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
>>> - && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
>>> + && MEM_P (operands[2])
>>> + && (MEM_P (operands[1]) || MEM_P (operands[3]))
>>> && !reg_mentioned_p (operands[0], operands[1])
>>> && !reg_mentioned_p (operands[0], operands[2])
>>> && !reg_mentioned_p (operands[0], operands[3])"
>>
>> This is the splitter under fma4_fmadd<mode>4256", but the same comment
>> applies to all of the fma4 splitters.
I think that all these splitters are not needed. ATM, we regress in i386
testsuite:
FAIL: gcc.target/i386/funcspec-9.c scan-assembler vfmaddss
due to failure to generate vfmaddss for -m32 for this testcase:
float
flt_mul_add (float a, float b, float c)
{
return (a * b) + c;
}
combine pass first creates separate multiply and separate add
instructions, both with a memory operand:
Successfully matched this instruction:
(set (reg:SF 64)
(plus:SF (reg:SF 65)
(mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
(const_int 8 [0x8])) [2 c+0 S4 A32])))
Successfully matched this instruction:
(set (reg:SF 64)
(plus:SF (reg:SF 65)
(mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
(const_int 8 [0x8])) [2 c+0 S4 A32])))
deferring deletion of insn with uid = 4.
and then fails to match:
Failed to match this instruction:
(set (reg:SF 64)
(plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
(reg/v:SF 62 [ b ]))
(mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
(const_int 8 [0x8])) [2 c+0 S4 A32])))
(BTW: Pushing all values into registers at expand time is simply not
effective for FMAs.)
The generation of FMA insn can be fixed by having only following
pattern, without any splitter support:
(define_insn "fma4_fmadd<mode>4"
[(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
(plus:SSEMODEF4
(mult:SSEMODEF4
(match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
(match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
(match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
"TARGET_FMA4"
"vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
[(set_attr "type" "ssemuladd")
(set_attr "mode" "<MODE>")])
Please note, that now all operands can consume memory operand as well,
so combine creates desired insn pattern:
Successfully matched this instruction:
(set (reg:SF 64)
(plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
(reg/v:SF 62 [ b ]))
(mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
(const_int 8 [0x8])) [2 c+0 S4 A32])))
We pass this insn to reload. AFAIK, operand constraint ("%x,x" for op1)
should be tighter that operand predicate ("nonimmediate_operand" for
op1), otherwise reload crashes. Knowing that, we see that all
constraints are tighter then the predicates in our case, so even when we
have multiple memory operands, gcc generates for 32bit target:
flt_mul_add:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
vmovss 8(%ebp), %xmm0
vmovss 12(%ebp), %xmm1
vfmaddss 16(%ebp), %xmm1, %xmm0, %xmm0
vmovss %xmm0, -4(%ebp)
flds -4(%ebp)
leave
ret
q.e.d.
Attached patch was tested in gcc.target/i386 directory with and without
-m32 and fixes mentioned regression for -m32. Despite only light
testing, I'm pretty confident that we should change handling of all FMA
patterns in the same way as shown in the example in the attached patch.
BTW: Please test patches that apply to 32 and 64bit targets with:
RUNTESTFLAGS="--target_board=unix\{,-m32\}"
Uros.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p.diff.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20091205/c9c1336c/attachment.txt>
More information about the Gcc-patches
mailing list