Vector permutation support for x86

Sat Dec 5 17:07:00 GMT 2009

On 12/03/2009 08:27 PM, Sebastian Pop wrote:
> On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
>>> @@ -1724,8 +1723,8 @@
>>>          (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
>>>         (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
>>>   "TARGET_FMA4
>>> - && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
>>> - && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
>>> + && MEM_P (operands[2])
>>> + && (MEM_P (operands[1]) || MEM_P (operands[3]))
>>> && !reg_mentioned_p (operands[0], operands[1])
>>> && !reg_mentioned_p (operands[0], operands[2])
>>> && !reg_mentioned_p (operands[0], operands[3])"
>>
>> This is the splitter under fma4_fmadd<mode>4256", but the same comment
>> applies to all of the fma4 splitters.

I think that all these splitters are not needed. ATM, we regress in i386 
testsuite:

FAIL: gcc.target/i386/funcspec-9.c scan-assembler vfmaddss

due to failure to generate vfmaddss for -m32 for this testcase:

float
flt_mul_add (float a, float b, float c)
{
   return (a * b) + c;
}

combine pass first creates separate multiply and separate add 
instructions, both with a memory operand:

Successfully matched this instruction:
(set (reg:SF 64)
     (plus:SF (reg:SF 65)
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))

Successfully matched this instruction:
(set (reg:SF 64)
     (plus:SF (reg:SF 65)
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))
deferring deletion of insn with uid = 4.

and then fails to match:

Failed to match this instruction:
(set (reg:SF 64)
     (plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
             (reg/v:SF 62 [ b ]))
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))

(BTW: Pushing all values into registers at expand time is simply not 
effective for FMAs.)

The generation of FMA insn can be fixed by having only following 
pattern, without any splitter support:

(define_insn "fma4_fmadd<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
     (plus:SSEMODEF4
      (mult:SSEMODEF4
       (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
       (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
      (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
   "TARGET_FMA4"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])

Please note, that now all operands can consume memory operand as well, 
so combine creates desired insn pattern:

Successfully matched this instruction:
(set (reg:SF 64)
     (plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
             (reg/v:SF 62 [ b ]))
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))

We pass this insn to reload. AFAIK, operand constraint ("%x,x" for op1) 
should be tighter that operand predicate ("nonimmediate_operand" for 
op1), otherwise reload crashes. Knowing that, we see that all 
constraints are tighter then the predicates in our case, so even when we 
have multiple memory operands, gcc generates for 32bit target:

flt_mul_add:
     pushl    %ebp
     movl    %esp, %ebp
     subl    $4, %esp
     vmovss    8(%ebp), %xmm0
     vmovss    12(%ebp), %xmm1
     vfmaddss    16(%ebp), %xmm1, %xmm0, %xmm0
     vmovss    %xmm0, -4(%ebp)
     flds    -4(%ebp)
     leave
     ret

q.e.d.

Attached patch was tested in gcc.target/i386 directory with and without 
-m32 and fixes mentioned regression for -m32. Despite only light 
testing, I'm pretty confident that we should change handling of all FMA 
patterns in the same way as shown in the example in the attached patch.

BTW: Please test patches that apply to 32 and 64bit targets with:

RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Uros.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p.diff.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20091205/c9c1336c/attachment.txt>