This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Vector permutation support for x86


On 12/03/2009 08:27 PM, Sebastian Pop wrote:
On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
@@ -1724,8 +1723,8 @@
         (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
        (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
  "TARGET_FMA4
- && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
- && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+ && MEM_P (operands[2])
+ && (MEM_P (operands[1]) || MEM_P (operands[3]))
&& !reg_mentioned_p (operands[0], operands[1])
&& !reg_mentioned_p (operands[0], operands[2])
&& !reg_mentioned_p (operands[0], operands[3])"

This is the splitter under fma4_fmadd<mode>4256", but the same comment applies to all of the fma4 splitters.

I think that all these splitters are not needed. ATM, we regress in i386 testsuite:


FAIL: gcc.target/i386/funcspec-9.c scan-assembler vfmaddss

due to failure to generate vfmaddss for -m32 for this testcase:

float
flt_mul_add (float a, float b, float c)
{
  return (a * b) + c;
}

combine pass first creates separate multiply and separate add instructions, both with a memory operand:

Successfully matched this instruction:
(set (reg:SF 64)
    (plus:SF (reg:SF 65)
        (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [2 c+0 S4 A32])))

Successfully matched this instruction:
(set (reg:SF 64)
    (plus:SF (reg:SF 65)
        (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [2 c+0 S4 A32])))
deferring deletion of insn with uid = 4.

and then fails to match:

Failed to match this instruction:
(set (reg:SF 64)
    (plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
            (reg/v:SF 62 [ b ]))
        (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [2 c+0 S4 A32])))

(BTW: Pushing all values into registers at expand time is simply not effective for FMAs.)

The generation of FMA insn can be fixed by having only following pattern, without any splitter support:

(define_insn "fma4_fmadd<mode>4"
  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
    (plus:SSEMODEF4
     (mult:SSEMODEF4
      (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
      (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
     (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
  "TARGET_FMA4"
  "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "type" "ssemuladd")
   (set_attr "mode" "<MODE>")])

Please note, that now all operands can consume memory operand as well, so combine creates desired insn pattern:

Successfully matched this instruction:
(set (reg:SF 64)
    (plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
            (reg/v:SF 62 [ b ]))
        (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [2 c+0 S4 A32])))

We pass this insn to reload. AFAIK, operand constraint ("%x,x" for op1) should be tighter that operand predicate ("nonimmediate_operand" for op1), otherwise reload crashes. Knowing that, we see that all constraints are tighter then the predicates in our case, so even when we have multiple memory operands, gcc generates for 32bit target:

flt_mul_add:
    pushl    %ebp
    movl    %esp, %ebp
    subl    $4, %esp
    vmovss    8(%ebp), %xmm0
    vmovss    12(%ebp), %xmm1
    vfmaddss    16(%ebp), %xmm1, %xmm0, %xmm0
    vmovss    %xmm0, -4(%ebp)
    flds    -4(%ebp)
    leave
    ret

q.e.d.

Attached patch was tested in gcc.target/i386 directory with and without -m32 and fixes mentioned regression for -m32. Despite only light testing, I'm pretty confident that we should change handling of all FMA patterns in the same way as shown in the example in the attached patch.

BTW: Please test patches that apply to 32 and 64bit targets with:

RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Uros.

Attachment: p.diff.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]