This is the mail archive of the mailing list for the GCC project.


VIS2 pattern review

[ Using the UltraSPARC Architecture, Draft D0.9.4, 27 Sep 2010.
  I believe this is the most recent public manual.  It covers
  VIS1 and VIS2 but not VIS3. ]

The comment for fpmerge_vis is not correct.
I believe that the operation is representable with

      (vec_select:V8QI
        (vec_concat:V8QI
          (match_operand:V4QI 1 ...)
          (match_operand:V4QI 2 ...))
        (parallel [(const_int 0) (const_int 4)
                   (const_int 1) (const_int 5)
                   (const_int 2) (const_int 6)
                   (const_int 3) (const_int 7)]))

which can be used as the basis for both of the
vec_interleave_{low,high}v8qi named patterns.
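As a cross-check on that selector, here is the operation written out in
scalar C.  Purely illustrative (the function name is mine); fpmerge does
this in a single insn:

```c
#include <stdint.h>

/* Scalar model of the 0 4 1 5 2 6 3 7 selection above: interleave the
   bytes of two 4-byte inputs, operand 1 filling the even slots and
   operand 2 the odd slots.  Illustrative C only, not the insn itself. */
static void fpmerge_model(uint8_t out[8],
                          const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i];   /* elements 0..3 of the concat */
        out[2 * i + 1] = b[i];   /* elements 4..7 of the concat */
    }
}
```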


> (define_insn "fmul8x16_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V4HI 2 "register_operand" "e")))]

This is invalid rtl.  You need

      (mult:V4HI
        (zero_extend:V4HI
          (match_operand:V4QI 1 ...))
        (match_operand:V4HI 2 ...))

> (define_insn "fmul8x16au_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V2HI 2 "register_operand" "f")))]

AFAICS, this needs an unspec, like fmul8x16al.
Similarly for fmul8sux16_vis, fmuld8sux16_vis, etc.

There's a code sample 7-1 that illustrates a 16x16 multiply:

	fmul8sux16 %f0, %f1, %f2
	fmul8ulx16 %f0, %f1, %f3
	fpadd16    %f2, %f3, %f4

This expansion ought to be available via the "mulv4hi3" named pattern.
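The reason the split works: a 16-bit value equals its signed high byte
times 256 plus its unsigned low byte, so modulo 2^16 the two partial
products sum to the full product.  A one-element sketch of that identity
in C (names are mine; this shows only the arithmetic, not the insns'
exact truncation of the partial products -- see the manual for that):

```c
#include <stdint.h>

/* One-element sketch of the high/low byte split behind the sequence
   above: a == ah*256 + al, with ah the signed high byte and al the
   unsigned low byte, so (ah*b << 8) + al*b == a*b (mod 2^16). */
static int16_t mul16_split(int16_t a, int16_t b)
{
    int8_t  ah = (int8_t)((uint16_t)a >> 8);  /* signed high byte (fmul8sux16's view) */
    uint8_t al = (uint8_t)a;                  /* unsigned low byte (fmul8ulx16's view) */
    uint32_t hi = (uint32_t)((int32_t)ah * b);
    uint32_t lo = (uint32_t)((int32_t)al * b);
    return (int16_t)(uint16_t)((hi << 8) + lo);   /* the fpadd16 combine */
}
```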

Similarly there's a 16x16 -> 32 multiply example:

	fmuld8sux16 %f0, %f1, %f2
	fmuld8ulx16 %f0, %f1, %f3
	fpadd32     %f2, %f3, %f4

that ought to be available via the "vec_widen_smult_{hi,lo}_v4hi"
named patterns.
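The widening form of the same byte split is exact, since both partial
products fit in 32 bits.  One-element sketch, names of my choosing (see
the manual for the fmuld8* insns' exact output format):

```c
#include <stdint.h>

/* With ah the signed high byte and al the unsigned low byte of a,
   a*b == ah*b*256 + al*b exactly; each term fits comfortably in an
   int32_t, so the widening 16x16 -> 32 product is recovered in full. */
static int32_t widen_mul16(int16_t a, int16_t b)
{
    int8_t  ah = (int8_t)((uint16_t)a >> 8);  /* signed high byte  */
    uint8_t al = (uint8_t)a;                  /* unsigned low byte */
    return (int32_t)ah * b * 256 + (int32_t)al * b;
}
```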


The "movmisalign<mode>" named pattern ought to be provided, utilizing
the alignaddr / faligndata insns.
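For reference, the byte-level idea behind that pairing, as a scalar C
model (names are mine; the real insns keep the offset in the GSR and
operate on fp registers):

```c
#include <stdint.h>
#include <string.h>

/* Model of the alignaddr/faligndata approach to a misaligned 8-byte
   load: read the two aligned doublewords covering the location
   (alignaddr supplies the aligned base and remembers the byte offset),
   then extract 8 bytes at that offset from the 16-byte window
   (faligndata).  Illustrative only. */
static void misaligned_load8(uint8_t out[8], const uint8_t *p)
{
    uintptr_t off = (uintptr_t)p & 7;   /* alignaddr: byte offset   */
    const uint8_t *base = p - off;      /* aligned doubleword base  */
    uint8_t win[16];
    memcpy(win, base, 16);              /* two aligned 8-byte loads */
    memcpy(out, win + off, 8);          /* faligndata: extract      */
}
```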


The "vec_perm{,_const}v8qi" named patterns ought to be provided using
the bmask / bshuffle insns.

For vec_perm_constv8qi, the compaction of the input byte data to nibble
data, as input to bmask, can happen at compile-time.  For vec_permv8qi,
you'll need to do this at runtime:

Considering each character as a nibble (x = garbage, . = zero):

	i = input 			= xaxbxcxdxexfxgxh
	t1 = i  & 0x0f0f0f0f0f0f0f0f	= .a.b.c.d.e.f.g.h
	t2 = t1 >> 4			= ..a.b.c.d.e.f.g.
	t3 = t1 + t2			= .aabbccddeeffggh
	t4 = t3 & 0x00ff00ff00ff00ff	= ..ab..cd..ef..gh
	t5 = t4 >> 8			= ....ab..cd..ef..
	t6 = t4 + t5			= ..ababcdcdefefgh
	t7 = t6 & 0x0000ffff0000ffff	= ....abcd....efgh
	t8 = t7 >> 16			= ........abcd....
	t9 = t7 + t8			= ........abcdefgh

where that last addition can be performed by the bmask itself.
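The table transcribed into scalar C as a cross-check (masks and shifts
taken straight from it; here the final addition is an ordinary add
rather than being folded into bmask):

```c
#include <stdint.h>

/* Log-step compaction from the table above: squeeze the low nibble of
   each of 8 bytes into a contiguous 8-nibble (32-bit) value, as needed
   to feed bmask.  On the target these steps would be VIS logical and
   shift operations; this scalar transcription is only a cross-check. */
static uint32_t compact_nibbles(uint64_t i)
{
    uint64_t t = i & 0x0f0f0f0f0f0f0f0fULL;       /* .a.b.c.d.e.f.g.h */
    t = (t + (t >> 4))  & 0x00ff00ff00ff00ffULL;  /* ..ab..cd..ef..gh */
    t = (t + (t >> 8))  & 0x0000ffff0000ffffULL;  /* ....abcd....efgh */
    t =  t + (t >> 16);                           /* ........abcdefgh */
    return (uint32_t)t;
}
```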

Dunno if you can come up with a more efficient sequence.  Indeed,
you may want two totally separate sequences depending on whether
the original input is in fp (vector) or integer registers.  Which
of course means delaying the expansion until reload.


The comment above cmask8<>_vis suggests an implementation of
the named "vcond<><>" patterns.
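For reference, the operation a vcond expansion has to produce, written
out in scalar C for the V4HI case (purely illustrative; a VIS3 version
would materialize the comparison mask with cmask and select with
bshuffle instead of branching):

```c
#include <stdint.h>

/* What a vcond pattern must compute, per element: compare, then select.
   Scalar model for V4HI with a less-than comparison.  Illustrative
   only; names are mine. */
static void vcond_lt_v4hi(int16_t r[4],
                          const int16_t then_v[4], const int16_t else_v[4],
                          const int16_t a[4], const int16_t b[4])
{
    for (int i = 0; i < 4; i++)
        r[i] = (a[i] < b[i]) ? then_v[i] : else_v[i];
}
```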


> (define_insn "fpadd64_vis"
>   [(set (match_operand:DI 0 "register_operand" "=e")
>         (plus:DI (match_operand:DI 1 "register_operand" "e")
>                  (match_operand:DI 2 "register_operand" "e")))]
>   "fpadd64\t%1, %2, %0")

This must be folded into the main "adddi3" pattern, like fpadd32s.
It's not recognizable otherwise.  Similarly fpsub64.  If these
patterns were earlier in the file you'd have noticed them breaking
the build.


> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
> (define_code_attr vis3_addsub_ss_insn
>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
> (define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "<vconstr>")
>                              (match_operand:VASS 2 "register_operand" "<vconstr>")))]
>   "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0")

These should be exposed as "ssadd<mode>3" and "sssub<mode>3".

Unfortunately, the compiler won't do anything with them yet,
but those are the canonical names for signed saturating add/sub,
and if you use those names we'll automatically use them properly
once the vectorizer is extended in the right directions.
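For reference, the per-element semantics the "ssadd<mode>3" name
promises, as a scalar 16-bit sketch (illustrative only):

```c
#include <stdint.h>

/* Signed saturating add on one 16-bit element: the result clamps to
   [INT16_MIN, INT16_MAX] instead of wrapping, which is what fpadds16
   does per partition. */
static int16_t ssadd16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;          /* exact sum in a wider type */
    if (s > INT16_MAX) s = INT16_MAX;    /* clamp on overflow  */
    if (s < INT16_MIN) s = INT16_MIN;    /* clamp on underflow */
    return (int16_t)s;
}
```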


Other missing vectorization patterns:

	vec_init<mode>
	vec_set<mode>
	vec_extract<mode>
	vec_extract_even<mode> / vec_extract_odd<mode>
	vec_pack_trunc_<mode>
	vec_unpacks_{hi,lo}_<mode> / vec_unpacku_{hi,lo}_<mode>

The first three should be provided any time any vector operation
is supported, if at all possible.  Otherwise the compiler will
wind up dropping the data to memory to manipulate it.

The even/odd can be implemented with bshuffle.  We probably ought
to handle this in the middle-end by falling back to vec_perm*, 
but we currently don't.  PPC and SPU could be simplified with this.

The vec_pack_trunc pattern is essentially the same as even/odd,
with the right one selected by endianness.  That said, we still
don't fall back to another pattern.
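Concretely, for V4HI -> V8QI the operation is (scalar model, function
name mine): truncate each element of the two inputs and concatenate.
Viewed as bytes in memory, that is the even-byte selection from the
concatenated inputs on a little-endian layout and the odd-byte
selection on a big-endian one such as SPARC:

```c
#include <stdint.h>

/* vec_pack_trunc for V4HI -> V8QI, scalar model: truncate each 16-bit
   element of the two input vectors to 8 bits and concatenate the
   results.  Illustrative only. */
static void pack_trunc_v4hi(uint8_t out[8],
                            const uint16_t a[4], const uint16_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)a[i];   /* low 8 bits survive */
        out[4 + i] = (uint8_t)b[i];
    }
}
```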

I don't believe the other patterns could be helped by the middle-end,
at least not yet.  I seem to recall we've been talking about some
generic representation of vector comparisons, which could be used to
aid middle-end expansions of vec_unpack via compares, zeros, and

I don't know how much VIS3 provides that could create specialized
versions of any of these.

Happy hacking,

