This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, rs6000] Fix PR81833 (incorrect code gen for vec_msum)


Hi Bill,

On Mon, Aug 28, 2017 at 03:56:21PM -0500, Bill Schmidt wrote:
> PR81833 identifies a problem with the little-endian vector multiply-sum
> instructions.  The original implementation is quite poor (and I am allowed
> to say that, since it was mine).  This patch fixes the code properly.
> 
> The revised code still uses UNSPECs for these ops, which is not strictly
> necessary, although descriptive rtl for them would be pretty complex.  I've
> put in a FIXME to make note of that for a future cleanup.

There is ss_plus, but that is the saturated sum of two things, not of
five resp. three as we need for vsumsws and vsum2sws.

If you convert to double-width, then add, then clamp, the result will
be correct; but then, very often some of that will be folded away (say,
when combine works on these patterns), and then you need a lot of
different patterns to catch all of this.

So we really need an ss_plus that has more than two arguments.  We could
have an unspec for that as well (instead of the unspecs that work on
vectors, unspecs that work on the elements), that may work.

Or perhaps it will work best as-is: the only way we get these unspecs
is via intrinsics, maybe we should just trust the user.

> 	PR target/81833
> 	* config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
> 	define_insn to a define_expand.
> 	(altivec_vsum2sws_direct): New define_insn.
> 	(altivec_vsumsws): Convert from a define_insn to a define_expand.
> 
> [gcc/testsuite]
> 
> 2017-08-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	PR target/81833
> 	* gcc.target/powerpc/pr81833.c: New file.


> +; FIXME: This can probably be expressed without an UNSPEC.
> +(define_insn "altivec_vsum2sws_direct"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> +	              (match_operand:V4SI 2 "register_operand" "v")]
> +		     UNSPEC_VSUM2SWS))]
>    "TARGET_ALTIVEC"
> +  "vsum2sws %0,%1,%2"
> +  [(set_attr "type" "veccomplex")
> +   (set_attr "length" "4")])

This no longer writes to VSCR, please fix that.  "length" 4 is the
default, just leave it.

vsumsws has both of these correct already.

> --- gcc/testsuite/gcc.target/powerpc/pr81833.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr81833.c	(working copy)
> @@ -0,0 +1,54 @@
> +/* PR81833: This used to fail due to improper implementation of vec_msum.  */
> +
> +/* { dg-do run {target { lp64 } } } */

Does this need lp64?  I don't at first glance see anything that would
need it.  Oh, vec_vsx_ld perhaps?

Okay for trunk and branches with the above fixes.  Thanks!


Segher


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]