This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [patch] (4.1 stage 2 projects): vectorize reduction, part 3/n

From: Richard Henderson <rth at redhat dot com>
To: Dorit Naishlos <DORIT at il dot ibm dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Mon, 20 Jun 2005 11:14:55 -0700
Subject: Re: [patch] (4.1 stage 2 projects): vectorize reduction, part 3/n
References: <20050619172501.GA20194@redhat.com> <OFA7F7C79E.BEFDF6B2-ONC2257026.002CCCDD-C2257026.005118FC@il.ibm.com>

On Mon, Jun 20, 2005 at 05:45:48PM +0300, Dorit Naishlos wrote:
> > In case Aldy is paying attention, a cast to TImode is a better
> > representation.  See sse2_lshrti3.
> >
> 
> that implies also representing the other altivec vector shifts in the same
> way (vsldoi, vsr, vsro, vsl, vslo). I think this altivec-vector-shift
> cleanup can go in a separate patch?

Absolutely.  I wasn't even suggesting that you do the work.  Though if
you want to you'd be welcome.

> so on i686 we get the following failures with this patch:
> FAIL: gcc.dg/vect/vect-reduc-1.c scan-tree-dump-times vectorized 3 loops 1
> FAIL: gcc.dg/vect/vect-reduc-1short.c scan-tree-dump-times vectorized 3
> loops 1
> FAIL: gcc.dg/vect/vect-reduc-2.c scan-tree-dump-times vectorized 3 loops 1
> FAIL: gcc.dg/vect/vect-reduc-2char.c scan-tree-dump-times vectorized 3
> loops 1
> 
> (due to lack of vectorization of maximum/minimum for some data types, as
> explained above). Shall I xfail these tests for i686 and x86?

Please.

I'm unsure of the best way to approach this problem.  We don't have
instructions for these operations.  I could unpack the data and do the
min/max in the next wider vector, but then we'd have unpack and pack
sequences within each loop.

Ideally we'd transform this to 

	v1 = unpackl(initial)
	v2 = unpackh(initial)
	for (i = 0; i < n; i += 16)
	  {
	    v3 = data[i];
	    v4 = unpackl(v3)
	    v5 = unpackh(v3)
	    v1 = umax(v1, v4)
	    v2 = umax(v2, v5)
	  }
	v8 = pack(v1,v2)

rather than having all of the statements inside the loop.
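
For illustration only, here is a scalar C sketch of that hoisted form.  It
assumes 16 unsigned byte elements per vector, widened to 16-bit lanes, and
that the running maximum is carried across iterations in the two widened
accumulators.  The helpers unpackl, unpackh, umax and pack are hypothetical
scalar stand-ins for the vector operations, reduce_umax_u8 is a made-up
name, and the final lane-by-lane reduction is added only so the function
returns a single value.  None of this is vectorizer code.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16            /* assumed: 16 byte elements per vector */
#define HALF  (LANES / 2)

/* Hypothetical scalar stand-ins for the vector operations in the sketch. */
static void unpackl(const uint8_t in[LANES], uint16_t out[HALF])
{
  for (int k = 0; k < HALF; k++)
    out[k] = in[k];                 /* widen the low half */
}

static void unpackh(const uint8_t in[LANES], uint16_t out[HALF])
{
  for (int k = 0; k < HALF; k++)
    out[k] = in[HALF + k];          /* widen the high half */
}

static void umax(uint16_t acc[HALF], const uint16_t v[HALF])
{
  for (int k = 0; k < HALF; k++)
    if (v[k] > acc[k])
      acc[k] = v[k];                /* lane-wise unsigned max, in place */
}

static void pack(const uint16_t lo[HALF], const uint16_t hi[HALF],
                 uint8_t out[LANES])
{
  for (int k = 0; k < HALF; k++)
    {
      out[k] = (uint8_t) lo[k];     /* values fit: they came from bytes */
      out[HALF + k] = (uint8_t) hi[k];
    }
}

/* Max of DATA[0..N) and INIT.  N is assumed to be a multiple of LANES;
   any tail elements are ignored in this sketch.  */
uint8_t reduce_umax_u8(const uint8_t *data, size_t n, uint8_t init)
{
  uint8_t initial[LANES], packed[LANES];
  uint16_t v1[HALF], v2[HALF], v4[HALF], v5[HALF];

  for (int k = 0; k < LANES; k++)
    initial[k] = init;

  /* Accumulators are unpacked once, before the loop.  */
  unpackl(initial, v1);
  unpackh(initial, v2);

  for (size_t i = 0; i + LANES <= n; i += LANES)
    {
      unpackl(data + i, v4);        /* only the loaded data is unpacked here */
      unpackh(data + i, v5);
      umax(v1, v4);
      umax(v2, v5);
    }

  /* The pack happens once, after the loop.  */
  pack(v1, v2, packed);

  /* Final reduction across lanes.  */
  uint8_t result = packed[0];
  for (int k = 1; k < LANES; k++)
    if (packed[k] > result)
      result = packed[k];
  return result;
}
```

The point of the arrangement is that the accumulators stay in the wider
type for the whole loop, so the only per-iteration unpack is that of the
freshly loaded data.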

Patch is ok.


r~

Follow-Ups:
- Re: [patch] (4.1 stage 2 projects): vectorize reduction, part 3/n
  - From: Paolo Bonzini

References:
- Re: [patch] (4.1 stage 2 projects): vectorize reduction, part 3/n
  - From: Richard Henderson
- Re: [patch] (4.1 stage 2 projects): vectorize reduction, part 3/n
  - From: Dorit Naishlos
