This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] (4.1 stage 2 projects): vectorize reduction, part 3/n


On Mon, Jun 20, 2005 at 05:45:48PM +0300, Dorit Naishlos wrote:
> > In case Aldy is paying attention, a cast to TImode is a better
> > representation.  See sse2_lshrti3.
> >
> 
> that implies also representing the other altivec vector shifts in the same
> way (vsldoi, vsr, vsro, vsl, vslo). I think this altivec-vector-shift
> cleanup can go in a separate patch?

Absolutely.  I wasn't even suggesting that you do the work.  Though if
you want to you'd be welcome.

> so on i686 we get the following failures with this patch:
> FAIL: gcc.dg/vect/vect-reduc-1.c scan-tree-dump-times vectorized 3 loops 1
> FAIL: gcc.dg/vect/vect-reduc-1short.c scan-tree-dump-times vectorized 3
> loops 1
> FAIL: gcc.dg/vect/vect-reduc-2.c scan-tree-dump-times vectorized 3 loops 1
> FAIL: gcc.dg/vect/vect-reduc-2char.c scan-tree-dump-times vectorized 3
> loops 1
> 
> (due to lack of vectorization of maximum/minimum for some data types, as
> explained above). Shall I xfail these tests for i686 and x86?

Please.

I'm unsure of the best way to approach this problem.  We don't have
instructions for these operations.  I could unpack the data and do the
min/max in the next wider vector, but then we'd have unpack and pack
sequences within each loop.

Ideally we'd transform this to 

	v1 = unpackl(initial)
	v2 = unpackh(initial)
	for (i = 0; i < n; i += 16)
	  {
	    v3 = data[i]
	    v4 = unpackl(v3)
	    v5 = unpackh(v3)
	    v1 = umax(v1, v4)
	    v2 = umax(v2, v5)
	  }
	v8 = pack(v1, v2)

rather than having all of the statements inside the loop.
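
For illustration only, here is a rough portable C model of that shape, not
what the vectorizer would emit.  It assumes 16 unsigned 8-bit lanes per
vector, that n is a multiple of 16, and that the initial value is splatted
across all lanes.  The names unpack_half, umax_acc and max_reduce_u8 are
made up for this sketch.  The accumulators stay in the wider type for the
whole loop and are packed once after it.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16            /* 8-bit elements per (modelled) vector */
#define HALF  (LANES / 2)   /* 16-bit elements per widened vector   */

/* Widen the low (high == 0) or high half of a 16 x u8 vector to 8 x u16. */
static void unpack_half(const uint8_t *v, int high, uint16_t out[HALF])
{
    for (int k = 0; k < HALF; k++)
        out[k] = v[k + (high ? HALF : 0)];
}

/* Lane-wise unsigned max, accumulated into acc. */
static void umax_acc(uint16_t acc[HALF], const uint16_t v[HALF])
{
    for (int k = 0; k < HALF; k++)
        if (v[k] > acc[k])
            acc[k] = v[k];
}

/* Max of data[0..n) and initial; n is assumed to be a multiple of LANES. */
uint8_t max_reduce_u8(const uint8_t *data, size_t n, uint8_t initial)
{
    uint8_t init_vec[LANES], packed[LANES];
    uint16_t lo[HALF], hi[HALF], t_lo[HALF], t_hi[HALF];

    /* Unpack the initial value once, before the loop. */
    for (int k = 0; k < LANES; k++)
        init_vec[k] = initial;
    unpack_half(init_vec, 0, lo);
    unpack_half(init_vec, 1, hi);

    /* The loop only unpacks the loaded data; the accumulators stay wide. */
    for (size_t i = 0; i + LANES <= n; i += LANES) {
        unpack_half(data + i, 0, t_lo);
        unpack_half(data + i, 1, t_hi);
        umax_acc(lo, t_lo);
        umax_acc(hi, t_hi);
    }

    /* Pack once, after the loop.  Values fit in 8 bits by construction. */
    for (int k = 0; k < HALF; k++) {
        packed[k] = (uint8_t)lo[k];
        packed[k + HALF] = (uint8_t)hi[k];
    }

    /* Final horizontal reduction across the packed lanes. */
    uint8_t result = packed[0];
    for (int k = 1; k < LANES; k++)
        if (packed[k] > result)
            result = packed[k];
    return result;
}
```

The unpack of the loaded data still sits inside the loop; only the pack and
the unpack of the accumulator are hoisted, which is the saving described
above.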

Patch is ok.


r~

