AltiVec support in GCC

Geoff Keating geoffk@cygnus.com
Sat Feb 19 15:03:00 GMT 2000


Mike Stump <mrs@windriver.com> writes:

> > Date: Fri, 18 Feb 2000 18:04:50 -0800
> > From: Stan Shebs <shebs@apple.com>
> > To: Mike Stump <mrs@windriver.com>
> 
> > Since this is an extension to the dialect of C accepted by GCC, I
> > would hope there would be some effort to define the dialect before
> > putting in dueling patches. :-)
> 
> Well, people might say, this is how this vendor defined it, and we
> want gcc to be compatible with it.  Fairly reasonable, though long
> term, would be nice to do better than this, if we can.

My concern is that this would be very PowerPC-specific.  There are
similar but different language extensions for SPARC VIS and for MMX
(and, for all I know, still more for the non-Intel MMX extensions).

So we would end up having a different frontend for ppc than for all
others.  Code would not be portable between ppc and other
architectures.  There would be a ppc-specific set of frontend bugs.
We would probably need to have a ppc-specific C++ frontend too.


In the end, I suspect that the right thing to do is to implement all
this properly: to do a proper loop-vectorisation pass.  Such a thing
need not be very complicated, because to be useful it only has to
recognise _one_ idiom for the affected operation; then you get
portable code that only runs fast on one architecture, but if this is
a problem you can always improve the loop vectoriser.

For instance, there is nothing especially difficult about recognising
the following code:

unsigned char a[1024];
unsigned char b[1024];
unsigned char c[2048];
int i;

for (i = 0; i < sizeof(a); i++)
{
  int t1, t2;

  t1 = a[i] + b[i];
  if (t1 > 255) t1 = 255;

  t2 = a[i] - b[i];
  if (t2 < 0) t2 = 0;
  
  c[2*i] = t1;
  c[2*i+1] = t2;
}

as the following vector instructions in a loop:

	lvx	v0,r3,r4
	lvx	v1,r3,r5
	vaddubs	v2,v0,v1
	vsububs	v3,v0,v1
	vmrghb  v4,v2,v3
	vmrglb	v5,v2,v3
	stvx	v4,r3,r6
	stvx	v5,r3,r7

and even if it turned out to be difficult to get best performance with
a long loop, it would be even easier to recognise loops the size of
the vector registers:

unsigned char a[16];
unsigned char b[16];
unsigned char c[16];
int i;

for (i = 0; i < 16; i++)
{
  int t;
  t = a[i] + b[i];
  if (t > 255) t = 255;
  c[i] = t;
}

as the equivalent 'vaddubs' instruction, with a, b, and c held in
vector registers.  You would only need to recognise one such loop for
each instruction; the user would generate such instructions by the use
of a suitable header file (or C++ class).
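
To make the header-file idea concrete, here is a rough sketch of what
such a header might look like in plain C.  Everything named here is
hypothetical and for illustration only (the u8x16 type, the u8x16_*
functions, add_sub_interleave, self_check); none of it is an existing
GCC or AltiVec interface.  Each operation is written as exactly one
loop idiom, which is the shape a vectoriser would be taught to map to
vaddubs, vsububs, vmrghb and vmrglb.  Compiled without such a pass it
is just ordinary portable C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte vector type; the name is illustrative only. */
typedef struct { unsigned char e[16]; } u8x16;

/* Saturating unsigned add: the one idiom intended to become vaddubs. */
static inline u8x16 u8x16_adds(u8x16 a, u8x16 b)
{
  u8x16 c;
  int i;
  for (i = 0; i < 16; i++)
    {
      int t = a.e[i] + b.e[i];
      if (t > 255) t = 255;
      c.e[i] = (unsigned char) t;
    }
  return c;
}

/* Saturating unsigned subtract: the idiom intended to become vsububs. */
static inline u8x16 u8x16_subs(u8x16 a, u8x16 b)
{
  u8x16 c;
  int i;
  for (i = 0; i < 16; i++)
    {
      int t = a.e[i] - b.e[i];
      if (t < 0) t = 0;
      c.e[i] = (unsigned char) t;
    }
  return c;
}

/* Interleave elements 0..7 of a and b (vmrghb). */
static inline u8x16 u8x16_merge_high(u8x16 a, u8x16 b)
{
  u8x16 c;
  int i;
  for (i = 0; i < 8; i++)
    {
      c.e[2*i] = a.e[i];
      c.e[2*i+1] = b.e[i];
    }
  return c;
}

/* Interleave elements 8..15 of a and b (vmrglb). */
static inline u8x16 u8x16_merge_low(u8x16 a, u8x16 b)
{
  u8x16 c;
  int i;
  for (i = 0; i < 8; i++)
    {
      c.e[2*i] = a.e[8+i];
      c.e[2*i+1] = b.e[8+i];
    }
  return c;
}

/* The long loop, one 16-byte block at a time.  Assumes n is a
   multiple of 16 and that c has room for 2*n bytes.  */
static void add_sub_interleave(const unsigned char *a, const unsigned char *b,
                               unsigned char *c, size_t n)
{
  size_t k;
  for (k = 0; k < n; k += 16)
    {
      u8x16 va, vb, sum, diff, hi, lo;
      memcpy(va.e, a + k, 16);
      memcpy(vb.e, b + k, 16);
      sum = u8x16_adds(va, vb);
      diff = u8x16_subs(va, vb);
      hi = u8x16_merge_high(sum, diff);
      lo = u8x16_merge_low(sum, diff);
      memcpy(c + 2*k, hi.e, 16);
      memcpy(c + 2*k + 16, lo.e, 16);
    }
}

/* Compare the blockwise version against the plain scalar loop. */
static void self_check(void)
{
  static unsigned char a[1024], b[1024], c[2048], ref[2048];
  size_t i;
  for (i = 0; i < sizeof(a); i++)
    {
      int t1, t2;
      a[i] = (unsigned char) (i * 7);        /* arbitrary test data */
      b[i] = (unsigned char) (i * 13 + 5);
      t1 = a[i] + b[i];
      if (t1 > 255) t1 = 255;
      t2 = a[i] - b[i];
      if (t2 < 0) t2 = 0;
      ref[2*i] = (unsigned char) t1;
      ref[2*i+1] = (unsigned char) t2;
    }
  add_sub_interleave(a, b, c, sizeof(a));
  assert(memcmp(c, ref, sizeof(c)) == 0);
}
```

Calling self_check() from a test program confirms that the blockwise
form computes the same bytes as the scalar loop.  The element
numbering in the two merge functions follows the big-endian layout
used by vmrghb and vmrglb, where elements 0..7 are the "high" half.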

-- 
- Geoffrey Keating <geoffk@cygnus.com>

