This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: AltiVec support in GCC
- To: Mike Stump <mrs at windriver dot com>
- Subject: Re: AltiVec support in GCC
- From: Geoff Keating <geoffk at cygnus dot com>
- Date: 19 Feb 2000 15:02:32 -0800
- CC: gcc at gcc dot gnu dot org
- References: <200002190247.SAA18031@kankakee.wrs.com>
Mike Stump <mrs@windriver.com> writes:
> > Date: Fri, 18 Feb 2000 18:04:50 -0800
> > From: Stan Shebs <shebs@apple.com>
> > To: Mike Stump <mrs@windriver.com>
>
> > Since this is an extension to the dialect of C accepted by GCC, I
> > would hope there would be some effort to define the dialect before
> > putting in dueling patches. :-)
>
> Well, people might say, this is how this vendor defined it, and we
> want gcc to be compatible with it. That's fairly reasonable, though in
> the long term it would be nice to do better than this, if we can.
My concern about this is that it would be very PowerPC-specific.
There are similar but different language extensions for SPARC VIS and
for MMX (and, for all I know, still more for all the non-Intel MMX
extensions). So we would end up with a different frontend for ppc than
for every other target. Code would not be portable between ppc and
other architectures. There would be a ppc-specific set of frontend
bugs. We would probably need a ppc-specific C++ frontend too.
In the end, I suspect that the right thing to do is to implement all
this properly, with a real loop-vectorisation pass. Such a pass need
not be very complicated, because to be useful it only has to recognise
_one_ idiom for each affected operation. You then get portable code
that only runs fast on one architecture, but if that is a problem you
can always improve the loop vectoriser.
For instance, there is nothing especially difficult about recognising
the following code:
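unsigned char a[1024];
unsigned char b[1024];
unsigned char c[2048];   /* two output bytes per input element */
int i;

/* Iterate over the inputs, not over c, which is twice as long. */
for (i = 0; i < sizeof(a); i++)
  {
    int t1, t2;
    t1 = a[i] + b[i];          /* unsigned add, saturating at 255 */
    if (t1 > 255) t1 = 255;
    t2 = a[i] - b[i];          /* unsigned subtract, saturating at 0 */
    if (t2 < 0) t2 = 0;
    c[2*i] = t1;               /* interleave the sum ... */
    c[2*i+1] = t2;             /* ... with the difference */
  }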
as the following vector instructions in a loop:
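lvx     v0,r3,r4      # load 16 bytes of a
lvx     v1,r3,r5      # load 16 bytes of b
vaddubs v2,v0,v1      # a + b, unsigned saturating
vsububs v3,v0,v1      # a - b, unsigned saturating
vmrghb  v4,v2,v3      # interleave bytes 0-7 of sum and difference
vmrglb  v5,v2,v3      # interleave bytes 8-15 of sum and difference
stvx    v4,r3,r6      # store first 16 bytes of c
stvx    v5,r3,r7      # store next 16 bytes of c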
and even if it turned out to be difficult to get the best performance
from a long loop, it would be easier still to recognise loops the size
of a vector register:
unsigned char a[16];
unsigned char b[16];
unsigned char c[16];
int i;

for (i = 0; i < 16; i++)
  {
    int t;
    t = a[i] + b[i];           /* unsigned add ... */
    if (t > 255) t = 255;      /* ... saturating at 255 */
    c[i] = t;
  }
as the equivalent 'vaddubs' instruction, with a, b, and c held in
vector registers. You would only need to recognise one such loop for
each instruction; users would then generate those instructions by
means of a suitable header file (or C++ class).
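As a rough sketch of what such a header might contain: the type u8x16 and the function names below are hypothetical (they are not the AltiVec PIM intrinsics), and the merge helpers assume AltiVec's big-endian element numbering. Each operation is written as the plain 16-iteration loop a vectoriser would need to recognise. The same helpers then express the long loop above 16 input bytes at a time, so the result can be checked against the scalar version on any machine.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical portable type: 16 unsigned bytes, one vector register. */
typedef struct { unsigned char b[16]; } u8x16;

/* Unsigned saturating add; the idiom corresponding to vaddubs. */
static inline u8x16 sat_add_u8x16(u8x16 x, u8x16 y)
{
  u8x16 r;
  int i;
  for (i = 0; i < 16; i++)
    {
      int t = x.b[i] + y.b[i];
      if (t > 255) t = 255;
      r.b[i] = (unsigned char) t;
    }
  return r;
}

/* Unsigned saturating subtract; the idiom corresponding to vsububs. */
static inline u8x16 sat_sub_u8x16(u8x16 x, u8x16 y)
{
  u8x16 r;
  int i;
  for (i = 0; i < 16; i++)
    {
      int t = x.b[i] - y.b[i];
      if (t < 0) t = 0;
      r.b[i] = (unsigned char) t;
    }
  return r;
}

/* Interleave elements 0-7 of x and y (vmrghb). */
static inline u8x16 merge_high_u8x16(u8x16 x, u8x16 y)
{
  u8x16 r;
  int i;
  for (i = 0; i < 8; i++)
    {
      r.b[2*i] = x.b[i];
      r.b[2*i+1] = y.b[i];
    }
  return r;
}

/* Interleave elements 8-15 of x and y (vmrglb). */
static inline u8x16 merge_low_u8x16(u8x16 x, u8x16 y)
{
  u8x16 r;
  int i;
  for (i = 0; i < 8; i++)
    {
      r.b[2*i] = x.b[8+i];
      r.b[2*i+1] = y.b[8+i];
    }
  return r;
}

/* The long loop from the message, 16 input bytes per iteration.
   n must be a multiple of 16; c must have room for 2*n bytes. */
static void add_sub_interleave(const unsigned char *a, const unsigned char *b,
                               unsigned char *c, size_t n)
{
  size_t i;
  assert(n % 16 == 0);
  for (i = 0; i < n; i += 16)
    {
      u8x16 va, vb, sum, diff, hi, lo;
      memcpy(va.b, a + i, 16);            /* lvx */
      memcpy(vb.b, b + i, 16);            /* lvx */
      sum = sat_add_u8x16(va, vb);        /* vaddubs */
      diff = sat_sub_u8x16(va, vb);       /* vsububs */
      hi = merge_high_u8x16(sum, diff);   /* vmrghb */
      lo = merge_low_u8x16(sum, diff);    /* vmrglb */
      memcpy(c + 2*i, hi.b, 16);          /* stvx */
      memcpy(c + 2*i + 16, lo.b, 16);     /* stvx */
    }
}
```

Without a vectoriser this is simply ordinary, portable C. The hope is that a compiler which did recognise each helper's loop could turn it into the single instruction named in its comment.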
--
- Geoffrey Keating <geoffk@cygnus.com>