This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Another AltiVec problem
- From: Daniel Egger <degger at fhm dot edu>
- To: Aldy Hernandez <aldyh at redhat dot com>
- Cc: GCC Developer Mailinglist <gcc at gcc dot gnu dot org>
- Date: 14 Mar 2002 01:31:46 +0100
- Subject: Another AltiVec problem
Hija,
Consider:
#include <altivec.h>
#include <stdio.h>

int main ()
{
  static float storage[4] __attribute__ ((aligned(16)));
  vector float one = (vector float) {1.0, 1.0, 1.0, 1.0};
  vector float two = (vector float) {2.0, 2.0, 2.0, 2.0};
  vector float three = (vector float) {3.0, 3.0, 3.0, 3.0};
  vector float result;

  /* expected: 3.0 * 2.0 + 1.0 = 7.0 in every element */
  result = vec_madd (three, two, one);
  vec_st (result, 0, storage);
  printf ("%f %f %f %f\n", storage[0], storage[1], storage[2],
          storage[3]);
  return 0;
}
Given that d = vec_madd (a, b, c) is specified on page 102 of the
AltiVec PIM to yield
d = {a0 * b0 + c0, a1 * b1 + c1, a2 * b2 + c2, a3 * b3 + c3}
the expected result here is {7.0, 7.0, 7.0, 7.0} (3.0 * 2.0 + 1.0),
so the actual result of {5.0, 5.0, 5.0, 5.0} is clearly wrong.
Looking at the assembly:
.file "maddfp.c"
.lcomm storage.0,16,16
.section .rodata
.align 4
.LC0:
.long 1065353216 <= 1.0
.long 1065353216
.long 1065353216
.long 1065353216
.align 4
.LC1:
.long 1073741824 <= 2.0
.long 1073741824
.long 1073741824
.long 1073741824
.align 4
.LC2:
.long 1077936128 <= 3.0
.long 1077936128
.long 1077936128
.long 1077936128
.align 2
.LC3:
.string "%f %f %f %f\n"
.section ".text"
.align 2
.globl main
.type main,@function
main:
lis 9,.LC0@ha
lis 11,.LC1@ha
la 9,.LC0@l(9) <= baseaddress of the 1.0s
stwu 1,-16(1)
lvx 13,0,9
la 11,.LC1@l(11) <= baseaddress of the 2.0s
lis 9,.LC2@ha
mflr 0
la 9,.LC2@l(9) <= baseaddress of the 3.0s
lvx 1,0,11
lvx 0,0,9
lis 10,storage.0@ha
stw 0,20(1)
la 11,storage.0@l(10)
li 9,0
vmaddfp 0,0,1,13 <= result <- madd ({3.0}, {2.0}, {1.0})
stvx 0,9,11
lis 3,.LC3@ha
lfs 1,storage.0@l(10)
la 3,.LC3@l(3)
lfs 4,12(11)
lfs 2,4(11)
lfs 3,8(11)
creqv 6,6,6
bl printf
mr 3,0
lwz 0,20(1)
addi 1,1,16
mtlr 0
blr
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 3.2 20020313 (experimental)"
Page 6-72 of the AltiVec PEM defines the mnemonic as:
vmaddfp vD, vA, vC, vB
where vD receives the result of vA * vC + vB
(don't ask me what those guys drank when choosing the register names
and encoding). In vmaddfp 0,0,1,13 the 3.0s are in v0 (vA), the 2.0s
in v1 (vC) and the 1.0s in v13 (vB), so the instruction as written
computes 3.0 * 2.0 + 1.0 and the assembly also seems to be correct.
I won't bore you with more details, but the binary encoding of the
instruction is also correct according to the same page.
So the question is: where exactly is the error? And, following from
that: which part of the toolchain needs to be fixed, or should it be
fixed in the sources?
--
Servus,
Daniel