This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



__attribute__((aligned(16))) on x86



Hello.

I posted the following query to gcc-help and filed a bug report, but
have had no responses so far.  If you could CC me directly on any
replies I'd appreciate it (I do not subscribe to this list).

The following code crashes:

---cut here---

class fvector
{
public:
    float x;
    float y;
    float z;
    float w;

    fvector(float _x, float _y, float _z, float _w)
            : x(_x), y(_y), z(_z), w(_w)
        {}
    fvector()
        {}

    fvector operator+(const fvector &v1)
        {
            fvector ret __attribute__((aligned(16)));

            asm volatile("movl   %1,      %%esi  ;" 
                         "movl   %2,      %%edi  ;" 
                         "movaps (%%esi), %%xmm0 ;"
                         "addps  (%%edi), %%xmm0 ;"
                         "movaps %%xmm0,  %0     ;" 
                         : "=m" (ret) : "m" (this), "m" (&v1));

            return ret;
        }

} __attribute__((aligned(16)));

int main(int argc, char *argv[])
{
    fvector v1(3.0f, 4.0f, 5.0f, 1.0f);
    fvector v2(7.0f, 1.0f, 9.0f, 1.0f);
       
    fvector ans = v1 + v2;

    return 0;
}
---cut here---

It crashes when compiled with:

g++ -o fvector fvector.cpp -g

The problem is that none of v1, v2 or ret is actually placed on a
16-byte boundary, and movaps requires a 16-byte-aligned memory operand,
so the first movaps instruction faults.
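
One way to confirm the misalignment without a debugger is to test the
low bits of each object's address.  The following is only a minimal
sketch: is_aligned is a hypothetical helper, not part of the test case
above, and it assumes a compiler that provides <cstdint>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the original test case).
// Returns true when the address p is a multiple of `alignment`.
// `alignment` is assumed to be a non-zero power of two.
inline bool is_aligned(const void *p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // For a power-of-two alignment, the low bits of an aligned
    // address are all zero.
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling is_aligned(&v1, 16) and is_aligned(&v2, 16) in main, and
is_aligned(&ret, 16) inside operator+, would show which of the objects
end up off a 16-byte boundary.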

I'm willing to dig into gcc source to try and come up with a patch, but
don't even have a clue where to start looking.  I tried setting
breakpoints on such functions as ix86_local_alignment, emit_local_var,
do_pending_stack_adjust, and assign_stack_temp_for_type, but I can't
figure out what's going on.  As far as I can tell, the alignment is
correct in the tree (I see a "user align = 16" whenever I dump out the
type in ix86_local_alignment), but it seems to be completely ignored
when it comes time to generate instructions.

A workaround is to declare v1, v2 and ret as static, but that's an ugly
thing to impose on users of a "fast math" library (which is what I'm
trying to write here).
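
Another possible workaround, equally unattractive for library users,
is to over-allocate a raw buffer and round the pointer up by hand.
This is a different technique from the aligned attribute, shown only
as a sketch: align_up is a hypothetical helper, and the caller is
assumed to reserve alignment - 1 extra bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round the address p up to the next multiple of
// `alignment`, which is assumed to be a non-zero power of two.
// The caller must have reserved at least alignment - 1 spare bytes.
inline float *align_up(void *p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    // Add the mask, then clear the low bits to reach the boundary.
    addr = (addr + mask) & ~mask;
    return reinterpret_cast<float *>(addr);
}
```

For example, with a local buffer declared as
unsigned char raw[4 * sizeof(float) + 15], the pointer returned by
align_up(raw, 16) leaves room for four floats on a 16-byte boundary.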

So any help in pointing me to the right place in gcc source would be
appreciated, or if someone out there already familiar with the source
could come up with a patch, even better :)

Here is my g++ info:

Reading specs from
/usr/local/gcc-3.0/lib/gcc-lib/i686-pc-linux-gnu/3.0/specs
Configured with: ../configure --prefix=/usr/local/gcc-3.0
--enable-shared --enable-threads=posix --disable-checking
--enable-long-long --enable-cstdio=stdio --enable-clocale=generic
--enable-languages=c,c++
Thread model: posix
gcc version 3.0

Note that the same code also fails with this version:

Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.95.3/specs
gcc version 2.95.3 19991030 (prerelease)


Thanks


-- 
Ryan T. Sammartino
http://members.home.net/ryants/
Give me a fish and I will eat today.

Teach me to fish and I will eat forever.

