This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
It indicated that sibling call optimization in main should
be disabled for targets that need to raise the stack alignment,
otherwise you get the stack alignment of a lower one than
While that may be true, I think the problem is broader.
I took out the main1() function and put it into a separate
file, and compiled just that. So now there is no carnal
knowledge of main or its stack alignment. The generated
code for this stand-alone main1() makes no attempt to
align the stack or the stack variables it is going to be
passing to the movdqa instruction. Unless that's what you
mean by:
that is required. You have to look to see what changed
between 3.4.0 and 4.0.0 that caused this since it is a
regression. I think the issue is that we are detecting them
at the tree level but not rejecting them when expanding. So you
have to look at the expand functions for that.
You're using internals verbiage that's beyond me :) I'm a
simple porter, I have very little understanding of the actual
internals of GCC.
The reason nobody noticed this before is that most x86 OSes
nowadays align the stack to 16 bytes going into main,
which is what my comment about fixing your OS was about; it was
more of a joke than anything else.
Ok, I apologise, Andrew. I took it as a SCO-bash. My bad.
However, I don't think the stack being aligned on a 16-byte
boundary into main will help, unless GCC is assuming (and I
don't see how it possibly could) that every function would
likewise be aligned. The fact that a stand-alone version of
main1() was not correctly aligned leads me to believe that
the real error is that gcc is not making an attempt to
align the stack variables for use by the alignment-sensitive
vector insns.
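
One way to check this claim on a given compiler is to print where a function's locals actually land. The following is only a minimal sketch: the function names are hypothetical, and `__attribute__((aligned(16)))` is a GCC extension. Whether the 4.0.3 prerelease discussed here honors that attribute for stack variables is not something this thread establishes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns nonzero if p sits on a 16-byte boundary, which is what
   movdqa requires of its memory operand. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical stand-in for main1(): reports where its locals landed
   and returns nonzero if the explicitly aligned array is on a
   16-byte boundary. */
static int report_local_alignment(void)
{
    int plain[8];                                /* only int alignment is guaranteed */
    int forced[8] __attribute__((aligned(16)));  /* GCC extension: request 16 bytes */

    printf("plain  %% 16 = %u\n", (unsigned)((uintptr_t)plain % 16));
    printf("forced %% 16 = %u\n", (unsigned)((uintptr_t)forced % 16));
    return is_aligned16(forced);
}
```

Calling report_local_alignment() from a function in a separate file mirrors the stand-alone main1() experiment. If the "forced" residue is nonzero, the compiler is not aligning the variable regardless of how the caller's stack was set up.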
Also, when you say "stack going into main is 16 byte aligned",
what specifically do you mean? That it's 16-byte aligned before
the call to main() itself? That at the first insn in main, most
likely a push %ebp, it's 16-byte aligned (i.e., does the call
to main from crt1.o have to take the push of the return address
into account)?
Kean
PS, here is the generated assembly for main1() as a stand-alone
function, nothing else defined in the .c file:
        .file "foo.c"
        .version "01.01"
        .section .rodata
        .align 32
        .type C.0.1458, @object
        .size C.0.1458, 32
C.0.1458:
        .long 0
        .long 3
        .long 6
        .long 9
        .long 12
        .long 15
        .long 18
        .long 21
        .text
        .align 16
        .globl main1
        .type main1, @function
main1:
        pushl %ebp
        movl $8, %ecx
        movl %esp, %ebp
        pushl %edi
        cld
        pushl %esi
        leal -40(%ebp), %edi
        subl $64, %esp
        movl $C.0.1458, %esi
        rep
        movsl
        xorl %edx, %edx
        leal -40(%ebp), %esi
        leal -72(%ebp), %ecx
        .align 16
.L2:
        leal 0(,%edx,4), %eax
        addl $4, %edx
        cmpl $8, %edx
        movdqa (%esi,%eax), %xmm0
        movdqa %xmm0, (%ecx,%eax)
        jne .L2
        movb $1, %dl
        .align 16
.L4:
        movl -4(%ecx,%edx,4), %eax
        cmpl -4(%esi,%edx,4), %eax
        jne .L14
        incl %edx
        cmpl $9, %edx
        jne .L4
        addl $64, %esp
        xorl %eax, %eax
        popl %esi
        popl %edi
        popl %ebp
        ret
.L14:
        call abort
        .size main1, .-main1
        .ident "GCC: (GNU) 4.0.3 20051013 (prerelease)"
# cat foo.c
#define N 8
int main1 ()
{
  int b[N] = {0,3,6,9,12,15,18,21};
  int a[N];
  int i;
  for (i = 0; i < N; i++)
    {
      a[i] = b[i];
    }
  /* check results: */
  for (i = 0; i < N; i++)
    {
      if (a[i] != b[i])
        abort ();
    }
  return 0;
}
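
The question of which point "16-byte aligned" refers to can be worked through from the main1 listing above. The movdqa operands are -40(%ebp) (b) and -72(%ebp) (a), and the prologue never adjusts %esp to a boundary, so their alignment depends entirely on what the caller left in %esp. The sketch below tabulates this. It assumes the usual 32-bit x86 behaviour, where the call and the pushl %ebp each push 4 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Residue mod 16 of the address (%ebp + offset) inside main1, given
   %esp mod 16 immediately before the call instruction.
   Assumes the prologue shown in the listing: call pushes a 4-byte
   return address, pushl %ebp pushes 4 more, then movl %esp, %ebp. */
static unsigned slot_residue(unsigned esp_before_call_mod16, int offset_from_ebp)
{
    int ebp;

    assert(esp_before_call_mod16 < 16);
    ebp = (int)esp_before_call_mod16 - 4 - 4;
    /* C's % can return a negative value, so fold it back into 0..15. */
    return (unsigned)(((ebp + offset_from_ebp) % 16 + 16) % 16);
}

/* Prints the residues of b (-40(%ebp)) and a (-72(%ebp)) for each
   4-byte-aligned starting %esp. */
static void print_residue_table(void)
{
    unsigned start;

    for (start = 0; start < 16; start += 4)
        printf("esp %% 16 = %2u before call: b -> %2u, a -> %2u\n",
               start, slot_residue(start, -40), slot_residue(start, -72));
}
```

Under those assumptions, both arrays land on a 16-byte boundary only when %esp is 16-byte aligned immediately before the call instruction, that is, before the return address is pushed. Any other starting residue leaves both movdqa operands misaligned by the same amount.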