Bug 37216 - [cygming] Invalid alignment for SSE store to .comm data generated with -O3
Summary: [cygming] Invalid alignment for SSE store to .comm data generated with -O3
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.3.1
: P3 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 39039 39194 39369 40333 41950 45036 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-08-23 22:11 UTC by Simon Sasburg
Modified: 2010-07-23 01:50 UTC (History)
13 users (show)

See Also:
Host: i686-pc-cygwin
Target: i686-pc-cygwin
Build: i686-pc-cygwin
Known to work:
Known to fail:
Last reconfirmed: 2008-08-24 20:12:40


Attachments
Implement alignment for non-local commons (583 bytes, patch)
2008-09-12 15:52 UTC, Nick Clifton
Details | Diff
Revised patch with correct section switching (591 bytes, patch)
2008-09-29 14:13 UTC, Nick Clifton
Details | Diff
lcomm + alignment (362 bytes, patch)
2008-09-29 15:40 UTC, Gianluigi Tiesi
Details | Diff
Nick's aligned common testcase (842 bytes, text/plain)
2008-09-29 18:43 UTC, Danny Smith
Details
Revised patch which handles (size == 0) (600 bytes, patch)
2008-10-03 16:54 UTC, Nick Clifton
Details | Diff
Enable -fno-common with -msse for Cygwin/Mingw (1.10 KB, patch)
2008-10-07 11:37 UTC, Nick Clifton
Details | Diff
Target option and autoconf test to enable aligned common support. (1.27 KB, patch)
2009-05-20 04:25 UTC, Dave Korn
Details | Diff
Revised patch (1.75 KB, patch)
2009-05-23 11:46 UTC, Dave Korn
Details | Diff
D'oh. Quick respin. (1.77 KB, patch)
2009-05-23 14:08 UTC, Dave Korn
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Sasburg 2008-08-23 22:11:02 UTC
GCC generates an XMM instruction with a memory operand not 16-byte aligned as it should be (i think), this generates a segmentation fault when run.

Input file: (test.c)
---------------------------------------------
int iarr[64];
int iint = 0;

int main() {
	int i;
	for(i=0;i<64;i++) {
		iarr[i] = -2;
	}
	return 0;
}
---------------------------------------------

Output of gcc -v:
Using built-in specs.
Target: i686-pc-cygwin
Configured with: ../source-4.3.1/configure --prefix=/usr/src/gcc/prefix-4.3.1 --enable-languages=c,c++ --disable-nls --without-included-gettext --enable-version-specific-runtime-libs --enable-threads=posix --disable-win32-registry
Thread model: posix
gcc version 4.3.1 (GCC) 

Command used to compile:
/usr/src/gcc/prefix-4.3.1/bin/gcc test.c -o test.exe -march=core2 -O3

CPU: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz

Offending instruction: 'movdqa %xmm0,0x403024'
Comment 1 Uroš Bizjak 2008-08-24 11:26:35 UTC
Please look at the asm dump (add -S to compile flags) for following lines:

.globl iint
	.bss
	.align 4
	.type	iint, @object
	.size	iint, 4
iint:
	.zero	4
	.comm	iarr,256,32              <<<< this one

alternatively, you can use -fno-common:

.globl iarr
	.align 32                   <<<< alignment
	.type	iarr, @object
	.size	iarr, 256
iarr:
	.zero	256

In above dump (linux target) iarr is aligned to 32bytes. Does cygwin aligns .comm variables to multiplies of 16 bytes?
Comment 2 Simon Sasburg 2008-08-24 11:35:36 UTC
with -S added:
.globl _iint
        .bss
        .align 4
_iint:
        .space 4
        .comm   _iarr, 256       # 256
        .section .rdata,"dr"
        .align 16

with -S -fno-common added:
.globl _iint
        .bss
        .align 4
_iint:
        .space 4
.globl _iarr
        .align 32
_iarr:
        .space 256
        .section .rdata,"dr"
        .align 16

Note: with just -fno-common added the resulting test.exe runs without segfaulting
Comment 3 brian 2008-08-24 11:48:58 UTC
Subject: Re:  [cygwin] Invalid alignment for SSE store to .comm 
 data generated with -O3

The the 3 argument version of .comm is only supported by the ELF
assembler so it would be rejected by the PE assembler if gcc were to
emit it.
Comment 4 brian 2008-08-24 12:47:05 UTC
Subject: Re:  [cygwin] Invalid alignment for SSE store to .comm 
 data generated with -O3

Also, this is not Cygwin-specific as far as I can tell, more like
PE-specific since it affects MinGW as well.
Comment 5 Uroš Bizjak 2008-08-24 20:12:40 UTC
Confirmed and added maintainer CC.
Comment 6 brian 2008-08-24 20:59:39 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

It seems to me the issue is that prior to 2007-11-05[1], the PE
assembler could not set section alignment flags correctly so .bss was
simply hardcoded to 2**4 alignment which meant that things happened to
work out.  But now with the current assembler .bss gets the default 2**2
because there's no way to communicate the required alignment of common
symbols in the PECOFF object format that I can tell.  So one workaround
would be that when emitting .comm to also emit an explicit alignment of
.bss so that the assembler sees it and sets the section flags, e.g.

	.comm	_iarr, 256	 # 256
        .bss
        .align 16

I tried the above and it seems to do the right thing with iarr ending up
16-aligned.  But it seems like a dirty way of solving the problem,
though having spurious alignment faults is not exactly pretty either. 
Another route would be to set the .bss minimum back to 2**4 again.

[1] http://sourceware.org/ml/binutils/2007-11/msg00027.html
Comment 7 brian 2008-08-24 21:15:15 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

> Another route would be to set the .bss minimum back to 2**4 again.

Actually that's not really great either because it doesn't do anything
to address the  more general form of the problem, e.g.

int foo[16] __attribute__((aligned (32)));

would still end up at the mercy of whatever the .bss default is, i.e.
that alignment requirement isn't recorded anywhere.
Comment 8 Nick Clifton 2008-09-12 15:52:21 UTC
Created attachment 16303 [details]
Implement alignment for non-local commons
Comment 9 Nick Clifton 2008-09-12 15:54:02 UTC
Hi Brian,

  Please could you try out the uploaded patch which is an implementation of your idea to add an extra alignment directive when emitting commons.  It seems to work for the test case you gave, but I have not yet run it through a full rebuild and retest cycle...

Cheers
  Nick
Comment 10 brian 2008-09-12 23:59:04 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

One thing I was unsure about is this method switches to the .bss section
without switching back to .text (or whatever) afterwards.  As far as I
can tell the common symbols are always emitted at the end of the file
after everything else so this should be ok, but is there any chance of
this function being called from anywhere else?  (It's a shame the PE
assembler doesn't have anything like .pushsection/.popsection.)
Comment 11 Simon Sasburg 2008-09-26 07:32:58 UTC
I tried the attached patch, but gcc failed to build for me on cygwin with it.
Comment 12 SKJ 2008-09-28 08:11:51 UTC
same for me on mingw, bootstrap fails with patch. I get:

/e/mingw_build/build-gcc/./gcc/xgcc -B/e/mingw_build/build-gcc/./gcc/ -L/e/mingw_build/build-gcc/mingw32/winsup/mingw -L/e/mingw_build/build
-gcc/mingw32/winsup/w32api/lib -isystem /e/mingw_build/gcc-4.4-20080926-patched-mingw/winsup/mingw/include -isystem /e/mingw_build/gcc-4.4-2
0080926-patched-mingw/winsup/w32api/include -B/mingw/mingw32/bin/ -B/mingw/mingw32/lib/ -isystem /mingw/mingw32/include -isystem /mingw/ming
w32/sys-include -O2 -D__USE_MINGW_ACCESS -O2 -I../../../mingw_build/gcc-4.4-20080926-patched-mingw/gcc/../winsup/w32api/include -O2 -D__USE_
MINGW_ACCESS -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition  -isystem ./inc
lude   -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED   -I. -I. -I../.././gcc -I../../../../mingw_build/gcc-4.4-20080926-patch
ed-mingw/libgcc -I../../../../mingw_build/gcc-4.4-20080926-patched-mingw/libgcc/. -I../../../../mingw_build/gcc-4.4-20080926-patched-mingw/l
ibgcc/../gcc -I../../../../mingw_build/gcc-4.4-20080926-patched-mingw/libgcc/../include  -DHAVE_CC_TLS -o _gcov.o -MT _gcov.o -MD -MP -MF _g
cov.dep -DL_gcov -c ../../../../mingw_build/gcc-4.4-20080926-patched-mingw/libgcc/../gcc/libgcov.c
C:/ACCOUN~1/Seb/LOCALS~1/Tempcc06q9XS.s: Assembler messages:
C:/ACCOUN~1/Seb/LOCALS~1/Tempcc06q9XS.s:5782: Error: can't resolve `.bss' {.bss section} - `Ltext0' {.text section}
Comment 13 Nick Clifton 2008-09-29 14:13:28 UTC
Created attachment 16425 [details]
Revised patch with correct section switching
Comment 14 Nick Clifton 2008-09-29 14:16:20 UTC
Hi Guys,

  I am not able to reproduce the build problems that were reported with the first version of my patch, but then again I do not have a native cygwin build system or a system in which I can bootstrap mingw32.  I think that Brian may well be right however in that the patch is behaving badly when it manually switches into the .bss section.  I have uploaded a revised version of the patch which uses the correct gcc function to perform the section switch, so please can anyone who is interested give this new patch a run.

Cheers
  Nick
Comment 15 Gianluigi Tiesi 2008-09-29 15:39:29 UTC
I also got the error on the first patch, gcc 4.4 from svn, binutils 2.18.91.20080917
I still have problems with 2.19 binutils snapshots (unable to correctly create and link dll)

unfortunately the current gcc svn does not compile for mingw

because of:
build/i686-pc-mingw32/libstdc++-v3/include/bits/basic_string.h:2689: error: no matching function for call to '__to_xstring(int (*)(wchar_t*, const wchar_t*, char*), const int&, const wchar_t [4], long double&)'

I may need to fill another bug

There is no such code applicable to 4.2 branch :(

Please take a look at the attached patch, I've picked
it from a mailing list, it makes possible align with local variables (static)

I don't known if your patches makes it unneeded

Comment 16 Gianluigi Tiesi 2008-09-29 15:40:21 UTC
Created attachment 16426 [details]
lcomm + alignment
Comment 17 Gianluigi Tiesi 2008-09-29 16:30:52 UTC
with both patches I can achieve align 16
align > 16 on globals still fails

Local Aligned 16: 0
Local Aligned 32: 0
Global Aligned 16: 0
Global Aligned 32: 16

the program is:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

static int local_16[8] __attribute__ ((aligned (16)));
static int local_32[8] __attribute__ ((aligned (32)));

int global_16[8] __attribute__ ((aligned (16)));
int global_32[8] __attribute__ ((aligned (32)));

int main(void)
{
    printf("Local Aligned 16: %d\n", (intptr_t) local_16 % 16);
    printf("Local Aligned 32: %d\n", (intptr_t) local_32 % 32);
    printf("Global Aligned 16: %d\n", (intptr_t) global_16 % 16);
    printf("Global Aligned 32: %d\n", (intptr_t) global_32 % 32);
    return 0;
}

did I miss something?
(perhaps 16 byte alignment it's enough for sse)
Comment 18 brian 2008-09-29 17:58:56 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

The __to_xstring error is PR37522.  You should still be able to
bootstrap with --enable-languages=c for the purposes of testing this
patch, though.

Your testcase that uses printf with addr % 16 might not be showing you
what you think it is because the compiler is smart enough to optimize
away that computation at compile time since it knows the variable was
declared with the given alignment, even though the actual address
assigned by the linker mightn't be aligned properly.  In my testing I
had to use volatile to prevent that compile-time optimization.

I also forgot to mention that I don't have sse2 capable hardware so I
have no way of testing any of this other than by hand inspection.
Comment 19 Danny Smith 2008-09-29 18:43:04 UTC
Created attachment 16427 [details]
Nick's aligned common testcase
Comment 20 Gianluigi Tiesi 2008-09-29 19:33:22 UTC
align testcase looks ok, but anyway I'm mainly interested to have code aligned to 16. volatile around variables is not enough in my test program.

Nick's testcase is ok even on (local-only align) patched gcc 4.2

I've solved to_xstring problem by using vsnwprintf instead, but current trunk produces buggy ffmpeg executable so many tests are crashing

aligned data looks ok, at least the variables that were unaligned when it was crashing

I'm still not so sure about the optimization and the alignment, this code:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

static volatile int local_16[8] __attribute__ ((aligned (16)));
static volatile int local_32[8] __attribute__ ((aligned (32)));

int volatile global_16[8] __attribute__ ((aligned (16)));
int volatile global_32[8] __attribute__ ((aligned (32)));

int main(void)
{
    volatile double diff16, diff32;
    printf("Local Aligned 16: %d\n", (intptr_t) local_16 % 16);
    printf("Local Aligned 32: %d\n", (intptr_t) local_32 % 32);
    printf("Global Aligned 16: %d\n", (intptr_t) global_16 % 16);
    printf("Global Aligned 32: %d\n", (intptr_t) global_32 % 32);
    diff16 = (intptr_t) global_16 / 16.0f;
    diff32 = (intptr_t) global_32 / 32.0f;
    printf("16 -> %f - 32 -> %f\n", diff16, diff32);
    return 0;
}

still outputs:
Local Aligned 16: 0
Local Aligned 32: 0
Global Aligned 16: 0
Global Aligned 32: 16
16 -> 263177.000000 - 32 -> 131587.500000

regardless of the optimizations options
Comment 21 brian 2008-09-29 20:06:47 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

This is an example of what I'm talking about: the bar() function is
optimized away to simply "return 0" because the compiler assumes the
alignment is correct without having to actually emit the code to check
it:

$ echo 'char foo[1] __attribute__((aligned(16))); int bar() {
if((int)foo % 16) return 1; else return 0; }' | i686-pc-mingw32-gcc -x c
- -S -o - -fomit-frame-pointer
        .file   ""
        .text
.globl _bar
        .def    _bar;   .scl    2;      .type   32;     .endef
_bar:
        movl    $0, %eax
        ret
        .comm   _foo, 16         # 1
Comment 22 Gianluigi Tiesi 2008-09-29 23:17:51 UTC
still no success while compiling ffmpeg :(

Program received signal SIGSEGV, Segmentation fault.
0x0074fe94 in ff_fft_calc_3dn ()
(gdb) bt
#0  0x0074fe94 in ff_fft_calc_3dn ()
#1  0x007506f5 in ff_fft_calc_3dn ()
#2  0x00750755 in ff_fft_calc_3dn ()
#3  0x00750a55 in ff_fft_dispatch_sse ()
#4  0x00000400 in ?? ()
#5  0x0074f718 in ff_fft_calc_sse (s=0x5b14600, z=0x5b0c5f0) at libavcodec/i386/fft_sse.c:35
#6  0x00748080 in ff_mdct_calc (s=0x5b145f0, out=0x5b0c5f0, input=0x5b105f0) at libavcodec/dsputil.h:673
#7  0x0064731d in encode_superframe (avctx=0x5ac4010, buf=0x5ba0030 "", buf_size=524288, data=0x5b5cca0) at libavcodec/wmaenc.c:92
#8  0x00495b16 in avcodec_encode_audio (avctx=0x5ac4010, buf=0x5ba0030 "", buf_size=524288, samples=0x750a50) at libavcodec/utils.c:865
#9  0x00406236 in output_packet (ist=0x5ac4390, ist_index=0, ost_table=0x3fbfb0, nb_ostreams=1, pkt=0x22fed8) at ffmpeg.c:672
#10 0x004095d1 in av_encode (output_files=0xb6c000, nb_output_files=1, input_files=0xb6b280, nb_input_files=1, stream_maps=0xb6c060, nb_stream_maps=0)
    at ffmpeg.c:2129
#11 0x00409878 in main (argc=Cannot access memory at address 0xa
) at ffmpeg.c:3897
(gdb) p ff_cos_16
$1 = {1, 0.923879504, 0.707106769, 0.382683426, 6.12303177e-017, 0.382683426, 0.707106769, 0.923879504}
(gdb) p &ff_cos_16

$2 = (FFTSample (*)[8]) 0xe55c94


asm code:
.lcomm _ff_cos_16,32,16

lcomm?

nm output:
$ nm ffmpeg_g.exe |grep ff_cos_16
00e55c94 B _ff_cos_16
00e185d4 B _ff_cos_16384

not aligned :(
gcc version 4.3.3 20080929 (prerelease) (Sherpya)
GNU assembler version 2.18.91 (i686-pc-mingw32) using BFD version (GNUBinutils) 2.18.91.20080917


Comment 23 Gianluigi Tiesi 2008-09-29 23:22:42 UTC
a printf in the code for ff_cos_16 causes the compiler to align the var,
but at this point it crashes in another place using sse code
Comment 24 Nick Clifton 2008-09-30 14:05:24 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3

> a printf in the code for ff_cos_16 causes the compiler to align the var,
> but at this point it crashes in another place using sse code

So, does this mean that the align patch is working, but that there is 
some other bug that is preventing some sse code from working ?  If so 
then I can formally submit the align patch for approval.

Cheers
   Nick


Comment 25 Gianluigi Tiesi 2008-09-30 14:10:23 UTC
(In reply to comment #24)
> Subject: Re:  [cygming] Invalid alignment for SSE store
>  to .comm data generated with -O3
> 
> > a printf in the code for ff_cos_16 causes the compiler to align the var,
> > but at this point it crashes in another place using sse code
> 
> So, does this mean that the align patch is working, but that there is 
> some other bug that is preventing some sse code from working ?  If so 
> then I can formally submit the align patch for approval.
> 
> Cheers
>    Nick
> 

unfortunately not, with a printf the var is aligned without a printf is not aligned (someone changes the behavior of the patch), and thus it will crash

the patch seems to make the right job but some other tool in the chain interferes with it
Comment 26 Danny Smith 2008-10-01 01:33:04 UTC
(In reply to comment #14)
> Hi Guys,
> 
>   I am not able to reproduce the build problems that were reported with the
> first version of my patch, but then again I do not have a native cygwin build
> system or a system in which I can bootstrap mingw32.  I think that Brian may
> well be right however in that the patch is behaving badly when it manually
> switches into the .bss section.  I have uploaded a revised version of the patch
> which uses the correct gcc function to perform the section switch, so please
> can anyone who is interested give this new patch a run.


This latter patch, applied to R139853 [*] (just before the big IRA merge) does not cause the build problems of the prior patch but does not solve the problem either.

With this patch, using your _align.c test case which I uploaded in previous comment:
 
gcc -DHAVE_ALIGNED_COMMON _align.c  && a.exe
fails.

The assembler code, specifically the .balign 32 directive applied to the _common object looks like it should do the right thing, but doesn't (:

	.file	"_align.c"
	.bss
	.balign 1
	.comm 	_a, 1
	.balign 32
	.comm 	_common, 64
.globl _b
	.section	.bss,"w"
_b:
	.space 1
.globl _bss
	.align 32
_bss:
	.space 64
.globl _c
	.data
_c:
	.byte	1
.globl _data
	.align 32
_data:
	.long	1
	.space 4
	.long	2
	.space 4
	.long	3
	.space 12
	.long	4
	.space 28
	.text
.globl _test
	.def	_test;	.scl	2;	.type	32;	.endef
_test:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	movl	8(%ebp), %eax
	addl	$8, %eax
	andl	$7, %eax
	testl	%eax, %eax
	jne	L2
	movl	8(%ebp), %eax
	addl	$16, %eax
	andl	$15, %eax
	testl	%eax, %eax
	jne	L2
	movl	8(%ebp), %eax
	addl	$32, %eax
	andl	$31, %eax
	testl	%eax, %eax
	je	L4
L2:
	call	_abort
L4:
	leave
	ret
	.def	___main;	.scl	2;	.type	32;	.endef
.globl _main
	.def	_main;	.scl	2;	.type	32;	.endef
_main:
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-16, %esp
	subl	$16, %esp
	call	___main
	movl	$_common, (%esp)
	call	_test
	movl	$_bss, (%esp)
	call	_test
	movl	$_data, (%esp)
	call	_test
	movl	$0, (%esp)
	call	_exit
	.def	_abort;	.scl	2;	.type	32;	.endef
	.def	_exit;	.scl	2;	.type	32;	.endef




[*] Since the IRA merge, mingw32  has been broken, failing to compile libstdc++ and raising many (>500 in gcc.dg) new testcase errors.  
Comment 27 Danny Smith 2008-10-01 10:22:28 UTC
(In reply to comment #13)
> Created an attachment (id=16425) [edit]
> Revised patch with correct section switching
> 
That patch causes other problems
This test case:

/* t1.c */
int  i[0];
int k;

void testi (void)
{
  i[0] = 0;
}

void testk (void)
{
  k = 0;
}


int main (void)
{
  return 0;
} 

now fails with 
t1.c:(.text+0x5): undefined reference to `_i' 

The assembler code is
	.file	"t1.c"
	.text
	.p2align 4,,15
.globl _testi
	.def	_testi;	.scl	2;	.type	32;	.endef
_testi:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$0, _i
	popl	%ebp
	ret
	.p2align 4,,15
.globl _testk
	.def	_testk;	.scl	2;	.type	32;	.endef
_testk:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$0, _k
	popl	%ebp
	ret
	.def	___main;	.scl	2;	.type	32;	.endef
	.p2align 4,,15
.globl _main
	.def	_main;	.scl	2;	.type	32;	.endef
_main:
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-16, %esp
	call	___main
	xorl	%eax, %eax
	movl	%ebp, %esp
	popl	%ebp
	ret
	.bss
	.balign 4
	.comm 	_i, 0 <<<<< size of i[0]
	.balign 4
	.comm 	_k, 4


I suspect we need to add this bit, or similar, back in
-  rounded = size ? size : 1;
-  rounded += (BIGGEST_ALIGNMENT / BITS_PER_UNIT) - 1;
-  rounded = (rounded / (BIGGEST_ALIGNMENT / BITS_PER_UNIT)
-	     * (BIGGEST_ALIGNMENT / BITS_PER_UNIT));

and output rounded rather than size.

Testing now.
Comment 28 Nick Clifton 2008-10-03 16:52:56 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3

Hi Danny,

> This test case:
> t1.c:(.text+0x5): undefined reference to `_i' 

Hmm, I cannot reproduce this, however...

>         .comm   _i, 0 <<<<< size of i[0]

I did drop the code to handle (size == 0) because I was not sure why it 
was there.

> I suspect we need to add this bit, or similar, back in
> -  rounded = size ? size : 1;

How about this third revision of the patch ?

Cheers
   Nick

Comment 29 Nick Clifton 2008-10-03 16:54:23 UTC
Created attachment 16458 [details]
Revised patch which handles (size == 0)
Comment 30 Gianluigi Tiesi 2008-10-03 17:06:26 UTC
the patch looks ok but unfortunately does not always solves the problem,
something in the chain misalignes the symbol
This does not happen always but in some circumstances :(
Comment 31 Nick Clifton 2008-10-04 08:27:23 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3


> the patch looks ok but unfortunately does not always solves the problem,
> something in the chain misalignes the symbol
> This does not happen always but in some circumstances :(

Possibly a linker bug.  Could you investigate some more ?  For example 
what alignment does the .bss section have in the object file and in the 
executable ?

Cheers
   Nick


Comment 32 Gianluigi Tiesi 2008-10-04 21:40:02 UTC
this archive:
http://people.netfarm.it/~sherpya/gcc/info.7z

contains

ffmpeg_g.exe - non stripped final executable
fft.c/o/s - source, object file and asm generated

related vars are:
ff_cos_16, ff_cos_32 etc

$ nm ffmpeg_g.exe |grep ff_cos_16
00e62d14 B _ff_cos_16
00e25654 B _ff_cos_16384

$ nm libavcodec/fft.o |grep ff_cos_16
00000020 C _ff_cos_16
00008000 C _ff_cos_16384


gcc version 4.3.3 20081004 (prerelease) (Sherpya) (plain svn with the patch)
GNU ld (GNU Binutils) 2.18.91.20080917

I can also provide all my build env as 7z

Regards

Comment 33 Nick Clifton 2008-10-06 12:43:17 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3

Hi,

> http://people.netfarm.it/~sherpya/gcc/info.7z

Just a quick check: If you build with "-fno-common" does the executable 
then work ?

Cheers
   Nick

Comment 34 Gianluigi Tiesi 2008-10-06 14:13:21 UTC
$ nm ffmpeg_g.exe |grep ff_cos_16
00dd84e0 B _ff_cos_16
00de04c0 B _ff_cos_16384

except snow and svq1 tests, crashing because of bugs in tree opts on win32
sse code is working fine

Comment 35 Nick Clifton 2008-10-06 16:43:49 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3

Hi sherpya,

> ------- Comment #34 from sherpya at netfarm dot it  2008-10-06 14:13 -------
> $ nm ffmpeg_g.exe |grep ff_cos_16
> 00dd84e0 B _ff_cos_16
> 00de04c0 B _ff_cos_16384
> 
> except snow and svq1 tests, crashing because of bugs in tree opts on win32
> sse code is working fine

As I suspected.  The PE/COFF file format does not provide for specifying 
the alignment of commons.

Hmm, I wonder if gcc should complain if it finds aligned commons with 
COFF backends or if we should try to generate a GNU extension to the 
COFF format.

Cheers
   Nick


Comment 36 Gianluigi Tiesi 2008-10-06 17:14:19 UTC
so how with -fno-common can make aligned work?
Comment 37 Nick Clifton 2008-10-06 17:24:00 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3

Hi,

> so how with -fno-common can make aligned work?

Hang on - I thought that you had said that when ffmpeg_g.exe was built 
with -fno-common that it worked, modulo some (completely separate) tree 
optimization bugs.  Is that not right ?



Comment 38 Gianluigi Tiesi 2008-10-06 17:27:11 UTC
yes alignment works, and even my test align program with 4.2 without patches gives correct alignment to local and global symbols

Local Aligned 16: 0
Local Aligned 32: 0
Global Aligned 16: 0
Global Aligned 32: 0
16 -> 265986.000000 - 32 -> 132994.000000
Comment 39 Nick Clifton 2008-10-06 18:16:16 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3


> yes alignment works, and even my test align program with 4.2 without patches
> gives correct alignment to local and global symbols

OK, so when you said:

  "so how with -fno-common can make aligned work?"

did you really mean:

  "so how without -fno-common can make aligned work?"

In which case the answer is - I do not know.  The problem is that the 
PE/COFF file format does not support aligned commons.  We could try to 
create an extension to the format to support them, but that would be 
non-standard.  Another possibility would be to turn any common symbol 
with an alignment attribute into a weak symbol instead.  This would work 
(I think, I have not tried it), provided that there are no other 
definitions of the same symbol with a different size.  Which would 
possibly cause problems with FORTRAN programs but it is unlikely to be 
an issue with C/C++ programs.

Another possibility is to modify the linker so that when it is placing 
common symbols into the .bss section of the output executable every 
symbol is aligned to the maximum alignment specified for any of the .bss 
sections of any input object file.  This would probably waste huge 
amounts of space in the .bss section (for large programs anyway) but it 
ought to work.

Cheers
   Nick

Comment 40 Gianluigi Tiesi 2008-10-06 18:54:23 UTC
I mean that with -fno-common alignment works, even with non patched 4.2, my question is due to the fact that it's not so clear for me what no-common does

and adding -fno-common what are side effects?

do using something like dummy or nops in bss section does something wrong?

at this point a forced alignment to 16 wouldn't be such a problem for the space wasted, at least this can avoid problems with sse code (16 bytes) and 3dnow (8 bytes)

gcc may need to disable sse code on mingw/cygwin or at least avoid implicit vectorizations
Comment 41 Danny Smith 2008-10-06 20:18:00 UTC
(In reply to comment #35)

> As I suspected.  The PE/COFF file format does not provide for specifying 
> the alignment of commons.
> 
> Hmm, I wonder if gcc should complain if it finds aligned commons with 
> COFF backends or if we should try to generate a GNU extension to the 
> COFF format.
> 

Aligned commons are not part of the PE COFF spec and AFAICT neither MASM nor NASM provide a way to specify aligned commons.  I am afraid that a GNU extension will cause object incompatibility, so it would it need to be a non-default option.  We already have
 -fno-common
__attribute__((no_common)) 
#pragma gcc optimize (no_common)  

G++ already puts commons in .bss

Perhaps a new option -fno_large_common that applies -fcommon only to objects with align <= 4 bytes?

A warning would be useful in any case.

Danny
Comment 42 brian 2008-10-06 23:29:24 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

When you are comparing gcc 4.2 to current trunk, are you keeping the
linker (binutils) version the same?  As mentioned in comment 6, the
linker behavior changed recently which results in a different default
alignment of the .bss segment which could explain the difference.

IMHO, we should just have gcc automatically enable -fno-common on PE if
SSE is enabled.  That's what the MS tools do, AFAICT.
Comment 43 Gianluigi Tiesi 2008-10-07 01:32:11 UTC
binutils 2.18.91.20080917 on both
there are changes for the local alignment in the gas code but gcc does not use them without my attached gcc_align_fix.diff patch (not sure 100%)

newer binutils are not working well on mingw
Comment 44 Nick Clifton 2008-10-07 10:57:43 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store
 to .comm data generated with -O3

sherpya at netfarm dot it wrote:
> I mean that with -fno-common alignment works, even with non patched 4.2, my
> question is due to the fact that it's not so clear for me what no-common does

-fno-common stops uninitialized variables declared in C and C++ programs 
from being treated in the same way as common variables declared in 
FORTRAN programs.

> and adding -fno-common what are side effects?

Essentially there are two side effects:  The first is that you will get 
a link time error if you declare the same uninitialized variable twice 
in two different source files, without using the 'extern' keyword on one 
of them.  eg:

   % cat foo.c
   int a;

   % cat bar.c
   int a;
   int main (void) { return 0; }

   % gcc foo.c bar.c

   % gcc -fno-common foo.c bar.c
   multiple definition of `a'

This is often a problem with badly written header files which declare 
variables without using 'extern'.  eg:

   % cat header.h
   int a;

   % cat foo.c
   #include "header.h"
   int a;

   % cat bar.c
   #include "header.h"
   int main (void) { return 0; }

   % gcc -fno-common foo.c bar.c
   multiple definition of `a'

The other side-effect, and the one that is more interesting for our 
purposes, is that it forces uninitialised variables to be placed into 
the .bss section.  This is important because symbols in the PE/COFF file 
format do not have an alignment attribute of their own.  Instead the 
alignment is inherited by the containing section, with the maximum 
alignment of any symbol inside a section being taken as the section's 
alignment as a whole.  Symbols are placed inside the section on suitably 
aligned boundaries, so that providing that the section itself is placed 
on an alignment boundary everything will work.  eg:

   % cat foo.c
   int normal_align;
   int aligned_16 __attribute__((aligned(16)));

   % gcc -fno-common -c foo.c
   % objdump --syms foo.o
   [  8](sec 3)(fl 0x00)(ty 0)(scl 2)(nx 0) 0x00000000 _normal_align
   [  9](sec 3)(fl 0x00)(ty 0)(scl 2)(nx 0) 0x00000010 _aligned_16

   Note how the 'aligned_16' variable starts at an offset of
   00000010 from the start of section 3, whereas 'normal_align'
   starts at an offset of 00000000.  Ie there is a gap of 12
   bytes from offset 00000004 to 0000000f.

   % objdump -h foo.o
   Idx Name          Size      VMA       LMA     File off  Algn
   2 .bss          00000020  00000000  00000000  00000000  2**4

   Note that the .bss section has been given an alignment of 2^4.
   This is because it contains 'aligned_16'.  If that variable had
   not been declared then the .bss section would have been given
   its default alignment of 2^2.

   Also note that section numbering differs between the two uses
   of objdump.  Ie "(sec 3)" in the "objdump --syms" output refers
   to the third declared section which is the section with an
   index of 2 in the "objdump -h" output.

   % cat bar.c
   int a;

   % gcc -fno-common bar.c foo.o
   % nm a.exe
   00402000 B _a
   00402020 B _aligned_16
   00402010 B _normal_align

   So after linking 'aligned_16' still has a 16-bit alignment
   because of the 2^4 alignment of the .bss section in the foo.o
   object file.

The reason that all of this is important is that when common variables 
are stored in a PE/COFF object file they are not assigned to any 
section.  Since only sections, not symbols, have an alignment attribute 
in PE/COFF object files, any alignment requirements of common symbols 
are lost.  This is what has been causing the problems that you have 
experienced.  eg:

   % cat foo.c
   int normal_align;
   int aligned_16 __attribute__((aligned(16)));

   % gcc -c foo.c
   % objdump --syms foo.o
   [ 8](sec 0)(fl 0x00)(ty 0)(scl 2)(nx 0) 0x00000004 _normal_align
   [ 9](sec 0)(fl 0x00)(ty 0)(scl 2)(nx 0) 0x00000004 _aligned_16

   Note how the variables are assigned to "(sec 0)" which does not
   exist and that there is no field or flag specifying the alignment
   for either of them.

You may ask why common variables are not assigned to the .bss section. 
The reason is that if there are multiple declarations of the same 
variable and all but one of which are common, then the non-common 
declaration takes precedence.  eg:

   % cat foo.c
   int a;

   % cat bar.c
   int a = 1;

   % gcc foo.c bar.c
   % nm a.exe
   00402000 D _a

Ie the 'a' variable has been placed in the .data section and not the 
.bss section, despite the fact that it was declared uninitialised in 
foo.c.

So common variables are not assigned to a section until the final link 
takes places.  If there are no non-common definitions of a variable to 
specify where they should be placed then they are assigned to the .bss 
section, but by then it is too late - the alignment requirements of the 
symbol have been lost.

Cheers
   Nick

Comment 45 Nick Clifton 2008-10-07 11:37:11 UTC
Created attachment 16475 [details]
Enable -fno-common with -msse for Cygwin/Mingw
Comment 46 Nick Clifton 2008-10-07 11:38:15 UTC
Hi Brian,

> IMHO, we should just have gcc automatically enable -fno-common on PE if
> SSE is enabled.  That's what the MS tools do, AFAICT.

Something like the newly uploaded patch ?

Cheers
  Nick
Comment 47 Gianluigi Tiesi 2008-10-07 11:50:02 UTC
ffmpeg uses aligned vars inside an object from an external nasm/yasm compiled module, so it's very unlikely gcc can be aware of this, the patch should solve "implicit" sse code generation, but I think there are no solution for external references.
Probably the best way is to pass -fno-common when there is sse code in the project

I'll test your patch for the first post of the bugreport, and I'll test also ffmpeg but I'm almost sure that this TARGET_SSE is not true in this specific case

with -fno-common I had no problems with make test (when using sse code)

all of the stuff in this bugreport is not really definitive,
we have a patch for local and for global using commons
but there is no way to propagate this data in pe/coff object

why not adding an extensions for intermediate coffs?
there isn't space anywhere in coff files?

yes one can use -fno-common or the fix to avoid crashes when gcc vectorize
but it does not cover all cases

Comment 48 brian 2008-10-07 12:01:37 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

sherpya at netfarm dot it wrote:

> I'll test your patch for the first post of the bugreport, and I'll test also
> ffmpeg but I'm almost sure that this TARGET_SSE is not true in this specific
> case

You said you used -march=core2 to compile.  This implicitly includes
-msse2 (and -msse) so yes it will be set.

> yes one can use -fno-common or the fix to avoid crashes when gcc vectorize
> but it does not cover all cases

What cases would it not cover?  When placing items directly in bss or
data the required alignment can be unambiguously conveyed to the
assembler so I don't see how it would get lost.
Comment 49 Gianluigi Tiesi 2008-10-07 12:15:03 UTC
not exactly, Simon Sasburg compiled with -march=core2 I'm not explicitly telling to gcc to compile sse code, arch is i686 and opt is -O2 so there is no sse code generated by gcc, ffmpeg declares aligned vars in fft.c and uses them in asm sse code, file fft_mmx.asm that is generated by yasm

so gcc is unable to known that those vars will be used by sse code, this
case will not covered by the last patch

gcc should emit a warning when compiling align declared vars on win32
version 3.x had a similar warning but only if using align > 16
with current state of compiler/toolchain even 16-bytes aligned is not possible
without explicit -fno-common compile switch
Comment 50 brian 2008-10-07 12:46:26 UTC
Subject: Re:  [cygming] Invalid alignment for SSE store to 
 .comm data generated with -O3

Oh, I see what you mean now.  Yeah, predicating it on just TARGET_SSE
isn't sufficient.

I'm starting to think the idea of a PE/COFF-specific convention to
record the alignment isn't such a crazy idea.  It could just be a
section with a specific name e.g. .gnu.note.aligncomm.  The PE assembler
could then accept the 3-argument form of .comm and store the info there,
and the linker would be able to consume it.  When linking with a non-GNU
linker the section would just be silently ignored and the alignment info
discarded, which isn't any worse than the current situation where it's
also discarded, so you'd just be back to requiring -fno-common.  Of
course that's a much more ambitious change.
Comment 51 Uroš Bizjak 2009-01-30 09:56:00 UTC
*** Bug 39039 has been marked as a duplicate of this bug. ***
Comment 52 Uroš Bizjak 2009-02-16 07:41:15 UTC
*** Bug 39194 has been marked as a duplicate of this bug. ***
Comment 53 Danny Smith 2009-04-16 02:59:45 UTC
This thread is alos relevant.
http://gcc.gnu.org/ml/gcc/2009-04/msg00462.html
Adding Dave Korn to cc:
Comment 54 Dave Korn 2009-05-14 06:25:10 UTC
I've started work on the binutils support for this.  Work-in-progress patch at http://sourceware.org/ml/binutils/2009-05/msg00228.html

Once that's complete, I could deal with the GCC end too.

What should we do about backward-compatibility?  If we attempt to use the new features with the old toolchain, it won't work, and the linker will issue a bunch of noisy warnings about the .drectve statements it doesn't understand.

Should use of the new feature depend on a -m flag, or an assembler/linker version check of some sort?  Or should we just go ahead and let users of old toolchains get a bunch of warnings?  On the same lines, should we still continue to pad all COMMON symbols to a round multiple of BIGGEST_ALIGNMENT, or should we get rid of that when we're using the new feature?
Comment 55 Uroš Bizjak 2009-05-14 07:51:32 UTC
(In reply to comment #54)
> I've started work on the binutils support for this.  Work-in-progress patch at
> http://sourceware.org/ml/binutils/2009-05/msg00228.html
> 
> Once that's complete, I could deal with the GCC end too.
> 
> What should we do about backward-compatibility?  If we attempt to use the new
> features with the old toolchain, it won't work, and the linker will issue a
> bunch of noisy warnings about the .drectve statements it doesn't understand.
> 
> Should use of the new feature depend on a -m flag, or an assembler/linker
> version check of some sort?  Or should we just go ahead and let users of old
> toolchains get a bunch of warnings?  On the same lines, should we still
> continue to pad all COMMON symbols to a round multiple of BIGGEST_ALIGNMENT, or
> should we get rid of that when we're using the new feature?

Perhaps you could activate -mno-common if the new feature you are implementing in your binutils patch is not detected by configure?

Comment 56 Dave Korn 2009-05-20 04:25:51 UTC
Created attachment 17895 [details]
Target option and autoconf test to enable aligned common support.

Currently putting the attached through a bootstrap and test cycle.

This adds a configure time GAS feature check for the 3-operand form aligned .comm pseudo-op, and a target-specific command-line switch -mpe-aligned-commons which is enabled or disabled by default depending on the status returned from the configure check.

It doesn't yet address disabling use of commons when no alignment support is available.  That can be done as a follow-on patch.
Comment 57 Dave Korn 2009-05-20 21:16:50 UTC
Bah.  In case anyone else was about to point this out to me,

+gcc_GAS_CHECK_FEATURE([.comm with alignment], gcc_cv_as_comm_has_align,
+ [2,19,52],,
+ [.comm foo,1,32],,
+[AC_DEFINE(HAVE_GAS_ALIGNED_COMM, 1,
+  [Define if your assembler supports specifying the alignment
+   of objects allocated using the GAS .comm command.])])

... that won't work with ...

+mpe-aligned-commons
+Target Var(use_pe_aligned_common) Init(HAVE_GAS_ALIGNED_COMM)

... that, because it's undefined, not zero, when the feature isn't detected.  I'll have to respin it.

Comment 58 Dave Korn 2009-05-23 11:46:02 UTC
Created attachment 17906 [details]
Revised patch

Revised version of the patch that defines the autoconf feature test macro to 0/1 rather than to undefined/defined, so it can be used to init the target option variable correctly even when gas during configure does not support 3-option .comm format.

Nearly completed bootstrap now, will post to the -patches list (minus the generated file diffs) later when testing underway.
Comment 59 Dave Korn 2009-05-23 14:08:34 UTC
Created attachment 17909 [details]
D'oh.  Quick respin.

Huh.  Alignment is passed to the backend as number of bits, but of course the assembler wants it as the power of 2 of the number of bytes to align it to.  Fortunately I spotted this in manual testing before kicking off the 48-hr testrun, so I'm restrapping the patch with one tiny change to winnt.c before I do.
Comment 60 Dave Korn 2009-05-28 10:48:52 UTC
Subject: Bug 37216

Author: davek
Date: Thu May 28 10:48:35 2009
New Revision: 147950

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147950
Log:
gcc/ChangeLog:

2009-05-28  Dave Korn  <dave.korn.cygwin@gmail.com>

	PR target/37216

	* configure.ac (HAVE_GAS_ALIGNED_COMM):  Add autoconf test and
	macro definition for support of three-operand format aligned
	.comm directive in assembler on cygwin/pe/mingw target OS.
	* configure:  Regenerate.
	* config.in:  Regenerate.

	* config/i386/winnt.c (i386_pe_asm_output_aligned_decl_common):  Use
	aligned form of .comm directive if -mpe-aligned-commons is in effect.
	* config/i386/cygming.opt (-mpe-aligned-commons):  Add new option.

	* doc/invoke.texi (-mpe-aligned-commons):  Document new target option.
	* doc/tm.texi (ASM_OUTPUT_COMMON):  Document zero size commons.

gcc/testsuite/ChangeLog:

2009-05-28  Dave Korn  <dave.korn.cygwin@gmail.com>
            Uros Bizjak  <ubizjak@gmail.com>
            Danny Smith  <dansmister@gmail.com>

	PR target/37216

	* lib/target-supports.exp (check_effective_target_pe_aligned_commons):
	New function.
	* gcc.target/i386/pr37216.c:  New test source file.
	* gcc.dg/compat/struct-layout-1_generate.c (dg_options[]):  No longer
	use -fno-common for testing Cygwin and MinGW targets.



Added:
    trunk/gcc/testsuite/gcc.target/i386/pr37216.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config.in
    trunk/gcc/config/i386/cygming.opt
    trunk/gcc/config/i386/winnt.c
    trunk/gcc/configure
    trunk/gcc/configure.ac
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_generate.c
    trunk/gcc/testsuite/lib/target-supports.exp

Comment 61 Uroš Bizjak 2009-05-28 21:45:01 UTC
Fixed for 4.5.0.
Comment 62 Andrew Pinski 2009-06-03 21:26:41 UTC
*** Bug 40333 has been marked as a duplicate of this bug. ***
Comment 63 Simon Sasburg 2009-07-04 12:41:14 UTC
GCC still generates a segfaulting executable when used with the testcase in the report, most likely because my assembler doesn't support the 3-argument .comm directive.

When using the '-mpe-aligned-commons' i get an error about this:
> Error: junk at end of line, first unrecognized character is `,'
Comment 64 Dave Korn 2009-07-04 12:47:46 UTC
(In reply to comment #63)
> GCC still generates a segfaulting executable when used with the testcase in the
> report, most likely because my assembler doesn't support the 3-argument .comm
> directive.
> 
> When using the '-mpe-aligned-commons' i get an error about this:
> > Error: junk at end of line, first unrecognized character is `,'
> 

Of course.  What were you hoping would happen?
Comment 65 Simon Sasburg 2009-07-04 13:17:43 UTC
Indeed, i should have expected this, and after rereading the comments here you even mentioned this problem already. Sorry for the noise.
Comment 66 Dave Korn 2009-07-04 13:21:26 UTC
(In reply to comment #65)
> Indeed, i should have expected this, and after rereading the comments here you
> even mentioned this problem already. Sorry for the noise.
> 

That's OK.  GCC attempts to detect at configure time if the installed toolchain supports the feature, and will configure itself to generate or not the 3-operand form according to the results of that test.  The command-line option is there if you ever need to override it, but that'll only be the case if you're mixing-and-matching toolsets or doing some other kind of complex trickery that involves using GCC with a different binutils from that it was configured for.
Comment 67 Ozkan Sezer 2009-09-03 12:47:01 UTC
Will the fix for this bug be backported to the 4.4 branch?
Comment 68 Dave Korn 2009-09-03 12:53:29 UTC
(In reply to comment #67)
> Will the fix for this bug be backported to the 4.4 branch?
> 

I wasn't planning to do so myself, in the near future at any rate.  All my personal time is going into getting everything fixed for 4.5.0.  Maybe after that's working well I will have time to backport it.  Of course, anyone else who has the time available is free to do that themselves and submit it to the -patches list.
Comment 69 Uroš Bizjak 2009-09-17 08:59:36 UTC
*** Bug 39369 has been marked as a duplicate of this bug. ***
Comment 70 Andrew Pinski 2009-11-05 17:42:54 UTC
*** Bug 41950 has been marked as a duplicate of this bug. ***
Comment 71 Andrew Pinski 2010-07-23 01:50:53 UTC
*** Bug 45036 has been marked as a duplicate of this bug. ***