Bug 38001 - regression in 4.3: alignment checks wrongly optimized away (runtime failure)
Summary: regression in 4.3: alignment checks wrongly optimized away (runtime failure)
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: 4.3.1
: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-03 13:25 UTC by Thomas Orgis
Modified: 2008-11-04 15:20 UTC (History)
1 user (show)

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
tar with the test source file, script to test the bug, example disassembly (2.80 KB, application/octet-stream)
2008-11-03 13:26 UTC, Thomas Orgis
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Orgis 2008-11-03 13:25:05 UTC
For mpg123 I use __attribute__((aligned(16))) for variables that are possibly used by SSE code. Now, I also create a library containing that SSE code, which can be used from third party applications.
The use case of linking gcc-compiled mpg123 code with another compiler that did _not_ align the stack at 16 bytes lead to the implementation of a runtime alignment check in the toplevel API calls of the library.

I stripped this down to the example I am about to attach.
Please note that I observed the issue in my library code first, so it should not be bound to problems with the stack in main() -- though the example has the code in main().

Basic code fragment:

	double __attribute__((aligned(16))) altest[1];
	double __attribute__((aligned(16))) altest2[1];
	if((size_t)altest % 16 != 0 || (size_t)altest2 % 16 != 0)
	printf("not aligned with %!\n");

If the stack from the caller is not proper, I expected this check to fire.
Now a specific user who had -mpreferred-stack-boundary=2 in his CFLAGS pointed me to reinvestigate the matter:

Since that option apparently overrides the __attribute__((aligned(16))) (a bug or misfeature on its own, IMHO), the user encountered a segfault on using the misaligned memory in SSE code.
That made me realize that I not just have to check for misaligned stack from the caller but also for the alignment working at all in one stack frame (hence, the two variables altest and altest2). Now, I updated the check and noticed that gcc-4.3.1 -mpreferred-stack-boundary=2 still produced a crashing program instead of triggering the check at runtime.

I compared with gcc-3.3.6 and gcc-3.4.6; both versions correctly detected the bad alignment at runtime and stopped in a controlled way. This led to investigations with the stripped down code.

Results:
1. gcc-4.3.1 optimizes away the altest % 16 or altest & 15 to plain zero, under the wrong assumption that the alignment to 16 bytes works.
2. consequently, the if(altest % 16 != 0 ...) check is totally optimized away -- even with -O0!
2. additonally, the & 15 check is miscompiled by gcc-3.4.6: it boils down to
  and $0x1,%eax
instead of
  and $0xf,%eax
...please see the attached files, with disassembly. This is inconsistent, even...

Actually, I am not sure how many bugs I am raising here. The main point is:
GCC should not omptimize away a computation which has an unknown result at compile time -- or at least take into account that it actually does so against it's own actions that render the predicted result invalid (align the stack at a different boundary).

Footnote: I know that it is a bad idea to have -mpreferred-stack-boundary=2 in this case, but this shows that the check is miscompiled and that insight also applies to the correctly compiled library being linked with other code that aligns stack differently.
Comment 1 Thomas Orgis 2008-11-03 13:26:04 UTC
Created attachment 16617 [details]
tar with the test source file, script to test the bug, example disassembly
Comment 2 Jakub Jelinek 2008-11-03 13:31:51 UTC
The check is not miscompiled, gcc assumes __attribute__((aligned (16))) variables
are 16 bytes aligned and uses this everywhere, including optimizing out
unnecessary check.  This is a user bug, you simply should never call -mpreferred-stack-boundary=4 routines from -mpreferred-stack-boundary=2 compiled code.  GCC documentation even warns against this.
Comment 3 Thomas Orgis 2008-11-03 13:35:26 UTC
Are you saying that there is no way to protect my library from user programs that have  misaligned stacks?
The whole checking business is in vain, then?

To sad that it working fine with older gccs made be believe that I do something usefule there in preventing segfaults.

And: In any case, if -mpreferred-stack-boundary obviously contradicts the aligned attribute, couldn't gcc at least give a compile error on this?
Comment 4 Thorsten Glaser 2008-11-03 15:55:43 UTC
The aligned attribute is for data, not for the stack.
The gcc texinfo page says that gcc does not re-align
the stack once misaligned (it could do that though,
or you could do that yourself). But optimising away
the checks is something we’re used to from gcc4…
Comment 5 Thomas Orgis 2008-11-03 16:08:20 UTC
Aren't we talking about data on the stack?

And, actually gcc-4.2 can realign the stack, if you ask it to.
But, well ... in any case it would be very helpful if a programmer had a way to deal devensively with stack issues for a library that is compiled separately from programs using it.
Comment 6 Jakub Jelinek 2008-11-03 16:17:57 UTC
4.2 doesn't realign stack except for main, only gcc 4.4 does and only for alignments larger than 16 bytes (and only on i386/x86_64).
You can use an optimization barrier to force a check, but I think it is a bad idea to slow down a library just because users might call it with bad stack alignment, using -mpreferred-stack-boundary is an ABI switch, people who mix
code with different ABIs get what they deserve.
Say
uintptr_t aladdr = (uintptr_t) altest;
asm ("" : "+r" (aladdr));
if ((aladdr & 15) != 0)
  puts ("stack not properly aligned");
Comment 7 H.J. Lu 2008-11-03 23:38:12 UTC
With gcc 4.4 on Linux/ia32, you can use -mincoming-stack-boundary=2
to compile your library to support library users compiled with
-mpreferred-stack-boundary=2. Compile will align stack properly
and efficiently when needed.
Comment 8 Thomas Orgis 2008-11-04 08:52:56 UTC
Ok, first, let me apologize for the "& 15 check is miscompiled" statement... operator precedence got me there.

The feature for stack-realignment I meant is
__attribute__((force_align_arg_pointer))
I use this already for the API entry functions of the library when gcc >= 4.2 is available. It _does_ solve the issue for misaligned stack from the caller, doesn't it?

So, strictly, current gcc does not need the checks, except for the case where the user asked for trouble with -mpreferred-stack-boundary (which still should throw an error: __attribute__((aligned(16))) clearly is in conflict with that).

I still need a way to handle the issue with older gccs, as I provide a source package and want to avoid bug reports about mysterious segfaults: The SSE stuff that needs alignment is an internal detail of the library, users should not need to specifically compile their programs for that detail.

Now, since older gccs don't optimize away the check, it is still of use there.
But I am wondering about the power of __attribute__((aligned(16))); gcc-3.4 seems to be unable to align

double __attribute__((aligned(16))) altest[1];

as opposed to 

double __attribute__((aligned(16))) altest[2];

It sort of makes sense, the data structure should not be smaller than the alignment... but gcc-4.3 does align that correctly. Is that a bug in the older gcc or just coincidence?
Well, I think I will go with the simple check with altest[2] for old compilers that don't know force_align_arg_pointer and luckily don't optimize away the check at the same time -- without need for the optimization barrier.
Is that a good strategy? I can imagine that gcc folks are not that keen on caring for old gcc versions, but with mpg123 we want to support any C89 compiler, basically, however old.

As for -mincoming-stack-boundary=2: __attribute__((force_align_arg_pointer)) does solve the problem equivalently, doesn't it?
Comment 9 H.J. Lu 2008-11-04 15:20:29 UTC
(In reply to comment #8)
> As for -mincoming-stack-boundary=2: __attribute__((force_align_arg_pointer))
> does solve the problem equivalently, doesn't it?
> 

They are the same in gcc 4.4.