Consider the following sample program:

static inline void write32(void *addr, unsigned int value)
{
        *(volatile unsigned int *)addr = value;
}

extern unsigned char testvalue[];

void main()
{
        write32(&testvalue, 4);
}

GCC 5.2 (with -O2 -fno-stack-protector -ffreestanding -fomit-frame-pointer) generates:

main:
        mov     r1, #4
        mov     r2, #0
        ldr     r3, .L2
        ldrb    r0, [r3]        @ zero_extendqisi2
        strb    r1, [r3]
        ldrb    r1, [r3, #1]    @ zero_extendqisi2
        strb    r2, [r3, #1]
        ldrb    r1, [r3, #2]    @ zero_extendqisi2
        strb    r2, [r3, #2]
        ldrb    r1, [r3, #3]    @ zero_extendqisi2
        strb    r2, [r3, #3]
        bx      lr

Whereas GCC 4.9 (same arguments) generates:

main:
        movw    r3, #:lower16:testvalue
        movs    r2, #4
        movt    r3, #:upper16:testvalue
        ldr     r1, [r3]
        str     r2, [r3]
        bx      lr

I think there are two problems with this:

1. Both GCC 5.2 and 4.9 generate superfluous load instructions from |addr|, even though the pointer is never dereferenced as an rvalue (it is only used as the left-hand side of an assignment). This is a) unnecessary and b) dangerous, since the pointer is declared volatile (meaning it could be an MMIO register with read side effects).

2. GCC 5.2 seems to assume it must treat the object as unaligned and generates byte-wise accesses. I don't understand why it would do that, since nothing here is declared __attribute__((packed)) and the explicit cast to a 4-byte-aligned type should remove any ambiguity for the compiler about which alignment it may assume. It is important for systems code to have a (portable) way to dereference an address with an explicit alignment regardless of where it came from (e.g. for MMIO registers that only support 32-bit wide accesses).

Both of these issues can be solved by using __builtin_assume_aligned(addr, sizeof(unsigned long)), but I still think the compiler shouldn't have done this in the first place (and having a portable solution to this problem is preferable).

Context: http://review.coreboot.org/#/c/11698
edit: target architecture in my example is armv7a, btw (although I doubt that this is architecture-specific)
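For reference, a minimal sketch of the __builtin_assume_aligned() workaround mentioned in the report above; this is only an illustration of the described workaround, not the actual coreboot change:

static inline void write32(void *addr, unsigned int value)
{
        /* Promise the compiler that addr is word-aligned, whatever its
           origin, so it can emit a single 32-bit access instead of
           byte-wise loads/stores on strict-alignment targets.  */
        addr = __builtin_assume_aligned(addr, sizeof(unsigned long));
        *(volatile unsigned int *)addr = value;
}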
(In reply to Julius Werner from comment #1)
> edit: target architecture in my example is armv7a, btw (although I doubt
> that this is architecture-specific)

I suspect this is an armv7-only issue, where support for misaligned loads/stores is missing. Also, testvalue is aligned to 1 byte by default, so what GCC is doing is correct, as there is really no way to do a volatile unaligned load.
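As an aside on the default-alignment point: if the array really is placed on a 4-byte boundary, the declaration can say so explicitly. A sketch only, not something proposed in this report:

extern unsigned char testvalue[] __attribute__((aligned(4)));

With the larger declared alignment the compiler no longer has a reason to fall back to byte-wise accesses (provided the definition of testvalue is aligned accordingly).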
> I suspect this is an armv7 issue only where there is missing support for
> misaligned load/stores.

Did a quick test on aarch64 and you're right, neither of the two issues appears there.

> Also testvalue is aligned to 1 byte by default so what GCC is doing is
> correct as there is no way to do a volatile unaligned loads really.

I thought alignment was always tied to a type (at least by default, unless overridden by __attribute__((aligned)) or the like)? It's true that the expected alignment for testvalue is 1, but I'm not dereferencing testvalue... I'm dereferencing *(volatile unsigned long *)&testvalue. Shouldn't the alignment of that always be 4? (FWIW, _Alignof(*(volatile unsigned long *)&testvalue) evaluates to 4 in both compiler versions.)

Another argument: if write32() were defined as a non-inline function in a separate translation unit, this problem would not appear. Should inlining a function really change behavior in this way and produce code that is clearly less efficient (to try to work around a "problem" that the -O0 version of the same code would never have found)?
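A minimal sketch of that out-of-line arrangement (the file split and names are illustrative, not taken from coreboot):

/* write32.c -- compiled as its own translation unit, so the compiler
   cannot see that the pointer originates from an unsigned char array
   and emits a single 32-bit store for the volatile access. */
void write32(void *addr, unsigned int value)
{
        *(volatile unsigned int *)addr = value;
}

/* caller.c */
void write32(void *addr, unsigned int value);

extern unsigned char testvalue[];

void main()
{
        write32(&testvalue, 4);
}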
As said in another bug, GCC tries to be conservative (QOI) and fixes up programmer mistakes in case it can see that the alignment of some memory is clearly _not_ according to the language requirement (as here, dereferencing an int *). In this case it sees the global 'unsigned char[]' and concludes its alignment does not have to be larger than 1 byte. We could change that behavior with a flag if people like (-fstrict-alignment?).
I wonder what the motivation to make this change was as, historically, GCC has never tried to rescue the programmer in this clearly invalid case. Some obscure situation with SSE on x86? Do other compilers do the same, e.g. on ARM?
On Thu, 24 Sep 2015, ebotcazou at gcc dot gnu.org wrote:

> I wonder what the motivation to make this change was as, historically, GCC
> has never tried to rescue the programmer in this clearly invalid case. Some
> obscure situation with SSE on x86? Do other compilers do the same, e.g. on
> ARM?

Yes, AFAIK this was some obscure situation with SSE on x86.  IIRC code doing unaligned scalar accesses (which is ok on x86) but then vectorized using peeling for alignment (which cannot succeed if the element is not naturally aligned) and segfaulting for the emitted aligned move instructions.

Of course the pessimistic assumption we make only applies when the compiler sees the underlying decl, thus only a fraction of all possible cases are handled conservatively that way.

Maybe these days the legacy has been cleaned up enough so we can remove that conservative handling again...  I think it also causes us to handle

char c[4];

int main()
{
  if (!((unsigned long)c & 3))
    return *(int *)c;
  return c[0];
}

too conservatively as we expand

  _5 = MEM[(int *)&c];

and thus lose the flow-sensitive info.

Implementation-wise you'd have to adjust get_object_alignment_2 to get at a MEM_REF base (get_inner_reference will look through MEM_REFs with &decl operand 0).  Eventually by adjusting get_inner_reference itself to avoid doing the work twice.
> Yes, AFAIK this was some obscure situation with SSE on x86.  IIRC
> code doing unaligned scalar accesses (which is ok on x86) but then
> vectorized using peeling for alignment (which cannot succeed if the
> element is not naturally aligned) and segfaulting for the emitted
> aligned move instructions.

I see, thanks for the insight.

> Maybe these days the legacy has been cleaned up enough so we can
> remove that conservative handling again...  I think it also causes
> us to handle
>
> char c[4];
>
> int main()
> {
>   if (!((unsigned long)c & 3))
>     return *(int *)c;
>   return c[0];
> }
>
> too conservatively as we expand
>
>   _5 = MEM[(int *)&c];
>
> and thus lose the flow-sensitive info.

The problem is that, in order to fix a legitimate issue on x86, the change pessimizes the code for strict-alignment platforms, where the said issue doesn't exist since there are no unaligned accesses in the source code. And of course only for them, since x86 has unaligned load/stores. So, in the end, this is a net loss for strict-alignment platforms.
On Thu, 24 Sep 2015, ebotcazou at gcc dot gnu.org wrote:

> The problem is that, in order to fix a legitimate issue on x86, the change
> pessimizes the code for strict-alignment platforms, where the said issue
> doesn't exist since there are no unaligned accesses in the source code. And
> of course only for them, since x86 has unaligned load/stores. So, in the
> end, this is a net loss for strict-alignment platforms.

Agreed.  Looking at how to fix this in get_object_alignment_2 I wonder if it makes sense to unify this function with get_inner_reference.  The other choice would be to add a flag to get_inner_reference to stop at MEM_REF/TARGET_MEM_REFs.
Note the issue should be partly mitigated by the fix for PR70424. The issue in comment #6 is still there, as we still trust the alignment of an underlying decl more than the alignment info present on the MEM_REF.
The reason for the legacy on x86 targets is indeed not present on strict-alignment platforms if it is the case that on strict-alignment platforms unaligned accesses trap (and are not merely slow).

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c      (revision 235005)
+++ gcc/builtins.c      (working copy)
@@ -339,7 +339,8 @@ get_object_alignment_2 (tree exp, unsign
          Do so only if get_pointer_alignment_1 did not reveal absolute
          alignment knowledge and if using that alignment would improve
          the situation.  */
-      if (!addr_p && !known_alignment
+      if (!addr_p
+          && (!known_alignment || STRICT_ALIGNMENT)
           && TYPE_ALIGN (TREE_TYPE (exp)) > align)
         align = TYPE_ALIGN (TREE_TYPE (exp));
       else