Consider these functions: void copy1(char* d, const char* s) { memcpy(d, s, 256); } void copy2(short* d, const short* s) { memcpy(d, s, 256); } void copy3(int* d, const int* s) { memcpy(d, s, 256); } void copy4(long* d, const long* s) { memcpy(d, s, 256); } g++-4.5.2 is able to generate better code for the later functions. But when I test with a recent snapshot (SVN revision 178875 on linux x86_64) it generates the same code for all versions (same as copy1()).
The middle-end does not know anything about the parameters alignment.
*** Bug 52415 has been marked as a duplicate of this bug. ***
I ran into this issue when trying out something on SH, too. (4.9 trunk rev 200571) Example: void test (const int* a, int* b) { std::memcpy (b, a, 4); } Here I'd have expected this to result in a int load + int store, like... mov.l @r4,r1 rts mov.l r1,@r5 However, this does not happen. The SH's movmemsi pattern gets a 'common alignment' of '1'. SH is a strict alignment target and thus the expander will fall back to memcpy library call. If I remember correctly, on strict alignment targets a pointer to some type is by default assumed to satisfy the minimum alignment of that type. For example int test (const int* a) { return a[0]; } resuls in (as expected): rts mov.l @r4,r0 In the memcpy case it should be safe to assume that the alignment of int* is 32 bit aligned. As it has been mentioned in PR 52415, I've also tried out __builtin_assume_aligned and the problem disappears. In this case the memcpy is eliminated early on and the movmemsi pattern is not used at all.
(In reply to Richard Biener from comment #1) > The middle-end does not know anything about the parameters alignment. But the following code will use SImode load / store, even on strict-alignment backends void test (const int* a, int* b) { *b = *a; } If SImode operations are in order here, why not in memcpy et al. provided it is feeded with int*? In the case where an unaligned pointer is used in *b = *a and this causes, e.g. a trap, this is in order. Similar should apply for memcpy on strict-alignment machines if it gets an unaligned int* for example and the size is a multiple of the alignment requirement. BTW: AVR is an 8-bit machine that copies in chunks of 1 byte, this applies also for 4.5 and older. avr-gcc is /not/ a strict alignment backend. Thul removed from the targets.
*** Bug 60737 has been marked as a duplicate of this bug. ***
*** Bug 67676 has been marked as a duplicate of this bug. ***
This seems to affect even the most trivial cases. I ran the following code witn arm gcc 4.8.4, 4.9.2 and 5.3.0: #include <stdint.h> #include <string.h> uint32_t foo_noalign(const uint32_t *s) { uint32_t v; memcpy(&v, s, sizeof(v)); return v; } uint32_t foo(const uint32_t *s) { uint32_t v; memcpy(&v, __builtin_assume_aligned(s, 4), sizeof(v)); return v; } Which generates the following code: 00000000 <foo_noalign>: 0: e92d4007 push {r0, r1, r2, lr} 4: e3a02004 mov r2, #4 8: e1a01000 mov r1, r0 c: e08d0002 add r0, sp, r2 10: ebfffffe bl 0 <memcpy> 14: e59d0004 ldr r0, [sp, #4] 18: e28dd00c add sp, sp, #12 1c: e49de004 pop {lr} ; (ldr lr, [sp], #4) 20: e12fff1e bx lr 00000024 <foo>: 24: e5900000 ldr r0, [r0] 28: e12fff1e bx lr Thats really, really bad. clang has no problems generating the optimal code.
On Fri, 8 Jul 2016, npl at chello dot at wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 > > npl at chello dot at changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |npl at chello dot at > > --- Comment #7 from npl at chello dot at --- > This seems to affect even the most trivial cases. I ran the following code witn > arm gcc 4.8.4, 4.9.2 and 5.3.0: > > #include <stdint.h> > #include <string.h> > > uint32_t foo_noalign(const uint32_t *s) { > uint32_t v; > memcpy(&v, s, sizeof(v)); > return v; > } > > uint32_t foo(const uint32_t *s) { > uint32_t v; > memcpy(&v, __builtin_assume_aligned(s, 4), sizeof(v)); > return v; > } > > Which generates the following code: > > 00000000 <foo_noalign>: > 0: e92d4007 push {r0, r1, r2, lr} > 4: e3a02004 mov r2, #4 > 8: e1a01000 mov r1, r0 > c: e08d0002 add r0, sp, r2 > 10: ebfffffe bl 0 <memcpy> > 14: e59d0004 ldr r0, [sp, #4] > 18: e28dd00c add sp, sp, #12 > 1c: e49de004 pop {lr} ; (ldr lr, [sp], #4) > 20: e12fff1e bx lr > > 00000024 <foo>: > 24: e5900000 ldr r0, [r0] > 28: e12fff1e bx lr > > Thats really, really bad. clang has no problems generating the optimal code. But there is no good reasoning that can be applied that it is a valid transform. What does clang do when s is void * and you cast that to uint32_t *? Note that memcpy prototype takes a void * argument.
(In reply to rguenther@suse.de from comment #8) > > But there is no good reasoning that can be applied that it is a valid > transform. What does clang do when s is void * and you cast that to > uint32_t *? Note that memcpy prototype takes a void * argument. Does it matter whether there's a void* in between or not? If we do a *(int32_T*)(void*)(int32_t*)p on a strict alignment target, it will result in a 32 bit mem access. GCC knows memcpy and its semantics well enough so propagating the target's alignment rules into memcpy is most likely going to be a safe and sane thing to do. So at least cases like void test (const int* a, int* b) { std::memcpy (b, a, 4); } should work. Shouldn't it?
On Fri, 8 Jul 2016, olegendo at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 > > --- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> --- > (In reply to rguenther@suse.de from comment #8) > > > > But there is no good reasoning that can be applied that it is a valid > > transform. What does clang do when s is void * and you cast that to > > uint32_t *? Note that memcpy prototype takes a void * argument. > > Does it matter whether there's a void* in between or not? > > If we do a *(int32_T*)(void*)(int32_t*)p on a strict alignment target, it will > result in a 32 bit mem access. No, changing the pointer type doesn't change the access - how the access is implemented depends on memcpy. > GCC knows memcpy and its semantics well enough so propagating the target's > alignment rules into memcpy is most likely going to be a safe and sane thing to > do. So at least cases like > > void test (const int* a, int* b) > { > std::memcpy (b, a, 4); > } > > should work. Shouldn't it? What makes the parameter type special? Would void test (const int *a, int *b) { std::memcpy ((char *)b, (char *)a, t); } be invalid to optimize to an aligned copy in your eyes? Yes, I know that C standard rule about pointer type casting, 6.3.2.3/7 in C11, but I guess that if we'd start to derive alignment from this undefinedness we are going to break a lot of code. To GCC what matters is actual accesses, not just pointer conversions which are all stripped anyway which makes the following testcase not optimizable (easily) void test (const char *a, char *b) { std::memcpy ((int *)b, (int *)a, 4); }
(In reply to rguenther@suse.de from comment #10) > > What makes the parameter type special? Would > > void test (const int *a, int *b) > { > std::memcpy ((char *)b, (char *)a, t); > } > > be invalid to optimize to an aligned copy in your eyes? I think it's perfectly valid to optimize this case. It could as well be something like: void test (const int *a, int *b) { a[100] = 1; b[200] = 2; std::memcpy ((char *)b, (char *)a, t); } where a[100] and b[200] both would result in 32 bit accesses, not 4x1 byte or something, because the base pointer is assumed to be int aligned. Why should memcpy be any different?
On Fri, 8 Jul 2016, olegendo at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 > > --- Comment #11 from Oleg Endo <olegendo at gcc dot gnu.org> --- > (In reply to rguenther@suse.de from comment #10) > > > > What makes the parameter type special? Would > > > > void test (const int *a, int *b) > > { > > std::memcpy ((char *)b, (char *)a, t); > > } > > > > be invalid to optimize to an aligned copy in your eyes? > > I think it's perfectly valid to optimize this case. It could as well be > something like: > > void test (const int *a, int *b) > { > a[100] = 1; > b[200] = 2; > > std::memcpy ((char *)b, (char *)a, t); > } > > where a[100] and b[200] both would result in 32 bit accesses, not 4x1 byte or > something, because the base pointer is assumed to be int aligned. No, because the access is performed as 'int'. > Why should memcpy be any different? Because the memcpy stmt doesn't constitute a memory access but a function call.
(In reply to rguenther@suse.de from comment #12) > > No, because the access is performed as 'int'. > > > Why should memcpy be any different? > > Because the memcpy stmt doesn't constitute a memory access but a function > call. Yeah, that might be how things are handled internally. But that doesn't mean the optimization in question can't / shouldn't be done. GCC does even more "evil" things with function calls such as replacing printf with puts, propagating arguments into functions, inlining... So ... uhm ... I'm afraid I don't get the point.
On Fri, 8 Jul 2016, olegendo at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 > > --- Comment #13 from Oleg Endo <olegendo at gcc dot gnu.org> --- > (In reply to rguenther@suse.de from comment #12) > > > > No, because the access is performed as 'int'. > > > > > > Why should memcpy be any different? > > > > Because the memcpy stmt doesn't constitute a memory access but a function > > call. > > Yeah, that might be how things are handled internally. But that doesn't mean > the optimization in question can't / shouldn't be done. GCC does even more > "evil" things with function calls such as replacing printf with puts, > propagating arguments into functions, inlining... So ... uhm ... I'm afraid I > don't get the point. The point is the only reason you can do the optimization (on all targets) is that C standard rule about pointer types (casting). GCCs implementation details will make the application of the optimization quite unreliable as well. I've repeatedly said that I think relying on that part of the C standard with respect to undefinedness would be a bad thing from a QOI perspective. But I've changed my mind in the past - does -fsanitize=alignment instrument pointer types/casts?
Doesn't sanitize anything for int main() { int i, j; memcpy (&i, (int *)((char *)&i + 1), 4); return 0; } but note that the cast to (int *) is already removed during early folding and you end up with GENERIC memcpy ((void *) &i, (const void *) (&i + 1), 4);
rguenther: Funny enough, I am using memcpy because thats the only standard conform way to move data around that might be aliased. And I use this often to parse binary streams from a network. I think you are missing some key points, first that gcc readily replaces functions it knows - which can be disabled at will. On x86, for all those samples the call WILL BE replaced. (And in C++, RVO is explicitly allowed for example) So its pointless to discuss whether or not gcc should provide builtins (atleast in the context of this bug). Instead talk about the replacing feature that seems to work inconsistent / suboptimal. One more sample to demonstrate this (all compiled with O2): #include <string.h> int foo_ptr(const int *s) { /* nope */ int v; memcpy(&v, s, sizeof(v)); return v; } int foo_ref(const int &s) { /* nope */ int v; memcpy(&v, &s, sizeof(v)); return v; } int foo_refloc(const int &s) { /* This actually works */ int b = s; int v; memcpy(&v, &b, sizeof(v)); return v; } int foo_badalias(const char *s) { /* Nope, but then THIS shouldnt be be replaced by a aligned load */ int v; memcpy(&v, (int *)s, sizeof(v)); return v; } int foo_badalias2(const char *s) { /* That works, but is undefined behaviour (casting from a pointer with less alignment)! */ int v = *(int *)s; return v; }
I got interrupted by a colleague at work, part 2 of the ramblings... Everything you could argue against memcpy beeing replaced by simpler instructions, doesnt change that the same issue persists with the __builtin_memcpy function, which is explicitely saying you want the optimizations. A pointer to a uint32 can be assumed to be proper aligned, CREATING such a pointer thats not aligned is already undefined behaviour by the standard (the compiler could zero out bits for example). I dont think that what happens afterwards with something that shouldn`t exist in the first place is an argument against optimizing proper code. Further, I lack a consistent way of dealing with potential aliasing pointers. Using memcpy seems the sanest way, simply because its standards compliant, supported everywhere and your code wont mysteriously break once you use LTO or higher optimization settings. Compilers can reliably detect this and replace memcpy since years (ignoring this issue, which I would consider a bug), so there is no draw back. Its a feature common pretty much everywhere, and a valid recommendation in many discussions related to the topic. Consider the example below for illustration, FIXEDMEMCPY is how the plain memcpy should work and already does work for archs with unaligned access. (I had planned to post the code for 32bit x86, but the assembly is rather ugly, amd64 would work with "unsigned long" and "unsigned long long"). I already ran in such issues, when different software components define their own fixedwidth types. Its a practical issue where pointing to paragraphs of the standard dont help, unless you provide a proper solution with it. The FIXEDMEMCPY hack is fine for gcc but compilerspecific. In short: * Optimizing memcpy to simple instructions is a reality and expected, the behaviour (slow code) on arm (and other archs with req. alignment) is a unwelcome oddity * memcpy is one of the few ways to deal with aliasing, and the most standards compliant. (theres unions too, but thats not standards compliant) * I dont see a problem in replacing standard functions (and __builtin_memcpy has the same issue) * I dont see a problem in expecting a correctly aligned pointer, and doing undefined behaviour if the pointer could cause undefined behaviour. typedef unsigned uint32_t; typedef unsigned long uint32_alt; _Static_assert(sizeof(uint32_t) == sizeof(uint32_alt), "you picked a bad architecture or typedefs for this example"); #define FIXEDMEMCPY(a, b, s) __builtin_memcpy(__builtin_assume_aligned(a, __alignof__(*a)), __builtin_assume_aligned(b, __alignof__(*b)), s) unsigned breakme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a) { /* normally in different compilation units, but LTO doesnt care */ *ptr = 0; *ptr2 = a; return *ptr; } unsigned fixme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a) { /* fixes aliasing, but should be as fast as simple accesses */ uint32_t val = 0; FIXEDMEMCPY(ptr, &val, 4); FIXEDMEMCPY(ptr2 , &a, 4); uint32_t val2; FIXEDMEMCPY(&val2, ptr, 4); return val2; } 00000000 <breakme>: 0: e3a03000 mov r3, #0 4: e5803000 str r3, [r0] 8: e1a00003 mov r0, r3 // Oops: retval = 0 c: e5812000 str r2, [r1] 10: e12fff1e bx lr 00000014 <fixme>: 14: e3a03000 mov r3, #0 18: e5803000 str r3, [r0] 1c: e5812000 str r2, [r1] 20: e5900000 ldr r0, [r0] // The load thats missing above 24: e24dd010 sub sp, sp, #16 // Time for another 28: e28dd010 add sp, sp, #16 // Bugreport ? 2c: e12fff1e bx lr
(In reply to rguenther@suse.de from comment #12) > On Fri, 8 Jul 2016, olegendo at gcc dot gnu.org wrote: > > > void test (const int *a, int *b) > > { > > a[100] = 1; > > b[200] = 2; > > > > std::memcpy ((char *)b, (char *)a, t); > > } > > > > where a[100] and b[200] both would result in 32 bit accesses, not 4x1 byte or > > something, because the base pointer is assumed to be int aligned. > > No, because the access is performed as 'int'. > > > Why should memcpy be any different? > > Because the memcpy stmt doesn't constitute a memory access but a function > call. What about a new command option like -fassume-aligned-xxx that's off per default? The user could assert that when she is using memcpy (and friends) with a pointer of a specific type, then that also asserts that the data behind the pointer is appropriately aligned and may be accessed accordingly.
(In reply to npl from comment #17) > I got interrupted by a colleague at work, part 2 of the ramblings... > > Everything you could argue against memcpy beeing replaced by simpler > instructions, doesnt change that the same issue persists with the > __builtin_memcpy function, which is explicitely saying you want the > optimizations. > > A pointer to a uint32 can be assumed to be proper aligned, CREATING such a > pointer thats not aligned is already undefined behaviour by the standard > (the compiler could zero out bits for example). I dont think that what > happens afterwards with something that shouldn`t exist in the first place is > an argument against optimizing proper code. > > Further, I lack a consistent way of dealing with potential aliasing > pointers. Using memcpy seems the sanest way, simply because its standards > compliant, supported everywhere and your code wont mysteriously break once > you use LTO or higher optimization settings. > Compilers can reliably detect this and replace memcpy since years (ignoring > this issue, which I would consider a bug), so there is no draw back. Its a > feature common pretty much everywhere, and a valid recommendation in many > discussions related to the topic. > > Consider the example below for illustration, FIXEDMEMCPY is how the plain > memcpy should work and already does work for archs with unaligned access. > (I had planned to post the code for 32bit x86, but the assembly is rather > ugly, amd64 would work with "unsigned long" and "unsigned long long"). > > I already ran in such issues, when different software components define > their own fixedwidth types. Its a practical issue where pointing to > paragraphs of the standard dont help, unless you provide a proper solution > with it. The FIXEDMEMCPY hack is fine for gcc but compilerspecific. > > In short: > * Optimizing memcpy to simple instructions is a reality and expected, the > behaviour (slow code) on arm (and other archs with req. alignment) is a > unwelcome oddity > * memcpy is one of the few ways to deal with aliasing, and the most > standards compliant. (theres unions too, but thats not standards compliant) > * I dont see a problem in replacing standard functions (and __builtin_memcpy > has the same issue) > * I dont see a problem in expecting a correctly aligned pointer, and doing > undefined behaviour if the pointer could cause undefined behaviour. > > > > typedef unsigned uint32_t; > typedef unsigned long uint32_alt; > _Static_assert(sizeof(uint32_t) == sizeof(uint32_alt), "you picked a bad > architecture or typedefs for this example"); > > #define FIXEDMEMCPY(a, b, s) __builtin_memcpy(__builtin_assume_aligned(a, > __alignof__(*a)), __builtin_assume_aligned(b, __alignof__(*b)), s) > unsigned breakme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a) > { > /* normally in different compilation units, but LTO doesnt care */ > *ptr = 0; > *ptr2 = a; > return *ptr; > } > > unsigned fixme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a) > { > /* fixes aliasing, but should be as fast as simple accesses */ > uint32_t val = 0; > FIXEDMEMCPY(ptr, &val, 4); > FIXEDMEMCPY(ptr2 , &a, 4); > uint32_t val2; > FIXEDMEMCPY(&val2, ptr, 4); > return val2; > } > > 00000000 <breakme>: > 0: e3a03000 mov r3, #0 > 4: e5803000 str r3, [r0] > 8: e1a00003 mov r0, r3 // Oops: retval = 0 > c: e5812000 str r2, [r1] > 10: e12fff1e bx lr > > 00000014 <fixme>: > 14: e3a03000 mov r3, #0 > 18: e5803000 str r3, [r0] > 1c: e5812000 str r2, [r1] > 20: e5900000 ldr r0, [r0] // The load thats missing above > 24: e24dd010 sub sp, sp, #16 // Time for another > 28: e28dd010 add sp, sp, #16 // Bugreport ? > 2c: e12fff1e bx lr It's not done on STRICT_ALIGNMENT platforms because not all of those expand misaligned moves correctly (IIRC). Looking at RTL expansion at least the misaligned destination will work correctly. The question remains is what happens for -Os and for example both misaligned source and destination. Or on x86 where a simple rep; movb; is possible (plus the register setup of course).
Created attachment 38879 [details] patch to remove memcpy inline expansion misalign limit This patch lifts the restrictions on memcpy to assignment folding WRT strict alignment platforms. As said for -Os this might not be optimal (but it's usually hard to tell given the cases we want the early folding there are followup optimization opportunities like promoting either source or destination to a register).
(In reply to Georg-Johann Lay from comment #18) > (In reply to rguenther@suse.de from comment #12) > > On Fri, 8 Jul 2016, olegendo at gcc dot gnu.org wrote: > > > > > void test (const int *a, int *b) > > > { > > > a[100] = 1; > > > b[200] = 2; > > > > > > std::memcpy ((char *)b, (char *)a, t); > > > } > > > > > > where a[100] and b[200] both would result in 32 bit accesses, not 4x1 byte or > > > something, because the base pointer is assumed to be int aligned. > > > > No, because the access is performed as 'int'. > > > > > Why should memcpy be any different? > > > > Because the memcpy stmt doesn't constitute a memory access but a function > > call. > > What about a new command option like -fassume-aligned-xxx that's off per > default? > > The user could assert that when she is using memcpy (and friends) with a > pointer of a specific type, then that also asserts that the data behind the > pointer is appropriately aligned and may be accessed accordingly. But if you do void copy (int *d, int *s) { memcpy ((char *)d, (char *)s, 4); } then you will get aligned accesses because all the middle-end sees is mempcy (d, s, 4); so as I said elsewhere the only way to reliably implement deriving alignment from pointer types is by the frontends inserting __builtin_assume_aligned calls before they possibly stripped any conversions. Yes, if we simply say we strictly follow C11 6.3.2.3/7 we can do that using a simple flag. But we won't get the optimization reliably because of the above issue for the variant with char * parameters and int * casts. You could call that flag -fstrict-alignment (though maybe that would be confusing to people familiar with GCC internals STRICT_ALIGNMENT target macro). A simple implementation could be in get_pointer_alignment_1, wrapping it with a function that inspects TYPE_ALIGN (TREE_TYPE (TREE_TYPE (exp)) and uses that if it trumps the wrapped fn return values. But I expect much undesired fallout from such a change. Having a flag restricting the optimization behavior to memcpy-and-friends is not going to work (without the frontends inserting extra IL when all information is still present).
(In reply to Richard Biener from comment #21) > (In reply to Georg-Johann Lay from comment #18) > > (In reply to rguenther@suse.de from comment #12) > > > On Fri, 8 Jul 2016, olegendo at gcc dot gnu.org wrote: > > > > > > > void test (const int *a, int *b) > > > > { > > > > a[100] = 1; > > > > b[200] = 2; > > > > > > > > std::memcpy ((char *)b, (char *)a, t); > > > > } > > > > > > > > where a[100] and b[200] both would result in 32 bit accesses, not 4x1 byte or > > > > something, because the base pointer is assumed to be int aligned. > > > > > > No, because the access is performed as 'int'. > > > > > > > Why should memcpy be any different? > > > > > > Because the memcpy stmt doesn't constitute a memory access but a function > > > call. > > > > What about a new command option like -fassume-aligned-xxx that's off per > > default? > > > > The user could assert that when she is using memcpy (and friends) with a > > pointer of a specific type, then that also asserts that the data behind the > > pointer is appropriately aligned and may be accessed accordingly. > > But if you do > > void copy (int *d, int *s) > { > memcpy ((char *)d, (char *)s, 4); > } > > then you will get aligned accesses because all the middle-end sees is > > mempcy (d, s, 4); Same thing happens already if you do this: int d, s; mempcy ((char *)&d, (char *)&s, 4); Its also generally quite hard to force the compiler to do less-aligned accesses, and I haven`t seen this "solution" anywhere. (Probably because it doesn`t work on any current compiler) > so as I said elsewhere the only way to reliably implement deriving alignment > from pointer types is by the frontends inserting __builtin_assume_aligned > calls before they possibly stripped any conversions. > > Yes, if we simply say we strictly follow C11 6.3.2.3/7 we can do that > using a simple flag. But we won't get the optimization reliably because > of the above issue for the variant with char * parameters and int * casts. And you shouldn`t get the optimization in this case. Casting char* to int* is non-standard, the assume_aligned builtin is a good fit for that non-standard stuff IMHO. > You could call that flag -fstrict-alignment (though maybe that would be > confusing to people familiar with GCC internals STRICT_ALIGNMENT target > macro). A simple implementation could be in get_pointer_alignment_1, > wrapping it with a function that inspects TYPE_ALIGN (TREE_TYPE (TREE_TYPE > (exp)) and uses that if it trumps the wrapped fn return values. But I > expect much undesired fallout from such a change. The internals are above my head, but in regards to fallout the same thing could (has) be said for -fstrict-aliasing. Im a victim myself, and am suffering from a paranoia where I have to replace pointer-accesses with memcpy =) (In reply to Richard Biener from comment #19) > (In reply to npl from comment #17) > > I got interrupted by a colleague at work, part 2 of the ramblings... > > > > Everything you could argue against memcpy beeing replaced by simpler > > instructions, doesnt change that the same issue persists with the > > __builtin_memcpy function, which is explicitely saying you want the > > optimizations. > > > > A pointer to a uint32 can be assumed to be proper aligned, CREATING such a > > pointer thats not aligned is already undefined behaviour by the standard > > (the compiler could zero out bits for example). I dont think that what > > happens afterwards with something that shouldn`t exist in the first place is > > an argument against optimizing proper code. > > > > Further, I lack a consistent way of dealing with potential aliasing > > pointers. Using memcpy seems the sanest way, simply because its standards > > compliant, supported everywhere and your code wont mysteriously break once > > you use LTO or higher optimization settings. > > Compilers can reliably detect this and replace memcpy since years (ignoring > > this issue, which I would consider a bug), so there is no draw back. Its a > > feature common pretty much everywhere, and a valid recommendation in many > > discussions related to the topic. > > > > Consider the example below for illustration, FIXEDMEMCPY is how the plain > > memcpy should work and already does work for archs with unaligned access. > > (I had planned to post the code for 32bit x86, but the assembly is rather > > ugly, amd64 would work with "unsigned long" and "unsigned long long"). > > > > I already ran in such issues, when different software components define > > their own fixedwidth types. Its a practical issue where pointing to > > paragraphs of the standard dont help, unless you provide a proper solution > > with it. The FIXEDMEMCPY hack is fine for gcc but compilerspecific. > > > > In short: > > * Optimizing memcpy to simple instructions is a reality and expected, the > > behaviour (slow code) on arm (and other archs with req. alignment) is a > > unwelcome oddity > > * memcpy is one of the few ways to deal with aliasing, and the most > > standards compliant. (theres unions too, but thats not standards compliant) > > * I dont see a problem in replacing standard functions (and __builtin_memcpy > > has the same issue) > > * I dont see a problem in expecting a correctly aligned pointer, and doing > > undefined behaviour if the pointer could cause undefined behaviour. > > > > > > > > typedef unsigned uint32_t; > > typedef unsigned long uint32_alt; > > _Static_assert(sizeof(uint32_t) == sizeof(uint32_alt), "you picked a bad > > architecture or typedefs for this example"); > > > > #define FIXEDMEMCPY(a, b, s) __builtin_memcpy(__builtin_assume_aligned(a, > > __alignof__(*a)), __builtin_assume_aligned(b, __alignof__(*b)), s) > > unsigned breakme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a) > > { > > /* normally in different compilation units, but LTO doesnt care */ > > *ptr = 0; > > *ptr2 = a; > > return *ptr; > > } > > > > unsigned fixme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a) > > { > > /* fixes aliasing, but should be as fast as simple accesses */ > > uint32_t val = 0; > > FIXEDMEMCPY(ptr, &val, 4); > > FIXEDMEMCPY(ptr2 , &a, 4); > > uint32_t val2; > > FIXEDMEMCPY(&val2, ptr, 4); > > return val2; > > } > > > > 00000000 <breakme>: > > 0: e3a03000 mov r3, #0 > > 4: e5803000 str r3, [r0] > > 8: e1a00003 mov r0, r3 // Oops: retval = 0 > > c: e5812000 str r2, [r1] > > 10: e12fff1e bx lr > > > > 00000014 <fixme>: > > 14: e3a03000 mov r3, #0 > > 18: e5803000 str r3, [r0] > > 1c: e5812000 str r2, [r1] > > 20: e5900000 ldr r0, [r0] // The load thats missing above > > 24: e24dd010 sub sp, sp, #16 // Time for another > > 28: e28dd010 add sp, sp, #16 // Bugreport ? > > 2c: e12fff1e bx lr > > It's not done on STRICT_ALIGNMENT platforms because not all of those expand > misaligned moves correctly (IIRC). Looking at RTL expansion at least the > misaligned destination will work correctly. The question remains is what > happens for -Os and for example both misaligned source and destination. > Or on x86 where a simple rep; movb; is possible (plus the register setup > of course). Not sure what you mean, x86 has unaligned accesses and shouldn't be affected. Also, I doubt there are many cases where the function-call (and the register and stack shuffling) will use less code than the aligned access. The generated code for unaligned access could be improved in many cases (ARM atleast), possibly fixed. But thats not generally an argument against improving the builtins?
On Tue, 12 Jul 2016, npl at chello dot at wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 > > --- Comment #22 from npl at chello dot at --- > > > 00000014 <fixme>: > > > 14: e3a03000 mov r3, #0 > > > 18: e5803000 str r3, [r0] > > > 1c: e5812000 str r2, [r1] > > > 20: e5900000 ldr r0, [r0] // The load thats missing above > > > 24: e24dd010 sub sp, sp, #16 // Time for another > > > 28: e28dd010 add sp, sp, #16 // Bugreport ? > > > 2c: e12fff1e bx lr > > > > It's not done on STRICT_ALIGNMENT platforms because not all of those expand > > misaligned moves correctly (IIRC). Looking at RTL expansion at least the > > misaligned destination will work correctly. The question remains is what > > happens for -Os and for example both misaligned source and destination. > > Or on x86 where a simple rep; movb; is possible (plus the register setup > > of course). > > Not sure what you mean, x86 has unaligned accesses and shouldn't be affected. > Also, I doubt there are many cases where the function-call (and the register > and stack shuffling) will use less code than the aligned access. > > The generated code for unaligned access could be improved in many cases (ARM > atleast), possibly fixed. But thats not generally an argument against improving > the builtins? Sure. It is just that I am usually (overly?) cautionous when introducing unaligned accesses because historically they were handled wrong and nowadays they might be handled very inefficient on strict-align targets. I am going to propose the patch and add a testcase with most misalign cases so we can see runtime fallout on the more weird targets.
(In reply to npl from comment #22) > > Its also generally quite hard to force the compiler to do less-aligned > accesses, and I haven`t seen this "solution" anywhere. (Probably because it > doesn`t work on any current compiler) Hm, have you tried using the address of a member in a packed (1) struct?
Yes, that works fine. I just meant to say it needs more work than casting to a type with less alignment, and unless explicitly marked with this pragma you can expect a compiler to access like the (deduced) type was properly aligned.
On Tue, 12 Jul 2016, npl at chello dot at wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 > > --- Comment #25 from npl at chello dot at --- > Yes, that works fine. I just meant to say it needs more work than casting to a > type with less alignment, and unless explicitly marked with this pragma you can > expect a compiler to access like the (deduced) type was properly aligned. sth like typedef int acc __attribute__((may_alias, aligned(1))); and an actual access like *(acc *)p; works as well (type punning allowed via may_alias, alignment adjusted to 1 byte). Of course that's not portable. Portable is to use memcpy which irrespective of the type of the pointer has to assume an unaligned access.
GCC 4.9 branch is being closed
GCC 5 branch is being closed
GCC 6 branch is being closed
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
GCC 8.4.0 has been released, adjusting target milestone.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
*** Bug 101920 has been marked as a duplicate of this bug. ***
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.