Created attachment 23827 [details] tarball containing preprocessed output, valgrind warnings, and object files With GCC 4.6.0, I get a crash, early in the startup of our system, when compiling the system at –O3. This previously worked fine with GCC 4.5 and 4.4.5. Re-compiling only the top-most .cpp file in the crash's call stack with –O2, and re-linking, fixes the crash. In the attached archive: -console_urls.i, in which Console_url_table::Add_url_to_category() calls url_map->Put(), which ends up calling Hashcode() and crashes on a dereference of the 'wordp' variable (*wordp++ & 0xdfdfdfdf). gdb claims wordp has been optimized out when we debug the crash. -vg-*, which is the valgrind (3.6.1, using the ubuntu natty package) output at each optimization level. I have been told that some of the warnings are false positives, but the PPL and GMP ones I have never seen before. (For what it's worth, I ran valgrind on 4.5.3 compiling the same file, and it had nothing to report at all in GCC or related libraries.) -build-cross-4.6.sh, which is the script I used to generate the toolchain and contains package versions, configure commandlines, and build parameters for GCC and all the supporting libraries. I tried downgrading a few of the libraries to versions supplied in the sourceryg++ tarball, but it didn’t appear to fix the issue. The 4.5.3 I tested (which did not have this bug) was compiled with a similar script, and with all the same library versions. My commandline: g++ -Wa,-adlh=build_products/shared/x86_64/SOGS_native/release/gcc_v4.6.0/console_urls.s -pipe -m64 -minline-all-stringops -fvisibility-inlines-hidden -ggdb -g3 -fpic -nostdinc -ftree-loop-linear -funroll-loops -march=opteron -Wa,-mtune=opteron -Wchar-subscripts -Wreturn-type -Wunknown-pragmas -Wno-int-to-pointer-cast -Wformat -Wformat-nonliteral -Wformat-security -Wpointer-arith -Wcast-align -Wwrite-strings -Wsign-compare -Wunused-variable -Wunused-parameter -Wparentheses -fasynchronous-unwind-tables -fno-enforce-eh-specs -fcheck-new -O3 -fno-omit-frame-pointer -fno-strict-aliasing -fno-builtin-abort -fno-builtin-calloc -fno-builtin-exit -fno-builtin-free -fno-builtin-malloc -fno-builtin-realloc -fvisibility=default -fno-threadsafe-statics -Werror -I/home/matt/src/clueboat/porkius/main/src/include/libc -I/home/matt/src/clueboat/porkius/main/src/include/os -I/home/matt/src/clueboat/porkius/main/src/include/hal/x86_64 -I/home/matt/src/clueboat/porkius/main/src/include/hal -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/include/legacy -I/home/matt/src/clueboat/porkius/main/src/network/stack/sys -I/home/matt/src/clueboat/porkius/main/src/network/stack/local -I/home/matt/src/clueboat/porkius/main/src/network/stack/include -I../../monitoring -I/home/matt/src/clueboat/porkius/main/objects/SOGS_x86_64_release/autogen/management/ami/SOGS/dml/include -I/home/matt/src/clueboat/porkius/main/src/management/ami/SOGS/include -I/home/matt/src/clueboat/porkius/main/src/management/ami/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/ami/dml/include -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0 -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0/x86_64-iscb-SOGS -DOPENSSL_FIPS -DSOGS_ANY -DSOGS_NATIVE -DSOGS_porkius -DSOGS_TARGET_NAME=x86_64 -DSOGS_SRC_ROOT="\"/home/matt/src/clueboat/porkius/main/src/util/shared/\"" -c -O3 /home/matt/src/clueboat/porkius/main/src/util/shared/console_urls.cpp -o /tmp/console_urls-O3.o If there’s anything else you need from me to help debug/resolve this, please let me know.
Does the issue reproduce without using -ftree-loop-linear? In general try to avoid loads of -f switches and stick to basic (and well tested) -Ox options. Does it fail with just -O3?
(In reply to comment #1) > Does the issue reproduce without using -ftree-loop-linear? In general try > to avoid loads of -f switches and stick to basic (and well tested) -Ox > options. Does it fail with just -O3? Yes, it appears to fail in the same way with all of the -ftree-* flags removed: SOGS-g++ -pipe -m64 -minline-all-stringops -fvisibility-inlines-hidden -ggdb -g3 -c -fpic -nostdinc -march=amdfam10 -Wa,-mtune=amdfam10 -Wchar-subscripts -Wreturn-type -Wunknown-pragmas -Wno-int-to-pointer-cast -Wformat -Wformat-nonliteral -Wformat-security -Wpointer-arith -Wcast-align -Wwrite-strings -Wsign-compare -Wunused-variable -Wunused-parameter -Wparentheses -fasynchronous-unwind-tables -fno-enforce-eh-specs -fcheck-new -O3 -fno-omit-frame-pointer -fno-strict-aliasing -fno-builtin-abort -fno-builtin-calloc -fno-builtin-exit -fno-builtin-free -fno-builtin-malloc -fno-builtin-realloc -fvisibility=hidden -fno-threadsafe-statics -Werror -I/home/matt/src/clueboat/porkius/main/src/include/libc -I/home/matt/src/clueboat/porkius/main/src/include/os -I/home/matt/src/clueboat/porkius/main/src/include/hal/x86_64 -I/home/matt/src/clueboat/porkius/main/src/include/hal -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I. -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/include/legacy -I/home/matt/src/clueboat/porkius/main/src/network/stack/sys -I/home/matt/src/clueboat/porkius/main/src/network/stack/local -I/home/matt/src/clueboat/porkius/main/src/network/stack/include -I/home/matt/src/clueboat/porkius/main/src/standard/include -I/home/matt/src/clueboat/porkius/main/src/include/legacy/cfssl -I/home/matt/src/clueboat/porkius/main/src/include/legacy/openssl -I/home/matt/src/clueboat/porkius/main/src/include/os/core/platform -I/home/matt/src/clueboat/porkius/main/src/proxy/ssl/ssl_proxy -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0 -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0/x86_64-iscb-SOGS -DCFSSL_PROXY -DOPENSSL_FIPS -DSOGS_ANY -DSOGS_NATIVE -DSOGS_porkius -DSOGS_TARGET_NAME=x86_64 -D_CRTBLD -DSOGS_SRC_ROOT="\"/home/matt/src/clueboat/porkius/main/src/proxy/ssl/\"" -o build_products/shared/x86_64/SOGS_native/release/gcc_v4.6.0/cli.o /home/matt/src/clueboat/porkius/main/src/proxy/ssl/cli.cpp I do generally agree with you on not employing too many individual optimization flags. We did a lot of testing with GCC 4.4.x and found this set of flags produced optimal system performance for our codebase. If plain -O3 in 4.6.0 provides superior performance, then I have an argument for simplifying thing. If plain -O3 is worse than 4.4.x, the product performance would regress and I'll have no legitimate argument ;) Either way, it used to work fine, so it's still a regression. I have attached the object file output compiled with the above commandline.
Created attachment 23845 [details] file with regression compiled with same flags, aside from -ftree-*
Anyway, please read http://gcc.gnu.org/bugs.html, there is nothing we can do from the information you've provided. Trying to find from a vague description of where you see the crash a difference in assembly is very hard, plus it may very well be just a bug in the sources you are compiling. If -O2 works and -O3 doesn't, try to add __attribute__((noinline)) resp. __attribute__((optimize(2))) resp. __attribute__((optimize(3))) to various suspect routines in the file to try to narrow it down to a single problematic routine (perhaps with some functions inlined into it), which will work fine if the whole file is compiled with -O3 just that routine is -O2 (and perhaps functions inlined into it) and will misbehave if everything is compled with -O2 just that function is compiled with -O3. Then try to make a self-contained executable testcase out of it (just call that routine from some other CU's main, perhaps with the minimal necessary setup for it and stub everything it calls. For (suspected) miscompilation we really need to be able to reproduce it ourselves, preferrably with a minimal testcase, otherwise we can't do anything on it unless you debug the problem yourself down to pointing a bug somewhere exact in the assembly.
(In reply to comment #4) > Anyway, please read http://gcc.gnu.org/bugs.html, there is nothing we can do > from the information you've provided. Trying to find from a vague description > of where you see the crash a difference in assembly is very hard, plus it may > very well be just a bug in the sources you are compiling. If -O2 works and -O3 > doesn't, try to add __attribute__((noinline)) resp. > __attribute__((optimize(2))) > resp. __attribute__((optimize(3))) to various suspect routines in the file to > try to narrow it down to a single problematic routine (perhaps with some > functions inlined into it), which will work fine if the whole file is compiled > with -O3 just that routine is -O2 (and perhaps functions inlined into it) and > will misbehave if everything is compled with -O2 just that function is compiled > with -O3. Then try to make a self-contained executable testcase out of it > (just call that routine from some other CU's main, perhaps with the minimal > necessary setup for it and stub everything it calls. For (suspected) > miscompilation we really need to be able to reproduce it ourselves, preferrably > with a minimal testcase, otherwise we can't do anything on it unless you debug > the problem yourself down to pointing a bug somewhere exact in the assembly. I've got a greatly reduced testcase that I'm continuing to pare down. I'll attach the self-contained example tomorrow.
in the 46bug/ dir in the attached tarball: g++ -O3 -g -I. -c te_field_id.cpp && g++ *.o -o test induces the crash: ==31412== General Protection Fault ==31412== at 0x400F38: TE_Field_id::Initialize_token_maps() (hashmap.hpp:710) ==31412== by 0x4017A8: main (te_main.cpp:5) This is the same crash I described before. Compiling with -O2 eliminates the crash. In te_field_id.hpp, there is a commented out __attribute__((optimize(2))). In the non-reduced case, adding that annotation just moves the crash around. I may file a separate bug for that, depending on the outcome of this bug.
Created attachment 23887 [details] reduced test case, commandline to compile in bug comments
In hashmap.hpp static inline uint32_t Hashcode( const char* s, uint32_t len, bool case_insensitive) { // prepare to run through string in 4-byte chunks const uint32_t* wordp = (const uint32_t*)s; bug: wordp can be incorrectly aligned.
Yeah, and with -O3 you are requesting vectorization. When that loop gets vectorized, gcc counts on that you don't lie about alignment. You should be using either typedef uint32_t unaligned_uint32 __attribute__((__aligned__ (1), __may_alias__))); and use that type for wordp, or just use may_alias attribute and first do a few byte cycles to align the pointer, then do the main cycle, then do any remaining bytes.
I do see the alignment problem you point out (though I'm disappointed that neither PC-Lint nor GCC's warnings alerted me). I made the changes you proposed, but still get the same crash: typedef uint32_t unaligned_uint32_t __attribute__((__aligned__ (1),__may_alias__)); // ... static inline uint32_t Hashcode( const char* s, uint32_t len, bool case_insensitive) { const unaligned_uint32_t* wordp = (const unaligned_uint32_t*)s; // ... } ==30742== Process terminating with default action of signal 11 (SIGSEGV) ==30742== General Protection Fault ==30742== at 0x400F38: TE_Field_id::Initialize_token_maps() (hashmap.hpp:672) ==30742== by 0x4017A8: main (te_main.cpp:5) It crashes on the same line (with or without valgrind): hash += *wordp++ & 0xdfdfdfdf; For grins, I initialized the hash variable and cast the 0xdfdfdfdf constant to unaligned_uint32_t as well, but the problem persists. I did verify that removing -ftree-vectorize alone does fix the eliminate the crash; "-O2 -finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -fipa-cp-clone -ftree-loop-distribute-patterns -ftree-slp-vectorize" does not crash. So, is there a bug in the annotations you suggested? Is there any other "easy" workaround besides disabling the optimization on this function? Very sincerely, thanks for working with me on this issue! :)
These annotations do not work for me. You can calculate hash char-by-char or use Jakub's method #2.
I'd say that is a vectorization bug then. Either it shouldn't vectorize it at all, or needs to ensure unaligned vector loads (resp. stores) are used everywhere. Short testcase: typedef unsigned int U __attribute__((__aligned__ (1), __may_alias__)); __attribute__((noinline, noclone)) unsigned int foo (const char *s, int len) { const U *p = (const U *) s; unsigned int f = len / sizeof (unsigned int), hash = len, i; for (i = 0; i < f; ++i) hash += *p++; return hash; } char buf[64] __attribute__((aligned (32))); int main (void) { return foo (buf + 1, 26); }
The vectorizer uses loop peeling to align the loads from *p (by checking the address at the run time).
Sure, but for types with user alignment smaller than their size, loop peeling may often never be able to align it. This regressed with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161797 What I mean is that for say unsigned int type with 32-bit alignment loop peeling for valid code should always end up with correct alignment, but for unsigned int with 8-bit alignment it can't. And e.g. i?86 long long is of the same category, it has 64-bit size, but 32-bit alignment, so if you have a pointer to long long array, the whole array might be just 32-bit aligned, but not 64-bit, then any kind of peeling will still result in misalignment. __attribute__((noinline, noclone)) unsigned long long foo (const char *s, int len) { const unsigned long long *p = (const unsigned long long *) s; unsigned long long f = len / sizeof (unsigned long long), hash = len, i; for (i = 0; i < f; ++i) hash += *p++; return hash; } struct S { int i; long long buf[64]; } s; int main (void) { return foo ((const char *) s.buf, 60); } works fine though (with -m32 -O3 -mavx or -m32 -O3 -msse4).
(In reply to comment #14) > Sure, but for types with user alignment smaller than their size, loop peeling > may often never be able to align it. > > This regressed with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161797 This patch only allowed peeling for loads. So the problem existed for stores before this revision. > > What I mean is that for say unsigned int type with 32-bit alignment loop > peeling for valid code should always end up with correct alignment, but for > unsigned int with 8-bit alignment it can't. And e.g. i?86 long long is of the > same category, it has 64-bit size, but 32-bit alignment, so if you have a > pointer > to long long array, the whole array might be just 32-bit aligned, but not > 64-bit, > then any kind of peeling will still result in misalignment. The vectorizer calls builtin_vector_alignment_reachable in case of unknown alignment to verify that the access is naturally aligned. (i?86 doesn't implement this builtin and uses the default implementation). There is PR 42652 for a similar problem. Thanks, Ira
So perhaps: bool default_builtin_vector_alignment_reachable (const_tree type, bool is_packed) { - if (is_packed) + if (is_packed || (TYPE_USER_ALIGN (type) && compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) == 1) return false; and similarly in rs6000 and other hooks? BTW, tree_int_cst_compare (TYPE_SIZE (type), bitsize_int (POINTER_SIZE)) in the default hook probably should be compare_tree_int too.
Confirmed. Peeling for alignment is pointless if the access isn't at least aligned to the natural alignment of the vector element.
(In reply to comment #16) > So perhaps: > > bool > default_builtin_vector_alignment_reachable (const_tree type, bool is_packed) > { > - if (is_packed) > + if (is_packed || (TYPE_USER_ALIGN (type) && compare_tree_int (TYPE_SIZE > (type), TYPE_ALIGN (type)) == 1) > return false; > > and similarly in rs6000 and other hooks? > BTW, tree_int_cst_compare (TYPE_SIZE (type), bitsize_int (POINTER_SIZE)) > in the default hook probably should be compare_tree_int too. I think rather tree-vect-data-refs.c:vector_alignment_reachable_p should be adjusted. These "packed" checks in the target hooks don't make any sense to me either. In fact, I fail to see the point of a target hook completely (if it isn't maybe just for cost issues).
(In reply to comment #18) > I think rather tree-vect-data-refs.c:vector_alignment_reachable_p should > be adjusted. These "packed" checks in the target hooks don't make any > sense to me either. Yes, I think this can be moved to vector_alignment_reachable_p. > In fact, I fail to see the point of a target hook > completely (if it isn't maybe just for cost issues). There are some target specific checks in rs6000.c and arm.c.
On Thu, 7 Apr 2011, irar at il dot ibm.com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377 > > --- Comment #19 from Ira Rosen <irar at il dot ibm.com> 2011-04-07 09:55:44 UTC --- > (In reply to comment #18) > > I think rather tree-vect-data-refs.c:vector_alignment_reachable_p should > > be adjusted. These "packed" checks in the target hooks don't make any > > sense to me either. > > Yes, I think this can be moved to vector_alignment_reachable_p. > > > In fact, I fail to see the point of a target hook > > completely (if it isn't maybe just for cost issues). > > There are some target specific checks in rs6000.c and arm.c. I saw them, but I can't see what the difference is between "aligned" and "aligned" ;) Either the targets have aligned loads or they don't. We can target independently check whether we can for example reach 16-byte alignment - in which cases is it then we can't "reach" that alignment anyway due to target issues?
(In reply to comment #20) > I saw them, but I can't see what the difference is between > "aligned" and "aligned" ;) Either the targets have aligned loads > or they don't. We can target independently check whether we > can for example reach 16-byte alignment - in which cases is it then > we can't "reach" that alignment anyway due to target issues? What kind of target independent check do you have in mind? I guess this hook was added because such check is hard to implement. Here is the discussion about adding it http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00082.html.
When would compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) <= 0 give not what you are looking for?
Well, it's hard to accept such an easy solution when the original patch has hundreds of rows ;) So, you think that Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 172019) +++ tree-vect-data-refs.c (working copy) @@ -1110,12 +1110,9 @@ vector_alignment_reachable_p (struct dat if (ba) is_packed = contains_packed_reference (ba); - if (vect_print_dump_info (REPORT_DETAILS)) - fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed); - if (targetm.vectorize.vector_alignment_reachable (type, is_packed)) - return true; - else - return false; + if (is_packed + || compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0) + return false; } return true; is enough, and we can just get rid of vector_alignment_reachable? Thanks, Ira
(In reply to comment #23) > Well, it's hard to accept such an easy solution when the original patch has > hundreds of rows ;) > > So, you think that > > Index: tree-vect-data-refs.c > =================================================================== > --- tree-vect-data-refs.c (revision 172019) > +++ tree-vect-data-refs.c (working copy) > @@ -1110,12 +1110,9 @@ vector_alignment_reachable_p (struct dat > if (ba) > is_packed = contains_packed_reference (ba); > > - if (vect_print_dump_info (REPORT_DETAILS)) > - fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed); > - if (targetm.vectorize.vector_alignment_reachable (type, is_packed)) > - return true; > - else > - return false; > + if (is_packed > + || compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0) > + return false; > } > > return true; > > is enough, and we can just get rid of vector_alignment_reachable? Yes, I think so. Richard. > Thanks, > Ira
(In reply to comment #24) > (In reply to comment #23) > > Well, it's hard to accept such an easy solution when the original patch has > > hundreds of rows ;) > > > > So, you think that > > > > Index: tree-vect-data-refs.c > > =================================================================== > > --- tree-vect-data-refs.c (revision 172019) > > +++ tree-vect-data-refs.c (working copy) > > @@ -1110,12 +1110,9 @@ vector_alignment_reachable_p (struct dat > > if (ba) > > is_packed = contains_packed_reference (ba); > > > > - if (vect_print_dump_info (REPORT_DETAILS)) > > - fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed); > > - if (targetm.vectorize.vector_alignment_reachable (type, is_packed)) > > - return true; > > - else > > - return false; > > + if (is_packed > > + || compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0) > > + return false; > > } > > > > return true; > > > > is enough, and we can just get rid of vector_alignment_reachable? > > Yes, I think so. Even is_packed shouldn't be necessary, TYPE_ALIGN should better be correct (but IIRC it isn't for the FIELD_DECL types of packed struckts).
int a = __alignof__ (double); struct A { int b; double c; int d; } e; int f = __alignof__ (e.c); double *p; int g = __alignof__ (*p); int a2 = __alignof__ (long long); struct A2 { int b; long long c; int d; } e2; int f2 = __alignof__ (e2.c); long long *p2; int g2 = __alignof__ (*p2); on powerpc64-linux unfortunately shows that the target hook is needed, at least with -m64 -malign-power. Though, what the target hook does is definitely weird: if (TARGET_32BIT) { if (rs6000_alignment_flags == MASK_ALIGN_NATURAL) return true; if (rs6000_alignment_flags == MASK_ALIGN_POWER) return true; return false; } else { if (TARGET_MACHO) return false; /* Assuming that all other types are naturally aligned. CHECKME! */ return true; } I believe rs6000_alignment_flags can be either MASK_ALIGN_NATURAL, or MASK_ALIGN_POWER, nothing else, so wonder why it just doesn't return true for TARGET_32BIT. And, for TARGET_64BIT && !TARGET_MACHO, clearly DFmode types aren't naturally aligned with rs6000_alignment_flags == MASK_ALIGN_POWER. Therefore I'd say it should return false for TYPE_MODE (strip_array_types (type)) == DFmode && !TARGET_ALIGN_NATURAL, at least for Linux (rs6000 has unfortunately way too many ABIs and even more target switches). For TARGET_MACHO, ADJUST_FIELD_ALIGN is darwin.h:#define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \ darwin.h- (TARGET_ALIGN_NATURAL ? (COMPUTED) \ darwin.h- : (COMPUTED) == 128 ? 128 \ darwin.h- : MIN ((COMPUTED), 32)) thus I think it could use ADJUST_FIELD_ALIGN (NULL, TYPE_ALIGN (type)) == TYPE_ALIGN (type). And AIX is the same as Linux TARGET_64BIT, probably even for 32-bit AIX though. So, I think you want the compare_tree_int check in generic code (not sure if is_packed isn't redundant with that check), plus a hook which will sometimes tell you peeling won't necessarily align either.
(In reply to comment #26) > int a = __alignof__ (double); > struct A { int b; double c; int d; } e; > int f = __alignof__ (e.c); > double *p; > int g = __alignof__ (*p); > int a2 = __alignof__ (long long); > struct A2 { int b; long long c; int d; } e2; > int f2 = __alignof__ (e2.c); > long long *p2; > int g2 = __alignof__ (*p2); > > on powerpc64-linux unfortunately shows that the target hook is needed, at least > with -m64 -malign-power. You mean a != f in the above? That's the basic issue of stor-layout not using properly aligned types for FIELD_DECL types (same issue for packed structs). So you can't really use type alignment for memory references but you have to use get_object_alignment & friends. Same happens on any target for non-packed but just custom (mis-)aligned struct types.
I mean a and g are 8 with -m64 -malign-power, but f is 4. If you do: extern void abort (void); struct S { long long x; int a; double b[10000]; int c; } s; double *p = &s.b[0]; int main () { if (((long) p) & (__alignof__ (*p) - 1)) abort (); return 0; } I believe it will fail with -m64 -malign-power, and here it doesn't help at all that you check get_object_alignment, because __alignof__ (*p) is still 8, but .comm s,80016,8 ... p: .quad s+12 This isn't the default ABI for powerpc64-linux fortunately, but anyway. But, the above fails on i?86 too (without non-default -malign-double). So, we need some target hook that will tell us if ADJUST_FIELD_ALIGN could ever lower alignment of the type when present in structs...
On Thu, 7 Apr 2011, jakub at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377 > > --- Comment #28 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-04-07 13:16:13 UTC --- > I mean a and g are 8 with -m64 -malign-power, but f is 4. If you do: > extern void abort (void); > struct S { long long x; int a; double b[10000]; int c; } s; > double *p = &s.b[0]; Here's a known type-inconsistency as well. Or it's even an invalid testcase ... > int > main () > { > if (((long) p) & (__alignof__ (*p) - 1)) > abort (); > return 0; > } > > I believe it will fail with -m64 -malign-power, and here it doesn't help at all > that you check get_object_alignment, because __alignof__ (*p) is still 8, > but > .comm s,80016,8 > ... > p: > .quad s+12 > This isn't the default ABI for powerpc64-linux fortunately, but anyway. > > But, the above fails on i?86 too (without non-default -malign-double). > So, we need some target hook that will tell us if ADJUST_FIELD_ALIGN could ever > lower alignment of the type when present in structs... Well. It's also wrong with struct S { long long x; int a; double b[10000]; int c; } s __attribute__((aligned(4))); double *p = &s.b[0]; on x86_64. And honestly I'm not exactly sure if that situation is different than the i?86 or ppc one or if it is just a "bug" in the C standard that it allows this kind of alignment behavior (and thus, a bug in the GCC extension that allows adjusting it). Richard.
This is certainly a valid testcase: struct S { long long x; int a; double b[10000]; int c; } s; double *p = &s.b[0]; int foo (double *q) { int i, j = 0; for (i = 0; i < 10000; i++) j += *q++; return j; } int main (void) { return foo (p); } and without the target hook, vectorizer would think it can peel the loop and get it naturally aligned (assuming it isn't inlined at least etc. to see that q points to &s.b[0] initially). I think the bug is in the ABIs that do that kind of thing, but it is too late to fix the ABIs up.
(In reply to comment #30) > This is certainly a valid testcase: > > struct S { long long x; int a; double b[10000]; int c; } s; > double *p = &s.b[0]; > int > foo (double *q) > { > int i, j = 0; > for (i = 0; i < 10000; i++) > j += *q++; > return j; > } > int > main (void) > { > return foo (p); > } > > and without the target hook, vectorizer would think it can peel the loop and > get it naturally aligned (assuming it isn't inlined at least etc. to see that q > points to &s.b[0] initially). > > I think the bug is in the ABIs that do that kind of thing, but it is too late > to fix the ABIs up. The question is what we should do in the middle-end. If the above would happen for strict-alignment targets then *p would simply trap. The only consistent way for the middle-end would be to simply assume all pointer-to-doubles are only aligned to 4 bytes, thus use the minimal alignment that can happen (without using GCC extensions).
Given that without vectorization that code surely works well on all the weird ABIs, such ADJUST_FIELD_ALIGN is used only on targets that either aren't strict alignment targets, or aren't strict alignment targets for the particular types. You know, for most CPUs strict alignment isn't a binary thing, many targets including i?86/x86_64 are only partially strict alignment targets. Even for targets where say int needs to be accessed aligned often e.g. double can be just 4 byte aligned and insns that read/write doubles handle it. We really shouldn't change __alignof__ of types/INDIRECT_REFs/etc., those are all ABI changes. The problem with vectorization is that often the vectors have strict alignment, while non-vector ops are not.
On Thu, 7 Apr 2011, jakub at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377 > > --- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-04-07 13:43:47 UTC --- > Given that without vectorization that code surely works well on all the weird > ABIs, such ADJUST_FIELD_ALIGN is used only on targets that either aren't strict > alignment targets, or aren't strict alignment targets for the particular types. > You know, for most CPUs strict alignment isn't a binary thing, many targets > including i?86/x86_64 are only partially strict alignment targets. > Even for targets where say int needs to be accessed aligned often e.g. double > can be just 4 byte aligned and insns that read/write doubles handle it. > > We really shouldn't change __alignof__ of types/INDIRECT_REFs/etc., those are > all ABI changes. The problem with vectorization is that often the vectors have > strict alignment, while non-vector ops are not. Sure, I agree with all of the above. Still, if we need target hooks for this kind of stuff then something is really rotten in the middle-end. Richard.
While for 4.7 we can certainly do bigger changes, for 4.6 I think a safe fix would be: --- gcc/tree-vect-data-refs.c 2011-03-31 08:51:03.000000000 +0200 +++ gcc/tree-vect-data-refs.c 2011-04-07 18:32:37.379420938 +0200 @@ -1110,6 +1110,9 @@ vector_alignment_reachable_p (struct dat if (ba) is_packed = contains_packed_reference (ba); + if (compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0) + is_packed = true; + if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed); if (targetm.vectorize.vector_alignment_reachable (type, is_packed)) For scalars GCC ignores packed attribute, so aligned(1) is the only way to express unaligned accesses.
On Thu, 7 Apr 2011, jakub at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377 > > --- Comment #34 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-04-07 16:36:04 UTC --- > While for 4.7 we can certainly do bigger changes, for 4.6 I think a safe fix > would be: > > --- gcc/tree-vect-data-refs.c 2011-03-31 08:51:03.000000000 +0200 > +++ gcc/tree-vect-data-refs.c 2011-04-07 18:32:37.379420938 +0200 > @@ -1110,6 +1110,9 @@ vector_alignment_reachable_p (struct dat > if (ba) > is_packed = contains_packed_reference (ba); > > + if (compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0) > + is_packed = true; > + > if (vect_print_dump_info (REPORT_DETAILS)) > fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed); > if (targetm.vectorize.vector_alignment_reachable (type, is_packed)) > > For scalars GCC ignores packed attribute, so aligned(1) is the only way to > express unaligned accesses. Ok if it passes testing.
Author: jakub Date: Fri Apr 8 11:38:19 2011 New Revision: 172172 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172172 Log: PR tree-optimization/48377 * tree-vect-data-refs.c (vector_alignment_reachable_p): Set is_packed to true even for types with smaller TYPE_ALIGN than TYPE_SIZE. * gcc.dg/vect/pr48377.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/pr48377.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-data-refs.c
Author: jakub Date: Fri Apr 8 11:45:29 2011 New Revision: 172174 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172174 Log: PR tree-optimization/48377 * tree-vect-data-refs.c (vector_alignment_reachable_p): Set is_packed to true even for types with smaller TYPE_ALIGN than TYPE_SIZE. * gcc.dg/vect/pr48377.c: New test. Added: branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/pr48377.c Modified: branches/gcc-4_6-branch/gcc/ChangeLog branches/gcc-4_6-branch/gcc/testsuite/ChangeLog branches/gcc-4_6-branch/gcc/tree-vect-data-refs.c
Fixed (the testcase with aligned(1)). The original is INVALID.
> Fixed (the testcase with aligned(1)). The original is INVALID. The testcase with aligned(1) cannot pass on strict-alignment targets, either with or without vectorization. On the SPARC: (botcazou@nile) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.8 $ ./pr48377 Bus Error (core dumped) On IA-64: (botcazou@merced) /nile.build/botcazou/gcc-head/ia64-linux-gnu $ ./pr48377 pr48377(1543): unaligned access to 0x6000000000001041, ip=0x4000000000000650 pr48377(1543): unaligned access to 0x6000000000001045, ip=0x4000000000000650 pr48377(1543): unaligned access to 0x6000000000001049, ip=0x4000000000000650 pr48377(1543): unaligned access to 0x600000000000104d, ip=0x4000000000000650 Please move it to gcc.target/i386 or something equivalent.
(In reply to comment #39) > > Fixed (the testcase with aligned(1)). The original is INVALID. > > The testcase with aligned(1) cannot pass on strict-alignment targets, either > with or without vectorization. On the SPARC: > > (botcazou@nile) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.8 $ ./pr48377 > Bus Error (core dumped) > > On IA-64: > > (botcazou@merced) /nile.build/botcazou/gcc-head/ia64-linux-gnu $ ./pr48377 > pr48377(1543): unaligned access to 0x6000000000001041, ip=0x4000000000000650 > pr48377(1543): unaligned access to 0x6000000000001045, ip=0x4000000000000650 > pr48377(1543): unaligned access to 0x6000000000001049, ip=0x4000000000000650 > pr48377(1543): unaligned access to 0x600000000000104d, ip=0x4000000000000650 > > Please move it to gcc.target/i386 or something equivalent. Can you explain why it cannot pass on strict-alignment targets? The read is done through a type with explicit low alignment, so strict alignment targets just shouldn't use an aligned load but unaligned load in that case. If that doesn't work, it is a target bug.
> Can you explain why it cannot pass on strict-alignment targets? The read is > done through a type with explicit low alignment, so strict alignment targets > just shouldn't use an aligned load but unaligned load in that case. What do you mean by an unaligned load? Most targets don't have that. If you want to support typedef unsigned int U __attribute__((__aligned__ (1))); U id (U *u) { return *u; } on strict-alignment targets, you'll have to implement it in the middle-end. In the meantime (and for 4.6.x), the test cannot pass.
It surprises me it doesn't handle it, that is clearly a bug. Conceptually it is no different from struct A { char c; unsigned int d; } __attribute__((packed)); unsigned int id (struct A *p) { return p->d; } which is handled.
> Conceptually it is no different from > struct A { char c; unsigned int d; } __attribute__((packed)); > > unsigned int > id (struct A *p) > { > return p->d; > } > > which is handled. Guess what? We implement under-aligned scalar types in Ada, i.e. the equivalent of typedef unsigned int U __attribute__((__aligned__ (1))) by wrapping them up in a record type. ;-) As soon as you give SImode to the type, the game is over.
Created attachment 24423 [details] gcc46-pr48377.patch I don't see why mode should be relevant, the MEM_REF should have from the type clear indication that it is unaligned load (or store) and just should lead to the bit-fieldish expansion. Anyway, here is a testsuite patch for now, works on x86_64, not tested other targets.
(In reply to comment #44) > Created attachment 24423 [details] > gcc46-pr48377.patch > > I don't see why mode should be relevant, the MEM_REF should have from the type > clear indication that it is unaligned load (or store) and just should lead to > the bit-fieldish expansion. > > Anyway, here is a testsuite patch for now, works on x86_64, not tested other > targets. The issue is (or at least was) that the mem-attrs get the alignment from the alignment of the mode (for indirect-refs) unconditionally on strict-align targets until 2010-12-30 Ulrich Weigand <Ulrich.Weigand@de.ibm.com> * emit-rtl.c (set_mem_attributes_minus_bitpos): Explicitly derive default values from MEM mode if no memory attributes are present. Do not use mode alignment, even on STRICT_ALIGNMENT targets, when called with an expression (not a type). and the indirect-ref path never went down the misaligned path as the get_inner_reference path did. This is a very old issue. The only thing MEM_REF improves here is that if the target has a movmisalign_optab it is uses for such accesses, otherwise there will be simply a MEM RTX with alingment less than the modes alignment which has no effect. It is a genuine middle-end / target issue that we do not properly implement this GCC extension for strict-align targets.
Author: jakub Date: Sun Jun 26 07:57:30 2011 New Revision: 175408 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175408 Log: 2011-06-26 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/48377 * gcc.dg/vect/pr48377.c: Add dg-require-effective-target non_strict_align. 2011-06-26 Steve Ellcey <sje@cup.hp.com> PR middle-end/49191 * lib/target-supports.exp (check_effective_target_non_strict_align): New. * gcc.dg/memcpy-3.c: Add dg-require-effective-target non_strict_align. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/memcpy-3.c trunk/gcc/testsuite/gcc.dg/vect/pr48377.c trunk/gcc/testsuite/lib/target-supports.exp
Author: jakub Date: Mon Jun 27 17:55:35 2011 New Revision: 175544 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175544 Log: Backported from mainline 2011-06-26 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/48377 * gcc.dg/vect/pr48377.c: Add dg-require-effective-target non_strict_align. 2011-06-26 Steve Ellcey <sje@cup.hp.com> PR middle-end/49191 * lib/target-supports.exp (check_effective_target_non_strict_align): New. Modified: branches/gcc-4_6-branch/gcc/testsuite/ChangeLog branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/pr48377.c branches/gcc-4_6-branch/gcc/testsuite/lib/target-supports.exp