Bug 48377 - [4.6/4.7 regression] miscompilation at -O3
[4.6/4.7 regression] miscompilation at -O3
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: tree-optimization
4.6.0
: P3 normal
: 4.6.1
Assigned To: Not yet assigned to anyone
: wrong-code
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-03-30 22:41 UTC by Matt Hargett
Modified: 2011-06-27 17:55 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-03-31 09:52:51


Attachments
tarball containing preprocessed output, valgrind warnings, and object files (523.73 KB, application/x-bzip)
2011-03-30 22:41 UTC, Matt Hargett
Details
file with regression compiled with same flags, aside from -ftree-* (196.61 KB, application/x-bzip)
2011-04-01 01:40 UTC, Matt Hargett
Details
reduced test case, commandline to compile in bug comments (80.34 KB, application/x-bzip)
2011-04-05 20:39 UTC, Matt Hargett
Details
gcc46-pr48377.patch (470 bytes, patch)
2011-06-03 11:24 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Hargett 2011-03-30 22:41:50 UTC
Created attachment 23827 [details]
tarball containing preprocessed output, valgrind warnings, and object files

With GCC 4.6.0, I get a crash, early in the startup of our system, when compiling the system at –O3. This previously worked fine with GCC 4.5 and 4.4.5. Re-compiling only the top-most .cpp file in the crash's call stack with –O2, and re-linking, fixes the crash. 


In the attached archive:

-console_urls.i, in which
Console_url_table::Add_url_to_category() calls url_map->Put(), which ends up calling Hashcode() and crashes on a dereference of the 'wordp' variable (*wordp++ & 0xdfdfdfdf). gdb claims wordp has been optimized out when we debug the crash. 

-vg-*, which is the valgrind (3.6.1, using the ubuntu natty package) output at each optimization level. I have been told that some of the warnings are false positives, but the PPL and GMP ones I have never seen before. (For what it's worth, I ran valgrind on 4.5.3 compiling the same file, and it had nothing to report at all in GCC or related libraries.)

-build-cross-4.6.sh, which is the script I used to generate the toolchain and contains package versions, configure commandlines, and build parameters for GCC and all the supporting libraries. I tried downgrading a few of the libraries to versions supplied in the sourceryg++ tarball, but it didn’t appear to fix the issue. The 4.5.3 I tested (which did not have this bug) was compiled with a similar script, and with all the same library versions.

My commandline:
g++ -Wa,-adlh=build_products/shared/x86_64/SOGS_native/release/gcc_v4.6.0/console_urls.s  -pipe -m64 -minline-all-stringops -fvisibility-inlines-hidden -ggdb -g3 -fpic -nostdinc -ftree-loop-linear -funroll-loops -march=opteron -Wa,-mtune=opteron -Wchar-subscripts -Wreturn-type -Wunknown-pragmas -Wno-int-to-pointer-cast -Wformat -Wformat-nonliteral -Wformat-security -Wpointer-arith -Wcast-align -Wwrite-strings -Wsign-compare -Wunused-variable -Wunused-parameter -Wparentheses -fasynchronous-unwind-tables -fno-enforce-eh-specs -fcheck-new -O3 -fno-omit-frame-pointer -fno-strict-aliasing -fno-builtin-abort -fno-builtin-calloc -fno-builtin-exit -fno-builtin-free -fno-builtin-malloc -fno-builtin-realloc -fvisibility=default -fno-threadsafe-statics -Werror -I/home/matt/src/clueboat/porkius/main/src/include/libc -I/home/matt/src/clueboat/porkius/main/src/include/os -I/home/matt/src/clueboat/porkius/main/src/include/hal/x86_64 -I/home/matt/src/clueboat/porkius/main/src/include/hal -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/include/legacy -I/home/matt/src/clueboat/porkius/main/src/network/stack/sys -I/home/matt/src/clueboat/porkius/main/src/network/stack/local -I/home/matt/src/clueboat/porkius/main/src/network/stack/include -I../../monitoring -I/home/matt/src/clueboat/porkius/main/objects/SOGS_x86_64_release/autogen/management/ami/SOGS/dml/include -I/home/matt/src/clueboat/porkius/main/src/management/ami/SOGS/include -I/home/matt/src/clueboat/porkius/main/src/management/ami/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/ami/dml/include -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0 -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0/x86_64-iscb-SOGS -DOPENSSL_FIPS -DSOGS_ANY -DSOGS_NATIVE -DSOGS_porkius -DSOGS_TARGET_NAME=x86_64   -DSOGS_SRC_ROOT="\"/home/matt/src/clueboat/porkius/main/src/util/shared/\"" -c -O3 /home/matt/src/clueboat/porkius/main/src/util/shared/console_urls.cpp -o /tmp/console_urls-O3.o

If there’s anything else you need from me to help debug/resolve this, please let me know.
Comment 1 Richard Biener 2011-03-31 09:52:51 UTC
Does the issue reproduce without using -ftree-loop-linear?  In general try
to avoid loads of -f switches and stick to basic (and well tested) -Ox
options.  Does it fail with just -O3?
Comment 2 Matt Hargett 2011-04-01 01:38:20 UTC
(In reply to comment #1)
> Does the issue reproduce without using -ftree-loop-linear?  In general try
> to avoid loads of -f switches and stick to basic (and well tested) -Ox
> options.  Does it fail with just -O3?

Yes, it appears to fail in the same way with all of the -ftree-* flags removed:
SOGS-g++ -pipe -m64 -minline-all-stringops -fvisibility-inlines-hidden -ggdb -g3 -c -fpic -nostdinc  -march=amdfam10 -Wa,-mtune=amdfam10 -Wchar-subscripts -Wreturn-type -Wunknown-pragmas -Wno-int-to-pointer-cast -Wformat -Wformat-nonliteral -Wformat-security -Wpointer-arith -Wcast-align -Wwrite-strings -Wsign-compare -Wunused-variable -Wunused-parameter -Wparentheses -fasynchronous-unwind-tables -fno-enforce-eh-specs -fcheck-new -O3 -fno-omit-frame-pointer -fno-strict-aliasing -fno-builtin-abort -fno-builtin-calloc -fno-builtin-exit -fno-builtin-free -fno-builtin-malloc -fno-builtin-realloc -fvisibility=hidden -fno-threadsafe-statics -Werror -I/home/matt/src/clueboat/porkius/main/src/include/libc -I/home/matt/src/clueboat/porkius/main/src/include/os -I/home/matt/src/clueboat/porkius/main/src/include/hal/x86_64 -I/home/matt/src/clueboat/porkius/main/src/include/hal -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I. -I/home/matt/src/clueboat/porkius/main/src/common/portability/shared/include -I/home/matt/src/clueboat/porkius/main/src/common/include -I/home/matt/src/clueboat/porkius/main/src/include/legacy -I/home/matt/src/clueboat/porkius/main/src/network/stack/sys -I/home/matt/src/clueboat/porkius/main/src/network/stack/local -I/home/matt/src/clueboat/porkius/main/src/network/stack/include -I/home/matt/src/clueboat/porkius/main/src/standard/include -I/home/matt/src/clueboat/porkius/main/src/include/legacy/cfssl -I/home/matt/src/clueboat/porkius/main/src/include/legacy/openssl -I/home/matt/src/clueboat/porkius/main/src/include/os/core/platform -I/home/matt/src/clueboat/porkius/main/src/proxy/ssl/ssl_proxy -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0 -I/home/matt/src/clueboat/porkius/toolchain/linux/x86_64_host/gcc-cross/x86_64_target/v4.6.0/x86_64-iscb-SOGS/include/c++/4.6.0/x86_64-iscb-SOGS -DCFSSL_PROXY -DOPENSSL_FIPS -DSOGS_ANY -DSOGS_NATIVE -DSOGS_porkius -DSOGS_TARGET_NAME=x86_64 -D_CRTBLD   -DSOGS_SRC_ROOT="\"/home/matt/src/clueboat/porkius/main/src/proxy/ssl/\"" -o build_products/shared/x86_64/SOGS_native/release/gcc_v4.6.0/cli.o /home/matt/src/clueboat/porkius/main/src/proxy/ssl/cli.cpp

I do generally agree with you on not employing too many individual optimization flags. We did a lot of testing with GCC 4.4.x and found this set of flags produced optimal system performance for our codebase. If plain -O3 in 4.6.0 provides superior performance, then I have an argument for simplifying thing. If plain -O3 is worse than 4.4.x, the product performance would regress and I'll have no legitimate argument ;)

Either way, it used to work fine, so it's still a regression. I have attached the object file output compiled with the above commandline.
Comment 3 Matt Hargett 2011-04-01 01:40:34 UTC
Created attachment 23845 [details]
file with regression compiled with same flags, aside from -ftree-*
Comment 4 Jakub Jelinek 2011-04-01 09:51:06 UTC
Anyway, please read http://gcc.gnu.org/bugs.html, there is nothing we can do from the information you've provided.  Trying to find from a vague description of where you see the crash a difference in assembly is very hard, plus it may very well be just a bug in the sources you are compiling.  If -O2 works and -O3 doesn't, try to add __attribute__((noinline)) resp. __attribute__((optimize(2)))
resp. __attribute__((optimize(3))) to various suspect routines in the file to try to narrow it down to a single problematic routine (perhaps with some functions inlined into it), which will work fine if the whole file is compiled with -O3 just that routine is -O2 (and perhaps functions inlined into it) and will misbehave if everything is compled with -O2 just that function is compiled with -O3.  Then try to make a self-contained executable testcase out of it (just call that routine from some other CU's main, perhaps with the minimal necessary setup for it and stub everything it calls.  For (suspected) miscompilation we really need to be able to reproduce it ourselves, preferrably with a minimal testcase, otherwise we can't do anything on it unless you debug the problem yourself down to pointing a bug somewhere exact in the assembly.
Comment 5 Matt Hargett 2011-04-05 03:53:19 UTC
(In reply to comment #4)
> Anyway, please read http://gcc.gnu.org/bugs.html, there is nothing we can do
> from the information you've provided.  Trying to find from a vague description
> of where you see the crash a difference in assembly is very hard, plus it may
> very well be just a bug in the sources you are compiling.  If -O2 works and -O3
> doesn't, try to add __attribute__((noinline)) resp.
> __attribute__((optimize(2)))
> resp. __attribute__((optimize(3))) to various suspect routines in the file to
> try to narrow it down to a single problematic routine (perhaps with some
> functions inlined into it), which will work fine if the whole file is compiled
> with -O3 just that routine is -O2 (and perhaps functions inlined into it) and
> will misbehave if everything is compled with -O2 just that function is compiled
> with -O3.  Then try to make a self-contained executable testcase out of it
> (just call that routine from some other CU's main, perhaps with the minimal
> necessary setup for it and stub everything it calls.  For (suspected)
> miscompilation we really need to be able to reproduce it ourselves, preferrably
> with a minimal testcase, otherwise we can't do anything on it unless you debug
> the problem yourself down to pointing a bug somewhere exact in the assembly.

I've got a greatly reduced testcase that I'm continuing to pare down. I'll attach the self-contained example tomorrow.
Comment 6 Matt Hargett 2011-04-05 20:38:41 UTC
in the 46bug/ dir in the attached tarball:
g++ -O3 -g -I. -c te_field_id.cpp && g++ *.o -o test

induces the crash:

==31412==  General Protection Fault
==31412==    at 0x400F38: TE_Field_id::Initialize_token_maps() (hashmap.hpp:710)
==31412==    by 0x4017A8: main (te_main.cpp:5)


This is the same crash I described before. Compiling with -O2 eliminates the crash. In te_field_id.hpp, there is a commented out __attribute__((optimize(2))). In the non-reduced case, adding that annotation just moves the crash around. I may file a separate bug for that, depending on the outcome of this bug.
Comment 7 Matt Hargett 2011-04-05 20:39:33 UTC
Created attachment 23887 [details]
reduced test case, commandline to compile in bug comments
Comment 8 Dmitry Gorbachev 2011-04-06 05:25:21 UTC
In hashmap.hpp

static inline uint32_t
Hashcode( const char* s, uint32_t len, bool case_insensitive)
{
        // prepare to run through string in 4-byte chunks
        const uint32_t* wordp = (const uint32_t*)s;

bug: wordp can be incorrectly aligned.
Comment 9 Jakub Jelinek 2011-04-06 07:01:59 UTC
Yeah, and with -O3 you are requesting vectorization.  When that loop gets vectorized, gcc counts on that you don't lie about alignment.
You should be using either
typedef uint32_t unaligned_uint32 __attribute__((__aligned__ (1), __may_alias__)));
and use that type for wordp, or just use may_alias attribute and first do a few byte cycles to align the pointer, then do the main cycle, then do any remaining bytes.
Comment 10 Matt Hargett 2011-04-06 22:22:41 UTC
I do see the alignment problem you point out (though I'm disappointed that neither PC-Lint nor GCC's warnings alerted me). I made the changes you proposed, but still get the same crash:

typedef uint32_t unaligned_uint32_t __attribute__((__aligned__ (1),__may_alias__));

// ...

static inline uint32_t
Hashcode( const char* s, uint32_t len, bool case_insensitive)
{
        const unaligned_uint32_t* wordp = (const unaligned_uint32_t*)s;
// ...
}


==30742== Process terminating with default action of signal 11 (SIGSEGV)
==30742==  General Protection Fault
==30742==    at 0x400F38: TE_Field_id::Initialize_token_maps() (hashmap.hpp:672)
==30742==    by 0x4017A8: main (te_main.cpp:5)

It crashes on the same line (with or without valgrind):
                        hash += *wordp++ & 0xdfdfdfdf;

For grins, I initialized the hash variable and cast the 0xdfdfdfdf constant to unaligned_uint32_t as well, but the problem persists. I did verify that removing -ftree-vectorize alone does fix the eliminate the crash;
"-O2 -finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -fipa-cp-clone -ftree-loop-distribute-patterns -ftree-slp-vectorize" does not crash.

So, is there a bug in the annotations you suggested? Is there any other "easy" workaround besides disabling the optimization on this function?

Very sincerely, thanks for working with me on this issue! :)
Comment 11 Dmitry Gorbachev 2011-04-06 23:30:34 UTC
These annotations do not work for me. You can calculate hash char-by-char or use Jakub's method #2.
Comment 12 Jakub Jelinek 2011-04-07 07:44:18 UTC
I'd say that is a vectorization bug then.  Either it shouldn't vectorize it at all, or needs to ensure unaligned vector loads (resp. stores) are used everywhere.

Short testcase:

typedef unsigned int U __attribute__((__aligned__ (1), __may_alias__));

__attribute__((noinline, noclone)) unsigned int
foo (const char *s, int len)
{
  const U *p = (const U *) s;
  unsigned int f = len / sizeof (unsigned int), hash = len, i;

  for (i = 0; i < f; ++i)
    hash += *p++;
  return hash;
}

char buf[64] __attribute__((aligned (32)));

int
main (void)
{
  return foo (buf + 1, 26);
}
Comment 13 Ira Rosen 2011-04-07 08:00:23 UTC
The vectorizer uses loop peeling to align the loads from *p (by checking the address at the run time).
Comment 14 Jakub Jelinek 2011-04-07 08:20:30 UTC
Sure, but for types with user alignment smaller than their size, loop peeling may often never be able to align it.

This regressed with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161797

What I mean is that for say unsigned int type with 32-bit alignment loop peeling for valid code should always end up with correct alignment, but for
unsigned int with 8-bit alignment it can't.  And e.g. i?86 long long is of the same category, it has 64-bit size, but 32-bit alignment, so if you have a pointer
to long long array, the whole array might be just 32-bit aligned, but not 64-bit,
then any kind of peeling will still result in misalignment.

__attribute__((noinline, noclone)) unsigned long long
foo (const char *s, int len)
{
  const unsigned long long *p = (const unsigned long long *) s;
  unsigned long long f = len / sizeof (unsigned long long), hash = len, i;

  for (i = 0; i < f; ++i)
    hash += *p++;
  return hash;
}

struct S
{
  int i;
  long long buf[64];
} s;

int
main (void)
{
  return foo ((const char *) s.buf, 60);
}

works fine though (with -m32 -O3 -mavx or -m32 -O3 -msse4).
Comment 15 Ira Rosen 2011-04-07 09:00:43 UTC
(In reply to comment #14)
> Sure, but for types with user alignment smaller than their size, loop peeling
> may often never be able to align it.
> 
> This regressed with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161797

This patch only allowed peeling for loads. So the problem existed for stores before this revision.

> 
> What I mean is that for say unsigned int type with 32-bit alignment loop
> peeling for valid code should always end up with correct alignment, but for
> unsigned int with 8-bit alignment it can't.  And e.g. i?86 long long is of the
> same category, it has 64-bit size, but 32-bit alignment, so if you have a
> pointer
> to long long array, the whole array might be just 32-bit aligned, but not
> 64-bit,
> then any kind of peeling will still result in misalignment.

The vectorizer calls builtin_vector_alignment_reachable in case of unknown alignment to verify that the access is naturally aligned. (i?86 doesn't implement this builtin and uses the default implementation). 
There is PR 42652 for a similar problem.

Thanks,
Ira
Comment 16 Jakub Jelinek 2011-04-07 09:34:36 UTC
So perhaps:

 bool
 default_builtin_vector_alignment_reachable (const_tree type, bool is_packed)
 {
-  if (is_packed)
+  if (is_packed || (TYPE_USER_ALIGN (type) && compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) == 1)
     return false;

and similarly in rs6000 and other hooks?
BTW, tree_int_cst_compare (TYPE_SIZE (type), bitsize_int (POINTER_SIZE))
in the default hook probably should be compare_tree_int too.
Comment 17 Richard Biener 2011-04-07 09:35:08 UTC
Confirmed.  Peeling for alignment is pointless if the access isn't at least
aligned to the natural alignment of the vector element.
Comment 18 Richard Biener 2011-04-07 09:39:13 UTC
(In reply to comment #16)
> So perhaps:
> 
>  bool
>  default_builtin_vector_alignment_reachable (const_tree type, bool is_packed)
>  {
> -  if (is_packed)
> +  if (is_packed || (TYPE_USER_ALIGN (type) && compare_tree_int (TYPE_SIZE
> (type), TYPE_ALIGN (type)) == 1)
>      return false;
> 
> and similarly in rs6000 and other hooks?
> BTW, tree_int_cst_compare (TYPE_SIZE (type), bitsize_int (POINTER_SIZE))
> in the default hook probably should be compare_tree_int too.

I think rather tree-vect-data-refs.c:vector_alignment_reachable_p should
be adjusted.  These "packed" checks in the target hooks don't make any
sense to me either.  In fact, I fail to see the point of a target hook
completely (if it isn't maybe just for cost issues).
Comment 19 Ira Rosen 2011-04-07 09:55:44 UTC
(In reply to comment #18)
> I think rather tree-vect-data-refs.c:vector_alignment_reachable_p should
> be adjusted.  These "packed" checks in the target hooks don't make any
> sense to me either.  

Yes, I think this can be moved to vector_alignment_reachable_p.

> In fact, I fail to see the point of a target hook
> completely (if it isn't maybe just for cost issues).

There are some target specific checks in rs6000.c and arm.c.
Comment 20 rguenther@suse.de 2011-04-07 10:24:45 UTC
On Thu, 7 Apr 2011, irar at il dot ibm.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377
> 
> --- Comment #19 from Ira Rosen <irar at il dot ibm.com> 2011-04-07 09:55:44 UTC ---
> (In reply to comment #18)
> > I think rather tree-vect-data-refs.c:vector_alignment_reachable_p should
> > be adjusted.  These "packed" checks in the target hooks don't make any
> > sense to me either.  
> 
> Yes, I think this can be moved to vector_alignment_reachable_p.
> 
> > In fact, I fail to see the point of a target hook
> > completely (if it isn't maybe just for cost issues).
> 
> There are some target specific checks in rs6000.c and arm.c.

I saw them, but I can't see what the difference is between
"aligned" and "aligned" ;)  Either the targets have aligned loads
or they don't.  We can target independently check whether we
can for example reach 16-byte alignment - in which cases is it then
we can't "reach" that alignment anyway due to target issues?
Comment 21 Ira Rosen 2011-04-07 11:37:29 UTC
(In reply to comment #20)

> I saw them, but I can't see what the difference is between
> "aligned" and "aligned" ;)  Either the targets have aligned loads
> or they don't.  We can target independently check whether we
> can for example reach 16-byte alignment - in which cases is it then
> we can't "reach" that alignment anyway due to target issues?

What kind of target independent check do you have in mind? I guess this hook was added because such check is hard to implement. Here is the discussion about adding it http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00082.html.
Comment 22 Jakub Jelinek 2011-04-07 11:45:57 UTC
When would
compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) <= 0
give not what you are looking for?
Comment 23 Ira Rosen 2011-04-07 12:02:46 UTC
Well, it's hard to accept such an easy solution when the original patch has hundreds of rows ;)

So, you think that 

Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c       (revision 172019)
+++ tree-vect-data-refs.c       (working copy)
@@ -1110,12 +1110,9 @@ vector_alignment_reachable_p (struct dat
       if (ba)
        is_packed = contains_packed_reference (ba);

-      if (vect_print_dump_info (REPORT_DETAILS))
-       fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
-      if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
-       return true;
-      else
-       return false;
+      if (is_packed
+          || compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0)
+        return false;
     }

   return true;

is enough, and we can just get rid of vector_alignment_reachable?

Thanks,
Ira
Comment 24 Richard Biener 2011-04-07 12:42:48 UTC
(In reply to comment #23)
> Well, it's hard to accept such an easy solution when the original patch has
> hundreds of rows ;)
> 
> So, you think that 
> 
> Index: tree-vect-data-refs.c
> ===================================================================
> --- tree-vect-data-refs.c       (revision 172019)
> +++ tree-vect-data-refs.c       (working copy)
> @@ -1110,12 +1110,9 @@ vector_alignment_reachable_p (struct dat
>        if (ba)
>         is_packed = contains_packed_reference (ba);
> 
> -      if (vect_print_dump_info (REPORT_DETAILS))
> -       fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
> -      if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
> -       return true;
> -      else
> -       return false;
> +      if (is_packed
> +          || compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0)
> +        return false;
>      }
> 
>    return true;
> 
> is enough, and we can just get rid of vector_alignment_reachable?

Yes, I think so.

Richard.

> Thanks,
> Ira
Comment 25 Richard Biener 2011-04-07 12:44:35 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > Well, it's hard to accept such an easy solution when the original patch has
> > hundreds of rows ;)
> > 
> > So, you think that 
> > 
> > Index: tree-vect-data-refs.c
> > ===================================================================
> > --- tree-vect-data-refs.c       (revision 172019)
> > +++ tree-vect-data-refs.c       (working copy)
> > @@ -1110,12 +1110,9 @@ vector_alignment_reachable_p (struct dat
> >        if (ba)
> >         is_packed = contains_packed_reference (ba);
> > 
> > -      if (vect_print_dump_info (REPORT_DETAILS))
> > -       fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
> > -      if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
> > -       return true;
> > -      else
> > -       return false;
> > +      if (is_packed
> > +          || compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0)
> > +        return false;
> >      }
> > 
> >    return true;
> > 
> > is enough, and we can just get rid of vector_alignment_reachable?
> 
> Yes, I think so.

Even is_packed shouldn't be necessary, TYPE_ALIGN should better be correct
(but IIRC it isn't for the FIELD_DECL types of packed struckts).
Comment 26 Jakub Jelinek 2011-04-07 12:48:12 UTC
int a = __alignof__ (double);
struct A { int b; double c; int d; } e;
int f = __alignof__ (e.c);
double *p;
int g = __alignof__ (*p);
int a2 = __alignof__ (long long);
struct A2 { int b; long long c; int d; } e2;
int f2 = __alignof__ (e2.c);
long long *p2;
int g2 = __alignof__ (*p2);

on powerpc64-linux unfortunately shows that the target hook is needed, at least with -m64 -malign-power.  Though, what the target hook does is definitely weird:
  if (TARGET_32BIT)
    {
      if (rs6000_alignment_flags == MASK_ALIGN_NATURAL)
        return true;
      if (rs6000_alignment_flags ==  MASK_ALIGN_POWER)
        return true;
      return false;
    }
  else
    {
      if (TARGET_MACHO)
        return false;
      /* Assuming that all other types are naturally aligned. CHECKME!  */
      return true;
    }

I believe rs6000_alignment_flags can be either MASK_ALIGN_NATURAL, or MASK_ALIGN_POWER, nothing else, so wonder why it just doesn't return true for TARGET_32BIT.  And, for TARGET_64BIT && !TARGET_MACHO, clearly DFmode types aren't naturally aligned with rs6000_alignment_flags == MASK_ALIGN_POWER.
Therefore I'd say it should return false for
TYPE_MODE (strip_array_types (type)) == DFmode && !TARGET_ALIGN_NATURAL, at least for Linux (rs6000 has unfortunately way too many ABIs and even more target switches).  For TARGET_MACHO, ADJUST_FIELD_ALIGN is
darwin.h:#define ADJUST_FIELD_ALIGN(FIELD, COMPUTED)	\
darwin.h-  (TARGET_ALIGN_NATURAL ? (COMPUTED)		\
darwin.h-   : (COMPUTED) == 128 ? 128			\
darwin.h-   : MIN ((COMPUTED), 32))
thus I think it could use ADJUST_FIELD_ALIGN (NULL, TYPE_ALIGN (type)) == TYPE_ALIGN (type).  And AIX is the same as Linux TARGET_64BIT, probably even for 32-bit AIX though.

So, I think you want the compare_tree_int check in generic code (not sure if
is_packed isn't redundant with that check), plus a hook which will sometimes
tell you peeling won't necessarily align either.
Comment 27 Richard Biener 2011-04-07 12:56:34 UTC
(In reply to comment #26)
> int a = __alignof__ (double);
> struct A { int b; double c; int d; } e;
> int f = __alignof__ (e.c);
> double *p;
> int g = __alignof__ (*p);
> int a2 = __alignof__ (long long);
> struct A2 { int b; long long c; int d; } e2;
> int f2 = __alignof__ (e2.c);
> long long *p2;
> int g2 = __alignof__ (*p2);
> 
> on powerpc64-linux unfortunately shows that the target hook is needed, at least
> with -m64 -malign-power.

You mean a != f in the above?  That's the basic issue of stor-layout not
using properly aligned types for FIELD_DECL types (same issue for packed
structs).  So you can't really use type alignment for memory references
but you have to use get_object_alignment & friends.  Same happens on
any target for non-packed but just custom (mis-)aligned struct types.
Comment 28 Jakub Jelinek 2011-04-07 13:16:13 UTC
I mean a and g are 8 with -m64 -malign-power, but f is 4.  If you do:
extern void abort (void);
struct S { long long x; int a; double b[10000]; int c; } s;
double *p = &s.b[0];
int
main ()
{
  if (((long) p) & (__alignof__ (*p) - 1))
    abort ();
  return 0;
}

I believe it will fail with -m64 -malign-power, and here it doesn't help at all
that you check get_object_alignment, because __alignof__ (*p) is still 8,
but
	.comm	s,80016,8
...
p:
	.quad	s+12
This isn't the default ABI for powerpc64-linux fortunately, but anyway.

But, the above fails on i?86 too (without non-default -malign-double).
So, we need some target hook that will tell us if ADJUST_FIELD_ALIGN could ever
lower alignment of the type when present in structs...
Comment 29 rguenther@suse.de 2011-04-07 13:21:20 UTC
On Thu, 7 Apr 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377
> 
> --- Comment #28 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-04-07 13:16:13 UTC ---
> I mean a and g are 8 with -m64 -malign-power, but f is 4.  If you do:
> extern void abort (void);
> struct S { long long x; int a; double b[10000]; int c; } s;
> double *p = &s.b[0];

Here's a known type-inconsistency as well.  Or it's even an
invalid testcase ...

> int
> main ()
> {
>   if (((long) p) & (__alignof__ (*p) - 1))
>     abort ();
>   return 0;
> }
> 
> I believe it will fail with -m64 -malign-power, and here it doesn't help at all
> that you check get_object_alignment, because __alignof__ (*p) is still 8,
> but
>     .comm    s,80016,8
> ...
> p:
>     .quad    s+12
> This isn't the default ABI for powerpc64-linux fortunately, but anyway.
> 
> But, the above fails on i?86 too (without non-default -malign-double).
> So, we need some target hook that will tell us if ADJUST_FIELD_ALIGN could ever
> lower alignment of the type when present in structs...

Well.  It's also wrong with

struct S { long long x; int a; double b[10000]; int c; } s 
__attribute__((aligned(4)));
double *p = &s.b[0];

on x86_64.  And honestly I'm not exactly sure if that situation is
different than the i?86 or ppc one or if it is just a "bug" in the
C standard that it allows this kind of alignment behavior (and
thus, a bug in the GCC extension that allows adjusting it).

Richard.
Comment 30 Jakub Jelinek 2011-04-07 13:29:15 UTC
This is certainly a valid testcase:

struct S { long long x; int a; double b[10000]; int c; } s;
double *p = &s.b[0];
int
foo (double *q)
{
  int i, j = 0;
  for (i = 0; i < 10000; i++)
    j += *q++;
  return j;
}
int
main (void)
{
  return foo (p);
}

and without the target hook, vectorizer would think it can peel the loop and get it naturally aligned (assuming it isn't inlined at least etc. to see that q points to &s.b[0] initially).

I think the bug is in the ABIs that do that kind of thing, but it is too late to fix the ABIs up.
Comment 31 Richard Biener 2011-04-07 13:35:04 UTC
(In reply to comment #30)
> This is certainly a valid testcase:
> 
> struct S { long long x; int a; double b[10000]; int c; } s;
> double *p = &s.b[0];
> int
> foo (double *q)
> {
>   int i, j = 0;
>   for (i = 0; i < 10000; i++)
>     j += *q++;
>   return j;
> }
> int
> main (void)
> {
>   return foo (p);
> }
> 
> and without the target hook, vectorizer would think it can peel the loop and
> get it naturally aligned (assuming it isn't inlined at least etc. to see that q
> points to &s.b[0] initially).
> 
> I think the bug is in the ABIs that do that kind of thing, but it is too late
> to fix the ABIs up.

The question is what we should do in the middle-end.  If the above would
happen for strict-alignment targets then *p would simply trap.  The
only consistent way for the middle-end would be to simply assume all
pointer-to-doubles are only aligned to 4 bytes, thus use the minimal
alignment that can happen (without using GCC extensions).
Comment 32 Jakub Jelinek 2011-04-07 13:43:47 UTC
Given that without vectorization that code surely works well on all the weird ABIs, such ADJUST_FIELD_ALIGN is used only on targets that either aren't strict alignment targets, or aren't strict alignment targets for the particular types.
You know, for most CPUs strict alignment isn't a binary thing, many targets including i?86/x86_64 are only partially strict alignment targets.
Even for targets where say int needs to be accessed aligned often e.g. double can be just 4 byte aligned and insns that read/write doubles handle it.

We really shouldn't change __alignof__ of types/INDIRECT_REFs/etc., those are all ABI changes.  The problem with vectorization is that often the vectors have strict alignment, while non-vector ops are not.
Comment 33 rguenther@suse.de 2011-04-07 13:57:09 UTC
On Thu, 7 Apr 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377
> 
> --- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-04-07 13:43:47 UTC ---
> Given that without vectorization that code surely works well on all the weird
> ABIs, such ADJUST_FIELD_ALIGN is used only on targets that either aren't strict
> alignment targets, or aren't strict alignment targets for the particular types.
> You know, for most CPUs strict alignment isn't a binary thing, many targets
> including i?86/x86_64 are only partially strict alignment targets.
> Even for targets where say int needs to be accessed aligned often e.g. double
> can be just 4 byte aligned and insns that read/write doubles handle it.
> 
> We really shouldn't change __alignof__ of types/INDIRECT_REFs/etc., those are
> all ABI changes.  The problem with vectorization is that often the vectors have
> strict alignment, while non-vector ops are not.

Sure, I agree with all of the above.  Still, if we need target hooks for
this kind of stuff then something is really rotten in the middle-end.

Richard.
Comment 34 Jakub Jelinek 2011-04-07 16:36:04 UTC
While for 4.7 we can certainly do bigger changes, for 4.6 I think a safe fix would be:

--- gcc/tree-vect-data-refs.c	2011-03-31 08:51:03.000000000 +0200
+++ gcc/tree-vect-data-refs.c	2011-04-07 18:32:37.379420938 +0200
@@ -1110,6 +1110,9 @@ vector_alignment_reachable_p (struct dat
       if (ba)
 	is_packed = contains_packed_reference (ba);
 
+      if (compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0)
+	is_packed = true;
+
       if (vect_print_dump_info (REPORT_DETAILS))
 	fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
       if (targetm.vectorize.vector_alignment_reachable (type, is_packed))

For scalars GCC ignores packed attribute, so aligned(1) is the only way to express unaligned accesses.
Comment 35 rguenther@suse.de 2011-04-08 08:30:26 UTC
On Thu, 7 Apr 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48377
> 
> --- Comment #34 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-04-07 16:36:04 UTC ---
> While for 4.7 we can certainly do bigger changes, for 4.6 I think a safe fix
> would be:
> 
> --- gcc/tree-vect-data-refs.c    2011-03-31 08:51:03.000000000 +0200
> +++ gcc/tree-vect-data-refs.c    2011-04-07 18:32:37.379420938 +0200
> @@ -1110,6 +1110,9 @@ vector_alignment_reachable_p (struct dat
>        if (ba)
>      is_packed = contains_packed_reference (ba);
> 
> +      if (compare_tree_int (TYPE_SIZE (type), TYPE_ALIGN (type)) > 0)
> +    is_packed = true;
> +
>        if (vect_print_dump_info (REPORT_DETAILS))
>      fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
>        if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
> 
> For scalars GCC ignores packed attribute, so aligned(1) is the only way to
> express unaligned accesses.

Ok if it passes testing.
Comment 36 Jakub Jelinek 2011-04-08 11:38:23 UTC
Author: jakub
Date: Fri Apr  8 11:38:19 2011
New Revision: 172172

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172172
Log:
	PR tree-optimization/48377
	* tree-vect-data-refs.c (vector_alignment_reachable_p): Set
	is_packed to true even for types with smaller TYPE_ALIGN than
	TYPE_SIZE.

	* gcc.dg/vect/pr48377.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr48377.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
Comment 37 Jakub Jelinek 2011-04-08 11:45:31 UTC
Author: jakub
Date: Fri Apr  8 11:45:29 2011
New Revision: 172174

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172174
Log:
	PR tree-optimization/48377
	* tree-vect-data-refs.c (vector_alignment_reachable_p): Set
	is_packed to true even for types with smaller TYPE_ALIGN than
	TYPE_SIZE.

	* gcc.dg/vect/pr48377.c: New test.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/pr48377.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/tree-vect-data-refs.c
Comment 38 Jakub Jelinek 2011-04-08 11:52:06 UTC
Fixed (the testcase with aligned(1)).  The original is INVALID.
Comment 39 Eric Botcazou 2011-06-03 10:01:01 UTC
> Fixed (the testcase with aligned(1)).  The original is INVALID.

The testcase with aligned(1) cannot pass on strict-alignment targets, either with or without vectorization.  On the SPARC:

(botcazou@nile) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.8 $ ./pr48377
Bus Error (core dumped)

On IA-64:

(botcazou@merced) /nile.build/botcazou/gcc-head/ia64-linux-gnu $ ./pr48377
pr48377(1543): unaligned access to 0x6000000000001041, ip=0x4000000000000650
pr48377(1543): unaligned access to 0x6000000000001045, ip=0x4000000000000650
pr48377(1543): unaligned access to 0x6000000000001049, ip=0x4000000000000650
pr48377(1543): unaligned access to 0x600000000000104d, ip=0x4000000000000650

Please move it to gcc.target/i386 or something equivalent.
Comment 40 Jakub Jelinek 2011-06-03 10:06:26 UTC
(In reply to comment #39)
> > Fixed (the testcase with aligned(1)).  The original is INVALID.
> 
> The testcase with aligned(1) cannot pass on strict-alignment targets, either
> with or without vectorization.  On the SPARC:
> 
> (botcazou@nile) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.8 $ ./pr48377
> Bus Error (core dumped)
> 
> On IA-64:
> 
> (botcazou@merced) /nile.build/botcazou/gcc-head/ia64-linux-gnu $ ./pr48377
> pr48377(1543): unaligned access to 0x6000000000001041, ip=0x4000000000000650
> pr48377(1543): unaligned access to 0x6000000000001045, ip=0x4000000000000650
> pr48377(1543): unaligned access to 0x6000000000001049, ip=0x4000000000000650
> pr48377(1543): unaligned access to 0x600000000000104d, ip=0x4000000000000650
> 
> Please move it to gcc.target/i386 or something equivalent.

Can you explain why it cannot pass on strict-alignment targets?  The read is done through a type with explicit low alignment, so strict alignment targets just shouldn't use an aligned load but unaligned load in that case.  If that doesn't work, it is a target bug.
Comment 41 Eric Botcazou 2011-06-03 10:41:55 UTC
> Can you explain why it cannot pass on strict-alignment targets?  The read is
> done through a type with explicit low alignment, so strict alignment targets
> just shouldn't use an aligned load but unaligned load in that case.

What do you mean by an unaligned load?  Most targets don't have that.  If you want to support

typedef unsigned int U __attribute__((__aligned__ (1)));

U id (U *u)
{
  return *u;
}

on strict-alignment targets, you'll have to implement it in the middle-end.  In the meantime (and for 4.6.x), the test cannot pass.
Comment 42 Jakub Jelinek 2011-06-03 10:54:54 UTC
It surprises me it doesn't handle it, that is clearly a bug.
Conceptually it is no different from
struct A { char c; unsigned int d; } __attribute__((packed));

unsigned int
id (struct A *p)
{
  return p->d;
}

which is handled.
Comment 43 Eric Botcazou 2011-06-03 11:10:35 UTC
> Conceptually it is no different from
> struct A { char c; unsigned int d; } __attribute__((packed));
> 
> unsigned int
> id (struct A *p)
> {
>   return p->d;
> }
> 
> which is handled.

Guess what?  We implement under-aligned scalar types in Ada, i.e. the equivalent of typedef unsigned int U __attribute__((__aligned__ (1))) by wrapping them up in a record type. ;-)  As soon as you give SImode to the type, the game is over.
Comment 44 Jakub Jelinek 2011-06-03 11:24:27 UTC
Created attachment 24423 [details]
gcc46-pr48377.patch

I don't see why mode should be relevant, the MEM_REF should have from the type clear indication that it is unaligned load (or store) and just should lead to the bit-fieldish expansion.

Anyway, here is a testsuite patch for now, works on x86_64, not tested other targets.
Comment 45 Richard Biener 2011-06-03 12:55:18 UTC
(In reply to comment #44)
> Created attachment 24423 [details]
> gcc46-pr48377.patch
> 
> I don't see why mode should be relevant, the MEM_REF should have from the type
> clear indication that it is unaligned load (or store) and just should lead to
> the bit-fieldish expansion.
> 
> Anyway, here is a testsuite patch for now, works on x86_64, not tested other
> targets.

The issue is (or at least was) that the mem-attrs get the alignment from
the alignment of the mode (for indirect-refs) unconditionally on strict-align
targets until

2010-12-30  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>

        * emit-rtl.c (set_mem_attributes_minus_bitpos): Explicitly derive
        default values from MEM mode if no memory attributes are present.
        Do not use mode alignment, even on STRICT_ALIGNMENT targets, when
        called with an expression (not a type).

and the indirect-ref path never went down the misaligned path as the
get_inner_reference path did.  This is a very old issue.  The only thing
MEM_REF improves here is that if the target has a movmisalign_optab it
is uses for such accesses, otherwise there will be simply a MEM RTX
with alingment less than the modes alignment which has no effect.

It is a genuine middle-end / target issue that we do not properly implement
this GCC extension for strict-align targets.
Comment 46 Jakub Jelinek 2011-06-26 07:58:18 UTC
Author: jakub
Date: Sun Jun 26 07:57:30 2011
New Revision: 175408

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175408
Log:
2011-06-26  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/48377
	* gcc.dg/vect/pr48377.c: Add dg-require-effective-target
	non_strict_align.

2011-06-26  Steve Ellcey  <sje@cup.hp.com>

	PR middle-end/49191
	* lib/target-supports.exp (check_effective_target_non_strict_align):
	New.
	* gcc.dg/memcpy-3.c: Add dg-require-effective-target non_strict_align.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/memcpy-3.c
    trunk/gcc/testsuite/gcc.dg/vect/pr48377.c
    trunk/gcc/testsuite/lib/target-supports.exp
Comment 47 Jakub Jelinek 2011-06-27 17:55:40 UTC
Author: jakub
Date: Mon Jun 27 17:55:35 2011
New Revision: 175544

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175544
Log:
	Backported from mainline
	2011-06-26  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/48377
	* gcc.dg/vect/pr48377.c: Add dg-require-effective-target
	non_strict_align.

	2011-06-26  Steve Ellcey  <sje@cup.hp.com>

	PR middle-end/49191
	* lib/target-supports.exp (check_effective_target_non_strict_align):
	New.

Modified:
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/gcc.dg/vect/pr48377.c
    branches/gcc-4_6-branch/gcc/testsuite/lib/target-supports.exp