Two tests from SPEC CPU2000, parser and gzip, fail at runtime when built on powerpc64-linux with "-O2 -ftree-vectorize -maltivec -mabi=altivec". parser fails with both -m32 and -m64, gzip with -m64. A regression hunt identified this patch as the start of the failures: http://gcc.gnu.org/viewcvs?view=rev&rev=113395 r113395 | dje | 2006-04-30 19:23:13 +0000 (Sun, 30 Apr 2006) I've been able to duplicate the failure with 32-bit parser by compiling only post-process.c with "-m32 -O2 -ftree-vectorize -maltivec -mabi=altivec" and everything else with "-m32 -O2". In addition, if all static variables in post-process.c are made global, the test passes, but with only word_links and visited left static it fails. I can continue to work on coming up with a self-contained test case, but perhaps this is enough information; let me know.
There's also a problem in the parser file analyze-linkage.c.
Interesting.
If the failures are associated with enabling section anchors by default, then this has uncovered another problem with the section anchor support. Do you have any more information about what is failing? Could this be an alignment issue of section anchors not placing a value in the new section with sufficient alignment?
Just like other bugs, this one will need a test case.
I haven't yet had time to continue trying to come up with a minimized testcase but hope to get to that soon. I had a vague hope that someone who understands the section anchor support and has access to SPEC CPU2000 could take a quick look at the generated code for post-process.c and notice something obvious.
I still think this looks like an alignment problem. Without section anchors GCC generates: .lcomm domain_array,13916,16 .type domain_array, @object .lcomm N_domains,4,4 .type N_domains, @object .lcomm N_domain_trees,4,4 .type N_domain_trees, @object .lcomm word_links,1000,16 .type word_links, @object .lcomm visited,1000,16 .type visited, @object with section-anchors GCC generates: .section ".bss" .align 2 .set .LANCHOR0,. + 0 .type N_domains, @object .size N_domains, 4 N_domains: .zero 4 .type domain_array, @object .size domain_array, 13916 domain_array: .zero 13916 .type visited, @object .size visited, 1000 visited: .zero 1000 .type word_links, @object .size word_links, 1000 word_links: .zero 1000 .type N_domain_trees, @object .size N_domain_trees, 4 N_domain_trees: .zero 4 with domain_array and word_links not guaranteed the alignment of 16, if I am reading the assembly correctly.
Based on David's descripion, a reduced testcase appears to be: static short f[100]; int bar (void) { return f[0]; } void foo (void) { int i; for (i = 0; i < 100; i++) f[i]++; } Looking at the assembly output of "-O2 -ftree-vectorize -maltivec -mabi=altivec", it seems that "f" will only be guaranteed 2-byte alignment with -fsection-anchors. Without -fno-section-anchors, "f" gets the expected 16-byte alignment. This is an ordering problem. gcc is compiling bar() first, and generating code on the assumption that "f" has natural alignment. The vectoriser then increases the alignment of "f", which throws off any layout based on the original natural alignment. If bar() is compiled first, then gcc really does need to be able to place "f" at a fixed offset in its section, so that it can use section anchors to access "f". So I think the possible fixes are: (1) Don't use section anchors for "f" in bar() (2) Don't increase the alignment of "f" in foo() (3) Increase the alignment of "f" before compiling either foo() or bar() (1) implies either (1a) not using section anchors for vectorisable variables or (1b) disabling -fsection-anchors when -ftree-vectorize is in effect. (2) implies either (2a) not increasing the alignment of variables that have already been assigned a block offset or (2b) preventing -ftree-vectorize from increasing alignment when -fsection-anchors is in effect. (3) implies increasing the alignment of all vectorisable variables if both -fsection-anchors and -ftree-vectorize are in effect. Neither (2a) nor (2b) is acceptable IMO. (I don't think (2a) is acceptable because the order of compilation is not guaranteed.) (1) is a worst-case fall-back position, with (1a) obviously being better than (1b). (3) seems more appealing, but only if we accept that -fsection-anchors -ftree-vectorize may increase the alignment of variables that do not in fact get vectorised. This is going to be a data size hit. (Hopefully it will only be a small hit, and I suppose -ftree-vectorize is already a "speed over size" optimisation.) If we choose (1) or (3), I suppose we should also add a gcc_assert() that the vectoriser is not increasing the alignment of a variable that has already been placed in a block (i.e. assert that (2a) would then be a no-op). Richard
What about instead of absolute numbers doing label subtraction for section anchors and then we can defer the decision for the layout of the section until after all functions are done compiling?
Subject: Re: [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec "pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: > What about instead of absolute numbers doing label subtraction for > section anchors and then we can defer the decision for the layout of > the section until after all functions are done compiling? I don't think symbolic offsets would work, if that's what you mean. We need to know the constant offset so that (a) we can enforce TARGET_{MIN,MAX}_ANCHOR_OFFSET (which is more important for other targets than it is for PowerPC) and (b) we know for PowerPC-like setups whether the offset needs an addis. Richard
The auto-vectorizer is a Tree-SSA pass. The section anchors are an RTL pass. I do not understand why the alignment of the vectorized variables is not known at section anchor creation time.
Subject: Re: [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec "dje at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: > The auto-vectorizer is a Tree-SSA pass. The section anchors are an > RTL pass. I do not understand why the alignment of the vectorized > variables is not known at section anchor creation time. The rtl optimisations for bar() precede the tree optimisations for foo(). Richard
(In reply to comment #10) > The auto-vectorizer is a Tree-SSA pass. The section anchors are an RTL pass. > I do not understand why the alignment of the vectorized variables is not known > at section anchor creation time. Because we decided while processing with the first function that the alignment for the variable is set. And we do tree and rtl intermixed when processing functions so we look at f's alignment during processing bar and then we change it during foo.
We're performing the auto-vectorization in unit-at-a-time-mode, so maybe we need to recompile the other functions. It seems that we're going to encounter more problems along these lines with LTO.
Subject: Re: [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec "dje at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: > We're performing the auto-vectorization in unit-at-a-time-mode, so > maybe we need to recompile the other functions. It seems that we're > going to encounter more problems along these lines with LTO. Well, I'm not convinced recompilation is the way to go. I can't imagine it scaling well in pathological cases. If we're talking about long-term fixes, I think we should just make sure that we vectorise all functions before applying rtl optimisations to any of them. But that's all moot anyway: either approach will take a long time to implement. (Note that, as things stand, we've already written out the asm code for bar() by the time we vectorise foo().) As far as 4.2 fixes go, does anyone disagree with my earlier comment? Richard
For other reasons it would be nice to be able to place "sync" points in the pass schedule where we re-start with going over all functions for the remaining passes. Per function SSA form is requires for this, though, and possibly more (like per function alias info).
I checked with the IBM XLC team and they speculatively increase the alignment of variables that could be auto-vectorized, so that gives another vote for that method. They did mention that whole-program IPA allows them to know when the increased alignment actually will be needed before emitting any functions so they can be more selective.
Increasing alignment of static variable during tree optimization should be as easy as increasing the DECL_ALIGN. All variables are output after all functions. Honza
Subject: Re: [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec "hubicka at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: > Increasing alignment of static variable during tree optimization should be as > easy as increasing the DECL_ALIGN. All variables are output after all > functions. This is what already happens. The problem is _when_ it happens. In the testcase, we first compile a simple function that has no vectorisable parts. The function accesses a variable and we decide to use section anchors to address it. We therefore place the variable in the section based on its current DECL_ALIGN. We then compile a different function that has vectorisable accesses to the same variable. We increase the variable's DECL_ALIGN accordingly. This new DECL_ALIGN throws off the assumptions made for the first function. The fix we've agreed is best in principle is to speculatively increase the DECL_ALIGN of vectorisable variables before compiling functions. Dorit says that there is a patch related to this on the autovect branch, which I'll look at when I get back from Ottawa. Richard
> The fix we've agreed is best in principle is to speculatively increase > the DECL_ALIGN of vectorisable variables before compiling functions. > Dorit says that there is a patch related to this on the autovect branch, > which I'll look at when I get back from Ottawa. > Richard Turns out the patch I was thinking about is only for the rs6000 port: http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00266.html so that's not much help. Do we want to implement this as a separate pass? at which point of the compilation? (doing it during ipa might be a problem if ipa is not enabled?)
Subject: Bug number PR27770 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01215.html
Subject: Bug 27770 Author: dorit Date: Thu Aug 3 20:35:05 2006 New Revision: 115910 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115910 Log: PR tree-optimization/27770 * tree-vectorizer.h (get_vectype_for_scalar_type): Function declaration removed (moved to tree-flow.h). (vect_can_force_dr_alignment_p): Likewise. * tree-flow.h (get_vectype_for_scalar_type): New function declaration (moved from tree-vectorizer.h). (vect_can_force_dr_alignment_p): Likewise. * tree-vectorizer.c (vect_print_dump_info): Allow calling this function from outside the vectorizer - in particular from cgraph stage. * tree-vect-analyze.c (vect_compute_data_ref_alignment): Don't increase the alignment of global arrays when -fsection-anchors is enabled. * cgraphunit.c (cgraph_increase_alignment): New function. (cgraph_optimize): Call cgraph_increase_alignment. Added: trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c trunk/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c trunk/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c Removed: trunk/gcc/testsuite/gcc.dg/vect/vect-69.c Modified: trunk/gcc/ChangeLog trunk/gcc/cgraphunit.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/vect/vect.exp trunk/gcc/testsuite/lib/target-supports.exp trunk/gcc/tree-flow.h trunk/gcc/tree-vect-analyze.c trunk/gcc/tree-vectorizer.c
fixed?
The CPU2000 tests now pass, so as far as I'm concerned this is fixed.
Fixed, a new different bug for the missed optimization should be opened.
(In reply to comment #24) > Fixed, a new different bug for the missed optimization should be opened. It's PR28628.
Should the testcase for this bug be marked as expected to fail on darwin since that platform has section-anchors disabled? I see... FAIL: gcc.dg/vect/section-anchors-pr27770.c (test for excess errors) both on my machine and the logs from regress.
(In reply to comment #26) > FAIL: gcc.dg/vect/section-anchors-pr27770.c (test for excess errors) Darwin support for section anchors is messed up and really I had wished Apple would have fixed the support for it but oh well. I gave them notice like 3-5 months from when it was broken but nothing happened so I don't care about the testresults on powerpc-darwin as Apple does not care either.