Bug 27770 - [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec
Summary: [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.2.0
: P1 normal
Target Milestone: 4.2.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2006-05-25 16:23 UTC by Janis Johnson
Modified: 2006-08-22 16:08 UTC (History)
5 users (show)

See Also:
Host:
Target: powerpc-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-06-06 14:26:11


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Janis Johnson 2006-05-25 16:23:02 UTC
Two tests from SPEC CPU2000, parser and gzip, fail at runtime when built on powerpc64-linux with "-O2 -ftree-vectorize -maltivec -mabi=altivec".  parser fails with both -m32 and -m64, gzip with -m64.  A regression hunt identified this patch as the start of the failures:

    http://gcc.gnu.org/viewcvs?view=rev&rev=113395

    r113395 | dje | 2006-04-30 19:23:13 +0000 (Sun, 30 Apr 2006)

I've been able to duplicate the failure with 32-bit parser by compiling only post-process.c with "-m32 -O2 -ftree-vectorize -maltivec -mabi=altivec" and everything else with "-m32 -O2".  In addition, if all static variables in post-process.c are made global, the test passes, but with only word_links and visited left static it fails. I can continue to work on coming up with a self-contained test case, but perhaps this is enough information; let me know.
Comment 1 Janis Johnson 2006-05-25 16:30:55 UTC
There's also a problem in the parser file analyze-linkage.c.
Comment 2 Andrew Pinski 2006-05-25 16:45:27 UTC
Interesting.
Comment 3 David Edelsohn 2006-05-25 17:27:54 UTC
If the failures are associated with enabling section anchors by default, then this has uncovered another problem with the section anchor support.  Do you have any more information about what is failing?  Could this be an alignment issue of section anchors not placing a value in the new section with sufficient alignment?
Comment 4 Steven Bosscher 2006-06-05 10:29:20 UTC
Just like other bugs, this one will need a test case.
Comment 5 Janis Johnson 2006-06-05 20:58:11 UTC
I haven't yet had time to continue trying to come up with a minimized testcase but  hope to get to that soon.  I had a vague hope that someone who understands the section anchor support and has access to SPEC CPU2000 could take a quick look at the generated code for post-process.c and notice something obvious.
Comment 6 David Edelsohn 2006-06-05 21:25:31 UTC
I still think this looks like an alignment problem.  Without section anchors GCC generates:

        .lcomm  domain_array,13916,16
        .type   domain_array, @object
        .lcomm  N_domains,4,4
        .type   N_domains, @object
        .lcomm  N_domain_trees,4,4
        .type   N_domain_trees, @object
        .lcomm  word_links,1000,16
        .type   word_links, @object
        .lcomm  visited,1000,16
        .type   visited, @object

with section-anchors GCC generates:

        .section        ".bss"
        .align 2
        .set    .LANCHOR0,. + 0
        .type   N_domains, @object
        .size   N_domains, 4
N_domains:
        .zero   4
        .type   domain_array, @object
        .size   domain_array, 13916
domain_array:
        .zero   13916
        .type   visited, @object
        .size   visited, 1000
visited:
        .zero   1000
        .type   word_links, @object
        .size   word_links, 1000
word_links:
        .zero   1000
        .type   N_domain_trees, @object
        .size   N_domain_trees, 4
N_domain_trees:
        .zero   4

with domain_array and word_links not guaranteed the alignment of 16, if I am reading the assembly correctly.
Comment 7 Richard Sandiford 2006-06-06 08:54:22 UTC
Based on David's descripion, a reduced testcase appears to be:

static short f[100];
int
bar (void)
{
  return f[0];
}
void
foo (void)
{
  int i;
  for (i = 0; i < 100; i++)
    f[i]++;
}

Looking at the assembly output of "-O2 -ftree-vectorize -maltivec
-mabi=altivec", it seems that "f" will only be guaranteed 2-byte
alignment with -fsection-anchors.  Without -fno-section-anchors,
"f" gets the expected 16-byte alignment.

This is an ordering problem.  gcc is compiling bar() first, and
generating code on the assumption that "f" has natural alignment.
The vectoriser then increases the alignment of "f", which throws
off any layout based on the original natural alignment.

If bar() is compiled first, then gcc really does need to be able to
place "f" at a fixed offset in its section, so that it can use section
anchors to access "f".  So I think the possible fixes are:

  (1) Don't use section anchors for "f" in bar()
  (2) Don't increase the alignment of "f" in foo()
  (3) Increase the alignment of "f" before compiling either foo() or bar()

(1) implies either (1a) not using section anchors for vectorisable variables
or (1b) disabling -fsection-anchors when -ftree-vectorize is in effect.

(2) implies either (2a) not increasing the alignment of variables that have
already been assigned a block offset or (2b) preventing -ftree-vectorize
from increasing alignment when -fsection-anchors is in effect.

(3) implies increasing the alignment of all vectorisable variables if
both -fsection-anchors and -ftree-vectorize are in effect.

Neither (2a) nor (2b) is acceptable IMO.  (I don't think (2a) is
acceptable because the order of compilation is not guaranteed.)
(1) is a worst-case fall-back position, with (1a) obviously being
better than (1b).  (3) seems more appealing, but only if we accept
that -fsection-anchors -ftree-vectorize may increase the alignment
of variables that do not in fact get vectorised.  This is going to
be a data size hit.  (Hopefully it will only be a small hit, and
I suppose -ftree-vectorize is already a "speed over size"
optimisation.)

If we choose (1) or (3), I suppose we should also add a gcc_assert()
that the vectoriser is not increasing the alignment of a variable
that has already been placed in a block (i.e. assert that (2a) would
then be a no-op).

Richard
Comment 8 Andrew Pinski 2006-06-06 14:26:11 UTC
What about instead of absolute numbers doing label subtraction for section anchors and then we can defer the decision for the layout of the section until after all functions are done compiling?
Comment 9 richard@codesourcery.com 2006-06-06 15:02:14 UTC
Subject: Re:  [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec

"pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:
> What about instead of absolute numbers doing label subtraction for
> section anchors and then we can defer the decision for the layout of
> the section until after all functions are done compiling?

I don't think symbolic offsets would work, if that's what you mean.
We need to know the constant offset so that (a) we can enforce
TARGET_{MIN,MAX}_ANCHOR_OFFSET (which is more important for other
targets than it is for PowerPC) and (b) we know for PowerPC-like
setups whether the offset needs an addis.

Richard
Comment 10 David Edelsohn 2006-06-06 15:10:31 UTC
The auto-vectorizer is a Tree-SSA pass.  The section anchors are an RTL pass.  I do not understand why the alignment of the vectorized variables is not known at section anchor creation time.
Comment 11 richard@codesourcery.com 2006-06-06 15:16:33 UTC
Subject: Re:  [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec

"dje at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:
> The auto-vectorizer is a Tree-SSA pass.  The section anchors are an
> RTL pass.  I do not understand why the alignment of the vectorized
> variables is not known at section anchor creation time.

The rtl optimisations for bar() precede the tree optimisations
for foo().

Richard
Comment 12 Andrew Pinski 2006-06-06 15:18:07 UTC
(In reply to comment #10)
> The auto-vectorizer is a Tree-SSA pass.  The section anchors are an RTL pass. 
> I do not understand why the alignment of the vectorized variables is not known
> at section anchor creation time.

Because we decided while processing with the first function that the alignment for the variable is set.
And we do tree and rtl intermixed when processing functions so we look at f's alignment during processing bar and then we change it during foo.
Comment 13 David Edelsohn 2006-06-06 15:22:49 UTC
We're performing the auto-vectorization in unit-at-a-time-mode, so maybe we need to recompile the other functions.  It seems that we're going to encounter more problems along these lines with LTO.
Comment 14 richard@codesourcery.com 2006-06-06 15:53:44 UTC
Subject: Re:  [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec

"dje at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:
> We're performing the auto-vectorization in unit-at-a-time-mode, so
> maybe we need to recompile the other functions.  It seems that we're
> going to encounter more problems along these lines with LTO.

Well, I'm not convinced recompilation is the way to go.  I can't imagine
it scaling well in pathological cases.  If we're talking about long-term
fixes, I think we should just make sure that we vectorise all functions
before applying rtl optimisations to any of them.  But that's all moot
anyway: either approach will take a long time to implement.  (Note that,
as things stand, we've already written out the asm code for bar() by the
time we vectorise foo().)

As far as 4.2 fixes go, does anyone disagree with my earlier comment?

Richard
Comment 15 Richard Biener 2006-06-06 16:35:25 UTC
For other reasons it would be nice to be able to place "sync" points in the pass schedule where we re-start with going over all functions for the remaining passes. Per function SSA form is requires for this, though, and possibly more (like per function alias info).
Comment 16 David Edelsohn 2006-06-07 21:39:52 UTC
I checked with the IBM XLC team and they speculatively increase the alignment of variables that could be auto-vectorized, so that gives another vote for that method.  They did mention that whole-program IPA allows them to know when the increased alignment actually will be needed before emitting any functions so they can be more selective.
Comment 17 Jan Hubicka 2006-07-02 13:13:55 UTC
Increasing alignment of static variable during tree optimization should be as easy as increasing the DECL_ALIGN. All variables are output after all functions.

Honza
Comment 18 richard@codesourcery.com 2006-07-02 13:39:06 UTC
Subject: Re:  [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec

"hubicka at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:
> Increasing alignment of static variable during tree optimization should be as
> easy as increasing the DECL_ALIGN. All variables are output after all
> functions.

This is what already happens.  The problem is _when_ it happens.

In the testcase, we first compile a simple function that has no
vectorisable parts.  The function accesses a variable and we decide to
use section anchors to address it.  We therefore place the variable in
the section based on its current DECL_ALIGN.

We then compile a different function that has vectorisable accesses
to the same variable.  We increase the variable's DECL_ALIGN accordingly.
This new DECL_ALIGN throws off the assumptions made for the first function.

The fix we've agreed is best in principle is to speculatively increase
the DECL_ALIGN of vectorisable variables before compiling functions.
Dorit says that there is a patch related to this on the autovect branch,
which I'll look at when I get back from Ottawa.

Richard
Comment 19 Dorit Naishlos 2006-07-23 19:03:59 UTC
> The fix we've agreed is best in principle is to speculatively increase
> the DECL_ALIGN of vectorisable variables before compiling functions.
> Dorit says that there is a patch related to this on the autovect branch,
> which I'll look at when I get back from Ottawa.
> Richard

Turns out the patch I was thinking about is only for the rs6000 port:
http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00266.html
so that's not much help.

Do we want to implement this as a separate pass? at which point of the compilation? (doing it during ipa might be a problem if ipa is not enabled?) 
Comment 20 patchapp@dberlin.org 2006-07-29 20:30:17 UTC
Subject: Bug number PR27770

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-07/msg01215.html
Comment 21 dorit 2006-08-03 20:35:13 UTC
Subject: Bug 27770

Author: dorit
Date: Thu Aug  3 20:35:05 2006
New Revision: 115910

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=115910
Log:
        PR tree-optimization/27770
        * tree-vectorizer.h (get_vectype_for_scalar_type): Function
        declaration removed (moved to tree-flow.h).
        (vect_can_force_dr_alignment_p): Likewise.
        * tree-flow.h (get_vectype_for_scalar_type): New function declaration
        (moved from tree-vectorizer.h).
        (vect_can_force_dr_alignment_p): Likewise.
        * tree-vectorizer.c (vect_print_dump_info): Allow calling this function
        from outside the vectorizer - in particular from cgraph stage.
        * tree-vect-analyze.c (vect_compute_data_ref_alignment): Don't increase
        the alignment of global arrays when -fsection-anchors is enabled.
        * cgraphunit.c (cgraph_increase_alignment): New function.
        (cgraph_optimize): Call cgraph_increase_alignment.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
    trunk/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
    trunk/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
Removed:
    trunk/gcc/testsuite/gcc.dg/vect/vect-69.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cgraphunit.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect.exp
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-vect-analyze.c
    trunk/gcc/tree-vectorizer.c

Comment 22 Paolo Bonzini 2006-08-04 06:04:38 UTC
fixed?
Comment 23 Janis Johnson 2006-08-04 16:48:02 UTC
The CPU2000 tests now pass, so as far as I'm concerned this is fixed.
Comment 24 Andrew Pinski 2006-08-04 17:03:01 UTC
Fixed, a new different bug for the missed optimization should be opened.
Comment 25 Dorit Naishlos 2006-08-07 07:09:18 UTC
(In reply to comment #24)
> Fixed, a new different bug for the missed optimization should be opened.

It's PR28628.

Comment 26 Jack Howarth 2006-08-22 16:06:14 UTC
Should the testcase for this bug be marked as expected to fail on darwin
since that platform has section-anchors disabled? I see...

FAIL: gcc.dg/vect/section-anchors-pr27770.c (test for excess errors)

both on my machine and the logs from regress.
Comment 27 Andrew Pinski 2006-08-22 16:08:18 UTC
(In reply to comment #26)
> FAIL: gcc.dg/vect/section-anchors-pr27770.c (test for excess errors)

Darwin support for section anchors is messed up and really I had wished Apple would have fixed the support for it but oh well.  I gave them notice like 3-5 months from when it was broken but nothing happened so I don't care about the testresults on powerpc-darwin as Apple does not care either.