Summary: | gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto | ||
---|---|---|---|
Product: | gcc | Reporter: | Tom de Vries <vries> |
Component: | lto | Assignee: | Richard Biener <rguenth> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bergner, dje, ebotcazou, hubicka, irar, meissner, pthaugen |
Priority: | P3 | Keywords: | lto |
Version: | 4.7.0 | ||
Target Milestone: | 4.8.0 | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2012-01-02 00:00:00 | |
Attachments: |
test.c minimized from gcc.dg/vect/vect-reduc-2char.c
Assembly file of slp-perm-1.c after lto with -mcpu=power6 -O3 -maltivec slp-perm-1.c assembly file before LTO is run with -mcpu=power6 -O3 -maltivec patch Patch to restore LTO bootstrap with Ada + comment tweaks |
Description
Tom de Vries
2011-09-23 14:32:16 UTC
Created attachment 25348 [details]
test.c minimized from gcc.dg/vect/vect-reduc-2char.c
To reproduce:
$ powerpc-linux-gnu-gcc -O2 -ftree-vectorize test.c -flto -maltivec -static
$ ./a.out ; echo $?
218
The problematic code is still generated without -static, but for me this minimal testcase does not fail runtime without the -static.
Same issue occurs for gcc.dg/vect/pr44507.c with -m64. Configure line for compiler same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50499#c2 I see on powerpc64-suse-linux: FAIL: gcc.dg/vect/pr44507.c -flto execution test FAIL: gcc.dg/vect/pr45752.c -flto execution test FAIL: gcc.dg/vect/vect-reduc-2short.c -flto execution test FAIL: gcc.dg/vect/vect-reduc-6.c -flto execution test FAIL: gcc.dg/vect/slp-perm-1.c -flto execution test FAIL: gcc.dg/vect/slp-perm-2.c -flto execution test FAIL: gcc.dg/vect/slp-perm-3.c -flto execution test FAIL: gcc.dg/vect/slp-perm-4.c -flto execution test FAIL: gcc.dg/vect/slp-perm-5.c -flto execution test FAIL: gcc.dg/vect/slp-perm-6.c -flto execution test FAIL: gcc.dg/vect/slp-perm-7.c -flto execution test Removing static array initialization fixes the failures. Created attachment 29426 [details]
Assembly file of slp-perm-1.c after lto with -mcpu=power6 -O3 -maltivec
Created attachment 29427 [details]
slp-perm-1.c assembly file before LTO is run with -mcpu=power6 -O3 -maltivec
I am switching this to LTO instead of target, as it appears to be an LTO bug. Before LTO is run, the alignment of the .rodata section is 16 byte alignment since the array used to initialize the auto array is copied with altivec instructions. After LTO, the alignment of the .rodata section is 4 bytes. The powerpc Altivec instructions ignore the bottom 4 bits of the address, and so depending on what else is linked, the test will randomly fail or succeed. I added attachments from compiling slp-perm-1.c with -O3 -mcpu=power6 -maltivec -save-temps to give the asm files. I will have a looksee tomorrow. The -fsection-anchors option appears to be important. If I use -fsection-anchors (which is default for powerpc64-linux), LTO does not align the .rodata section, but uses Altivec memory instructions. If I use -fno-section-anchors, the .rodata section is not aligned, but it doesn't use Altivec memory instructions, so the test passes. If -fno-merge-constants (and the default -fsection-anchors) is used, then the correct alignment for the table is set (and Altivec memory instructions are used). At a guess, it is likely be in the gimplify_init_constructor function in gimplify.c. On Tue, 12 Feb 2013, meissner at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494
>
> --- Comment #10 from Michael Meissner <meissner at gcc dot gnu.org> 2013-02-12 19:16:56 UTC ---
> If -fno-merge-constants (and the default -fsection-anchors) is used, then the
> correct alignment for the table is set (and Altivec memory instructions are
> used).
>
> At a guess, it is likely be in the gimplify_init_constructor function in
> gimplify.c.
There is special code to handle initializers, it may be well that with
LTO we fail to do the correct thing here.
Created attachment 29436 [details]
patch
-fsection-anchors enables pass_ipa_increase_alignment (the vectorizer can
not align globals after any function was assembled, so we do that upfront).
In LTO:
/* If this variable belongs to the global constant pool, retrieve the
pre-computed RTL or recompute it in LTO mode. */
if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
{
SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));
return;
but obviously output_constant_def when just getting DECL_INITIAL cannot
honor any special alignment requirements of decl. It will simply get
a new decl with standard alignment.
We know decl is already the decl associated with the constant, so we
should just re-use it.
So I believe the attached should fix this. Can PPC people plaease test?
Eric added the original feature. > In LTO:
>
> /* If this variable belongs to the global constant pool, retrieve the
> pre-computed RTL or recompute it in LTO mode. */
> if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> {
> SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));
> return;
>
> but obviously output_constant_def when just getting DECL_INITIAL cannot
> honor any special alignment requirements of decl. It will simply get
> a new decl with standard alignment.
>
> We know decl is already the decl associated with the constant, so we
> should just re-use it.
Why does the alignment of a DECL_IN_CONSTANT_POOL matter here exactly?
The patch does align the .rodata section to 16 byte alignment, but the code to load up the auto vector from constant memory does not do vectorization. If I use -fno-section-anchors, it aligns .rodata to 4 byte alignment, and does not vectorize the code. If I use -fno-merge-constants, it aligns .rodata to 16 byte alignment, and does vectorize the code. If I use -fno-merge-constants without -flto, it aligns .rodata to 16 byte alignment, but it uses unaligned vector loads/stores. So the patch does help in that the tests now pass that were randomly failing. While it would be nice if we could get the initialization to be vectorized, I'm not how performance critical this is. Eric: if the alignment of the constant data that is used to initialize the auto array is a mismatch, and you use Altivec instructions, when the compiler auto-vectorizes the copy, the wrong data gets used. On Wed, 13 Feb 2013, ebotcazou at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494
>
> --- Comment #14 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-02-13 22:13:42 UTC ---
> > In LTO:
> >
> > /* If this variable belongs to the global constant pool, retrieve the
> > pre-computed RTL or recompute it in LTO mode. */
> > if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> > {
> > SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));
> > return;
> >
> > but obviously output_constant_def when just getting DECL_INITIAL cannot
> > honor any special alignment requirements of decl. It will simply get
> > a new decl with standard alignment.
> >
> > We know decl is already the decl associated with the constant, so we
> > should just re-use it.
>
> Why does the alignment of a DECL_IN_CONSTANT_POOL matter here exactly?
Because as it is seen as a regular VAR_DECL by optimizers the
vectorizer (well, IPA increase-alignment in this case) chooses to
bump its alignment to be able to use aligned vector moves.
Thus we need to honor the increased DECL_ALIGN when outputting the
constant pool, otherwise we generate wrong-code.
> Because as it is seen as a regular VAR_DECL by optimizers the
> vectorizer (well, IPA increase-alignment in this case) chooses to
> bump its alignment to be able to use aligned vector moves.
>
> Thus we need to honor the increased DECL_ALIGN when outputting the
> constant pool, otherwise we generate wrong-code.
I see, thanks. Perhaps the safest solution is to prevent IPA from increasing the alignment if DECL_IN_CONSTANT_POOL, as initializations of aggregate objects are presumably not supposed to happen in hot spots.
On Thu, 14 Feb 2013, ebotcazou at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494
>
> --- Comment #17 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-02-14 09:18:46 UTC ---
> > Because as it is seen as a regular VAR_DECL by optimizers the
> > vectorizer (well, IPA increase-alignment in this case) chooses to
> > bump its alignment to be able to use aligned vector moves.
> >
> > Thus we need to honor the increased DECL_ALIGN when outputting the
> > constant pool, otherwise we generate wrong-code.
>
> I see, thanks. Perhaps the safest solution is to prevent IPA from increasing
> the alignment if DECL_IN_CONSTANT_POOL, as initializations of aggregate objects
> are presumably not supposed to happen in hot spots.
It's also done by the vectorizer pass (for -fno-section-anchors).
I believe it's wrong to not honor DECL_ALIGN of decl in
/* If this variable belongs to the global constant pool, retrieve the
pre-computed RTL or recompute it in LTO mode. */
if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
{
SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));
return;
which is what happens here. Thus, if we say DECL_IN_CONSTANT_POOL
decls may not have their alignment changed we should assert that
(but my patch just honors DECL_ALIGN and avoids creation of yet
another DECL_IN_CONSTANT_POOL decl ...). After all we also
use DECL_ALIGN information if anybody inspects the address
(which may happen if we elide the local static to the initializer
if we can see it is never changed - I suppose we cannot do that
at the moment)
Btw, if properly aligned the block-move can use vector code as well
(not sure if any target does that though).
Richard.
Author: rguenth Date: Thu Feb 14 12:24:12 2013 New Revision: 196050 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196050 Log: 2013-02-14 Richard Biener <rguenther@suse.de> PR lto/50494 * varasm.c (output_constant_def_1): Get the decl representing the constant as argument. (output_constant_def): Wrap output_constant_def_1. (make_decl_rtl): Use output_constant_def_1 with the decl representing the constant. (build_constant_desc): Optionally re-use a decl already representing the constant. (tree_output_constant_def): Adjust. Modified: trunk/gcc/ChangeLog trunk/gcc/varasm.c Fixed for 4.8. Reopened, the fix breaks LTO bootstrap with Ada: /tmp/cc5Nhl9J.s: Assembler messages: /tmp/cc5Nhl9J.s:6206: Error: symbol `.LC0' is already defined /tmp/cc5Nhl9J.s:6219: Error: symbol `.LC0' is already defined /tmp/cc5Nhl9J.s:6996: Error: symbol `.LC0' is already defined make[4]: *** [/tmp/ccuFb8kg.ltrans2.ltrans.o] Error 1 make[4]: *** Waiting for unfinished jobs.... /tmp/ccf05Tt7.s: Assembler messages: /tmp/ccf05Tt7.s:5618: Error: symbol `.LC5' is already defined /tmp/ccf05Tt7.s:5629: Error: symbol `.LC5' is already defined make[4]: *** [/tmp/ccuFb8kg.ltrans8.ltrans.o] Error 1 lto-wrapper: make returned 2 exit status /home/eric/install/gcc/x86_64-suse-linux/bin/ld: lto-wrapper failed collect2: error: ld returned 1 exit status make[3]: *** [gnatbind] Error 1 make[3]: *** Waiting for unfinished jobs.... The reason for the name collision seems obvious enough... I still think the best solution is not to fiddle with the alignment of DECL_IN_CONSTANT_POOL objects at this point, since the machinery assumes that their alignment doesn't matter (or more precisely that they only need to have the alignment of their type, which is correct as far as I know). (In reply to comment #21) > Reopened, the fix breaks LTO bootstrap with Ada: > > /tmp/cc5Nhl9J.s: Assembler messages: > /tmp/cc5Nhl9J.s:6206: Error: symbol `.LC0' is already defined > /tmp/cc5Nhl9J.s:6219: Error: symbol `.LC0' is already defined > /tmp/cc5Nhl9J.s:6996: Error: symbol `.LC0' is already defined > make[4]: *** [/tmp/ccuFb8kg.ltrans2.ltrans.o] Error 1 > make[4]: *** Waiting for unfinished jobs.... > /tmp/ccf05Tt7.s: Assembler messages: > /tmp/ccf05Tt7.s:5618: Error: symbol `.LC5' is already defined > /tmp/ccf05Tt7.s:5629: Error: symbol `.LC5' is already defined > make[4]: *** [/tmp/ccuFb8kg.ltrans8.ltrans.o] Error 1 > lto-wrapper: make returned 2 exit status > /home/eric/install/gcc/x86_64-suse-linux/bin/ld: lto-wrapper failed > collect2: error: ld returned 1 exit status > make[3]: *** [gnatbind] Error 1 > make[3]: *** Waiting for unfinished jobs.... > > The reason for the name collision seems obvious enough... I still think the > best solution is not to fiddle with the alignment of DECL_IN_CONSTANT_POOL > objects at this point, since the machinery assumes that their alignment doesn't > matter (or more precisely that they only need to have the alignment of their > type, which is correct as far as I know). How can the patch cause a name collision when all the patch does is avoid creating a new decl...? They are static and thus should be mangled. Well, and if we want a new decl just to re-assign a unique name here then we want to copy over alignment information. That is, LTO should handle constant-pool entries by _not_ streaming the decl then. Honza, how are those supposed to make the symtab -> WPA -> LTRANS transition anyway? > How can the patch cause a name collision when all the patch does is > avoid creating a new decl...? They are static and thus should be > mangled. They clearly aren't. > Well, and if we want a new decl just to re-assign a unique name here > then we want to copy over alignment information. That is, LTO should > handle constant-pool entries by _not_ streaming the decl then. Honza, > how are those supposed to make the symtab -> WPA -> LTRANS transition > anyway? The irony being that the initial implementation didn't stream the DECL but was changed because the varpool was just starting to being streamed as well. :-) (In reply to comment #23) > > How can the patch cause a name collision when all the patch does is > > avoid creating a new decl...? They are static and thus should be > > mangled. > > They clearly aren't. > > > Well, and if we want a new decl just to re-assign a unique name here > > then we want to copy over alignment information. That is, LTO should > > handle constant-pool entries by _not_ streaming the decl then. Honza, > > how are those supposed to make the symtab -> WPA -> LTRANS transition > > anyway? > > The irony being that the initial implementation didn't stream the DECL but was > changed because the varpool was just starting to being streamed as well. :-) Yes, we do refer to the in the IL so we need to stream them (in some way). The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name mangling of !TREE_PUBLIC decls when streaming in the decl for the constant pool entries (or when computing the assembler name at some point). I suppose we never compute decl-assembler-name for the constant pool entries but emit them in a special way? At least build_constant_desc seems to create a raw SYMBOL_RER using the raw created label? But then, as we don't stream the constant-descs, the RTL should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do not agree). Which means that eventually the following would fix it: Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (revision 196451) +++ gcc/varasm.c (working copy) @@ -3124,6 +3124,11 @@ build_constant_desc (tree exp, tree decl else align_variable (decl, 0); } + /* If we already had a decl for this constant pool entry adjust its + label to be unique within this translation unit and to make it + consistent with the symbol-ref symbol we use below. */ + else + DECL_NAME (decl) = get_identifier (label); /* Now construct the SYMBOL_REF and the MEM. */ if (use_object_blocks_p ()) can you check that? Thanks. Btw, I cannot reproduce the issue with t.c: void bar (int *); void foo (void) { int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; bar (a); } t2.c void bar (int *); void baz (void) { int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; bar (a); } t3.c: void bar (int *a) { __builtin_printf ("%d\n", a[25]); } int main() { foo (); baz (); return 0; } and gcc t.c t2.c t3.c -flto I see three .LC0 being streamed in at WPA stage, shipped to a single LTRANS unit and there getting .LC0, .LC1, and .LC2 symbols by means of the existing build_constant_desc (they have all .LCO DECL_NAME decls, but hey - even that would be fixed by my suggested patch. Trying LTO Ada bootstrap now. But I really fail to see how it should break. > The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name > mangling of !TREE_PUBLIC decls when streaming in the decl for the constant > pool entries (or when computing the assembler name at some point). > I suppose we never compute decl-assembler-name for the constant pool entries > but emit them in a special way? Right, see make_decl_rtl: /* If this variable belongs to the global constant pool, retrieve the pre-computed RTL or recompute it in LTO mode. */ if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl)) { SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl), decl, 1)); return; } > At least build_constant_desc seems to create a raw SYMBOL_RER using the raw > created label? But then, as we don't stream the constant-descs, the RTL > should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do not > agree). I think the problem is that, with your patch, the DECLs are not unified when they have the same DECL_INITIAL (decl), even if they have the same RTL in the end. On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494
>
> --- Comment #26 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-03-05 13:51:28 UTC ---
> > The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name
> > mangling of !TREE_PUBLIC decls when streaming in the decl for the constant
> > pool entries (or when computing the assembler name at some point).
> > I suppose we never compute decl-assembler-name for the constant pool entries
> > but emit them in a special way?
>
> Right, see make_decl_rtl:
>
> /* If this variable belongs to the global constant pool, retrieve the
> pre-computed RTL or recompute it in LTO mode. */
> if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> {
> SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl),
> decl, 1));
> return;
> }
>
> > At least build_constant_desc seems to create a raw SYMBOL_RER using the raw
> > created label? But then, as we don't stream the constant-descs, the RTL
> > should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do not
> > agree).
>
> I think the problem is that, with your patch, the DECLs are not unified when
> they have the same DECL_INITIAL (decl), even if they have the same RTL in the
> end.
Hmm, but when I use the same contents for the two arrays in my simple
testcase I do get only a single .LC0 output referenced from two places.
We will end up sharing the same RTL for both (unmerged) DECLs - but
I don't see how this can be a problem? Maybe we fail to set
TREE_ASM_WRITTEN on the duplicate and output it anyway via other
mechanisms?
But I can reproduce the Ada LTO bootstrap failure ...
> Hmm, but when I use the same contents for the two arrays in my simple
> testcase I do get only a single .LC0 output referenced from two places.
> We will end up sharing the same RTL for both (unmerged) DECLs - but
> I don't see how this can be a problem? Maybe we fail to set
> TREE_ASM_WRITTEN on the duplicate and output it anyway via other
> mechanisms?
We simply fail to set TREE_ASM_WRITTEN on the DECL_INITIAL (decl) because it is not shared anymore. So probably:
Index: varasm.c
===================================================================
--- varasm.c (revision 196416)
+++ varasm.c (working copy)
@@ -3112,7 +3112,6 @@ build_constant_desc (tree exp, tree decl
LTO mode. Instead we set the flag that will be recognized in
make_decl_rtl. */
DECL_IN_CONSTANT_POOL (decl) = 1;
- DECL_INITIAL (decl) = desc->value;
/* ??? CONSTANT_ALIGNMENT hasn't been updated for vector types on most
architectures so use DATA_ALIGNMENT as well, except for strings. */
if (TREE_CODE (exp) == STRING_CST)
@@ -3125,6 +3124,8 @@ build_constant_desc (tree exp, tree decl
align_variable (decl, 0);
}
+ DECL_INITIAL (decl) = desc->value;
+
/* Now construct the SYMBOL_REF and the MEM. */
if (use_object_blocks_p ())
{
On Tue, 5 Mar 2013, rguenther at suse dot de wrote: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 > > --- Comment #27 from rguenther at suse dot de <rguenther at suse dot de> 2013-03-05 13:58:15 UTC --- > On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote: > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 > > > > --- Comment #26 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-03-05 13:51:28 UTC --- > > > The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name > > > mangling of !TREE_PUBLIC decls when streaming in the decl for the constant > > > pool entries (or when computing the assembler name at some point). > > > I suppose we never compute decl-assembler-name for the constant pool entries > > > but emit them in a special way? > > > > Right, see make_decl_rtl: > > > > /* If this variable belongs to the global constant pool, retrieve the > > pre-computed RTL or recompute it in LTO mode. */ > > if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl)) > > { > > SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl), > > decl, 1)); > > return; > > } > > > > > At least build_constant_desc seems to create a raw SYMBOL_RER using the raw > > > created label? But then, as we don't stream the constant-descs, the RTL > > > should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do not > > > agree). > > > > I think the problem is that, with your patch, the DECLs are not unified when > > they have the same DECL_INITIAL (decl), even if they have the same RTL in the > > end. > > Hmm, but when I use the same contents for the two arrays in my simple > testcase I do get only a single .LC0 output referenced from two places. > We will end up sharing the same RTL for both (unmerged) DECLs - but > I don't see how this can be a problem? Maybe we fail to set > TREE_ASM_WRITTEN on the duplicate and output it anyway via other > mechanisms? So in fact the issue must be that we _do_ share the constant pool entry but not the decl we use to refer to it. Which isn't very easily fixable (well, it _is_ fixable, but ... ) OTOH in assemble_variable we properly use /* If this symbol belongs to the tree constant pool, output the constant if it hasn't already been written. */ if (TREE_CONSTANT_POOL_ADDRESS_P (symbol)) { tree decl = SYMBOL_REF_DECL (symbol); if (!TREE_ASM_WRITTEN (DECL_INITIAL (decl))) output_constant_def_contents (symbol); return; } so any constant pool entry shouldn't be emitted multiple times ... > But I can reproduce the Ada LTO bootstrap failure ... So we can revert the part of the patch that ends up not creating a new decl but only transfer DECL_ALIGN. But then we still don't "merge" the decls we use to refer to the constants, so I wonder how creating less decls can fix this issue at all ... Note that merging constants but not decls also can end up creating bogusly aligned constants. In fact it seems to me that we need to arrange for the LTO path /* If this variable belongs to the global constant pool, retrieve the pre-computed RTL or recompute it in LTO mode. */ if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl)) { SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl), decl, 1)); return; } to never share a constant pool entry ... :/ Bah. A reduced testcase would be nice to have ... > So we can revert the part of the patch that ends up not creating > a new decl but only transfer DECL_ALIGN. But then we still don't > "merge" the decls we use to refer to the constants, so I wonder > how creating less decls can fix this issue at all ... That would be worse, DECL_ALIGN should _not_ be fiddled with for constant pool entries in the first place since the constant/DECL_INITIAL is shared. > Note that merging constants but not decls also can end up > creating bogusly aligned constants. In fact it seems to me > that we need to arrange for the LTO path > > /* If this variable belongs to the global constant pool, retrieve the > pre-computed RTL or recompute it in LTO mode. */ > if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl)) > { > SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl), > decl, 1)); > return; > } > > to never share a constant pool entry ... :/ We should simply not touch DECL_IN_CONSTANT_POOL variables, since they are not regular VAR_DECLs but only represent the underlying constant. On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 > > --- Comment #28 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-03-05 14:23:19 UTC --- > > Hmm, but when I use the same contents for the two arrays in my simple > > testcase I do get only a single .LC0 output referenced from two places. > > We will end up sharing the same RTL for both (unmerged) DECLs - but > > I don't see how this can be a problem? Maybe we fail to set > > TREE_ASM_WRITTEN on the duplicate and output it anyway via other > > mechanisms? > > We simply fail to set TREE_ASM_WRITTEN on the DECL_INITIAL (decl) because it is > not shared anymore. So probably: But in all places I found we check TREE_ASM_WRITTEN on DECL_INITIAL of the SYMBOL_REF_DECL ... So it must be pure luck that we survived LTO bootstrap before my patch (as far as I understand things). Before my patch we created yet another decl for the constant pool entry. With my patch we will have one less (we still have the decls from the constant pool entries that end up being shared in the LTRANS unit). So in the end I can agree to that my patch doesn't really fix the original issue fully. So we can as well revert it and instead avoid messing with alignment of the constant pool entries. But then the underlying issue with multiple decls refering to the same constant pool entry with LTO remains, it just happens that it isn't triggered anymore. > Index: varasm.c > =================================================================== > --- varasm.c (revision 196416) > +++ varasm.c (working copy) > @@ -3112,7 +3112,6 @@ build_constant_desc (tree exp, tree decl > LTO mode. Instead we set the flag that will be recognized in > make_decl_rtl. */ > DECL_IN_CONSTANT_POOL (decl) = 1; > - DECL_INITIAL (decl) = desc->value; > /* ??? CONSTANT_ALIGNMENT hasn't been updated for vector types on most > architectures so use DATA_ALIGNMENT as well, except for strings. */ > if (TREE_CODE (exp) == STRING_CST) > @@ -3125,6 +3124,8 @@ build_constant_desc (tree exp, tree decl > align_variable (decl, 0); > } > > + DECL_INITIAL (decl) = desc->value; > + > /* Now construct the SYMBOL_REF and the MEM. */ > if (use_object_blocks_p ()) > { Hmm, maybe. Then, why do we copy the constant in the first place ... Thus, Index: varasm.c =================================================================== --- varasm.c (revision 196462) +++ varasm.c (working copy) @@ -3087,7 +3087,7 @@ build_constant_desc (tree exp, tree decl int labelno; desc = ggc_alloc_constant_descriptor_tree (); - desc->value = copy_constant (exp); + desc->value = exp; /* Propagate marked-ness to copied constant. */ if (flag_mudflap && mf_marked_p (exp)) should be an "equivalent" fix. > But in all places I found we check TREE_ASM_WRITTEN on DECL_INITIAL > of the SYMBOL_REF_DECL ... Nope, maybe_output_constant_def_contents has: rtx symbol = XEXP (desc->rtl, 0); tree exp = desc->value; if (flag_syntax_only) return; if (TREE_ASM_WRITTEN (exp)) /* Already output; don't do it again. */ return; so the DECL_INITIAL of the SYMBOL_REF_DECL must be desc->value. > So it must be pure luck that we survived LTO bootstrap before my > patch (as far as I understand things). Before my patch we created > yet another decl for the constant pool entry. With my patch > we will have one less (we still have the decls from the constant > pool entries that end up being shared in the LTRANS unit). We use LTO on heavy Ada applications with an unmodified 4.7 compiler (modulo a patch for -g) so I don't think that luck has anything to do here. > So in the end I can agree to that my patch doesn't really fix > the original issue fully. So we can as well revert it and > instead avoid messing with alignment of the constant pool entries. That would be my preference. > Hmm, maybe. Then, why do we copy the constant in the first place ... > > Thus, > > Index: varasm.c > =================================================================== > --- varasm.c (revision 196462) > +++ varasm.c (working copy) > @@ -3087,7 +3087,7 @@ build_constant_desc (tree exp, tree decl > int labelno; > > desc = ggc_alloc_constant_descriptor_tree (); > - desc->value = copy_constant (exp); > + desc->value = exp; > > /* Propagate marked-ness to copied constant. */ > if (flag_mudflap && mf_marked_p (exp)) > > should be an "equivalent" fix. This call to copy_constant has been there for ages though. so a little bit of archeology would probably be in order before removing it. In the meantime, I've successfully bootstrapped my patchlet so we can also go for it. On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 > > --- Comment #32 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-03-05 15:32:15 UTC --- > > But in all places I found we check TREE_ASM_WRITTEN on DECL_INITIAL > > of the SYMBOL_REF_DECL ... > > Nope, maybe_output_constant_def_contents has: > > rtx symbol = XEXP (desc->rtl, 0); > tree exp = desc->value; > > if (flag_syntax_only) > return; > > if (TREE_ASM_WRITTEN (exp)) > /* Already output; don't do it again. */ > return; > > so the DECL_INITIAL of the SYMBOL_REF_DECL must be desc->value. Ah, ok ... too many smart TREE_ASM_WRITTEN bits around ... > > So it must be pure luck that we survived LTO bootstrap before my > > patch (as far as I understand things). Before my patch we created > > yet another decl for the constant pool entry. With my patch > > we will have one less (we still have the decls from the constant > > pool entries that end up being shared in the LTRANS unit). > > We use LTO on heavy Ada applications with an unmodified 4.7 compiler (modulo a > patch for -g) so I don't think that luck has anything to do here. Fine, so the above might be the only issue. > > So in the end I can agree to that my patch doesn't really fix > > the original issue fully. So we can as well revert it and > > instead avoid messing with alignment of the constant pool entries. > > That would be my preference. Which of course pessimizes code generation and probably causes some testsuite fallout for ppc (the original reported spurious fails). > > Hmm, maybe. Then, why do we copy the constant in the first place ... > > > > Thus, > > > > Index: varasm.c > > =================================================================== > > --- varasm.c (revision 196462) > > +++ varasm.c (working copy) > > @@ -3087,7 +3087,7 @@ build_constant_desc (tree exp, tree decl > > int labelno; > > > > desc = ggc_alloc_constant_descriptor_tree (); > > - desc->value = copy_constant (exp); > > + desc->value = exp; > > > > /* Propagate marked-ness to copied constant. */ > > if (flag_mudflap && mf_marked_p (exp)) > > > > should be an "equivalent" fix. > > This call to copy_constant has been there for ages though. so a little bit of > archeology would probably be in order before removing it. ;) Did that, it's there since forever - well, I traced it back to the point we only deferred string constants: 37459 jakub p = (struct deferred_string *) 37459 jakub xmalloc (sizeof (struct deferred_string)); 37459 jakub 37459 jakub p->exp = copy_constant (exp); 37459 jakub p->label = desc->label; I'm LTO bootstrapping desc->value = decl ? exp : copy_constant (exp); and doing a regular bootstrap with the copy_constant completely removed at the moment. Just curious ... > In the meantime, I've successfully bootstrapped my patchlet so we can also go > for it. I'm fine with that. As I concluded that the original fix didn't fix the alignment issue (well, not for all possible cases at least) reverting the original fix works for me as well. I'm working on a patch to avoid re-aligning constant pool entries. Created attachment 29587 [details]
Patch to restore LTO bootstrap with Ada + comment tweaks
OK, this is the patch I've tested, but it's your call.
Author: rguenth Date: Wed Mar 6 08:38:46 2013 New Revision: 196487 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196487 Log: 2013-03-06 Richard Biener <rguenther@suse.de> PR middle-end/50494 * tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Do not adjust alignment of DECL_IN_CONSTANT_POOL decls. Revert 2013-02-13 Richard Biener <rguenther@suse.de> PR lto/50494 * varasm.c (output_constant_def_1): Get the decl representing the constant as argument. (output_constant_def): Wrap output_constant_def_1. (make_decl_rtl): Use output_constant_def_1 with the decl representing the constant. (build_constant_desc): Optionally re-use a decl already representing the constant. (tree_output_constant_def): Adjust. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-vect-data-refs.c trunk/gcc/varasm.c Fixed. |