Bug 86490

Summary: lto1: fatal error: multiple prevailing defs with common symbols
Product: gcc Reporter: H.J. Lu <hjl.tools>
Component: ltoAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: minor CC: amonakov, hubicka, joe.harvell
Priority: P3 Keywords: ice-on-valid-code
Version: 9.0   
Target Milestone: ---   
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=23350
https://sourceware.org/bugzilla/show_bug.cgi?id=23411
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2018-07-11 00:00:00

Description H.J. Lu 2018-07-11 15:52:27 UTC
[hjl@gnu-cfl-1 pr23350]$ cat foo.c
int foo;
void bar() {}
[hjl@gnu-cfl-1 pr23350]$ cat bar.c
int foo;
void bar() {}
[hjl@gnu-cfl-1 pr23350]$ cat main.c
int foo;
int
main ()
{
  return foo;
}
[hjl@gnu-cfl-1 pr23350]$ make CC=gcc
gcc -O2 -flto   -c -o main.o main.c
gcc -O2 -flto   -c -o foo.o foo.c
ar --plugin `gcc -print-prog-name=liblto_plugin.so` -rusc libfoo.a foo.o
gcc -O2 -flto   -c -o bar.o bar.c
ar --plugin `gcc -print-prog-name=liblto_plugin.so` -rusc libbar.a bar.o
gcc -o x  main.o libfoo.a libbar.a
lto1: fatal error: multiple prevailing defs for ‘bar’
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
compilation terminated.
/usr/local/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make: *** [Makefile:14: x] Error 1
[hjl@gnu-cfl-1 pr23350]$ 

Linker has:

     if (!blhe)
        {
          /* The plugin is called to claim symbols in an archive element
             from plugin_object_p.  But those symbols aren't needed to
             create output.  They are defined and referenced only within
             IR.  */
          switch (syms[n].def)
            {
            default:
              abort ();
            case LDPK_UNDEF:
            case LDPK_WEAKUNDEF:
              res = LDPR_UNDEF;
              break;
            case LDPK_DEF:
            case LDPK_WEAKDEF:
            case LDPK_COMMON:
              res = LDPR_PREVAILING_DEF_IRONLY;
              break;
            }
          goto report_symbol;
        }

lto1 shouldn't issue

lto1: fatal error: multiple prevailing defs for ‘bar’

unless bar will be included in output.
Comment 1 Martin Liška 2018-07-11 15:57:26 UTC
Thanks H.J. for investigation.
Honza can you please take a look?
Comment 2 Alexander Monakov 2018-07-11 16:31:41 UTC
Note that Gold does not exhibit this issue. I think ld.bfd is at fault here.

We've hit similar issues with some internal plugin development. The main issue is, ld.bfd feeds the plugin with objects extracted from static archives, but those objects do not satisfy any unresolved references and would not be extracted in the first place in non-LTO link. So ld.bfd is causing useless extra work both for itself and the compiler plugin.

It would be nice to fix this on ld.bfd side so future plugin writers don't need to wrestle with this issue.
Comment 3 H.J. Lu 2018-07-11 16:49:44 UTC
(In reply to Alexander Monakov from comment #2)
> Note that Gold does not exhibit this issue. I think ld.bfd is at fault here.

It is because gold doesn't check archive for a common definition.

> We've hit similar issues with some internal plugin development. The main
> issue is, ld.bfd feeds the plugin with objects extracted from static
> archives, but those objects do not satisfy any unresolved references and
> would not be extracted in the first place in non-LTO link. So ld.bfd is
> causing useless extra work both for itself and the compiler plugin.
> 

Is there a common symbol involved?
Comment 4 Alexander Monakov 2018-07-11 17:17:27 UTC
(In reply to H.J. Lu from comment #3)
> It is because gold doesn't check archive for a common definition.

Please elaborate - does ld.bfd try to extract static archive members when it already has a common definition? Why?

> Is there a common symbol involved?

I don't think so, but I'm not sure. We've also seen other pain points like the same member extracted and given to the plugin multiple times, even though the second extraction cannot possibly satisfy any unresolved references.
Comment 5 H.J. Lu 2018-07-11 18:06:29 UTC
(In reply to Alexander Monakov from comment #4)
> (In reply to H.J. Lu from comment #3)
> > It is because gold doesn't check archive for a common definition.
> 
> Please elaborate - does ld.bfd try to extract static archive members when it
> already has a common definition? Why?

When ld sees a common symbol, it will use a non-common definiton
in a library, .a or .so, to override it.

> > Is there a common symbol involved?
> 
> I don't think so, but I'm not sure. We've also seen other pain points like
> the same member extracted and given to the plugin multiple times, even
> though the second extraction cannot possibly satisfy any unresolved
> references.

Do you have a testcase?
Comment 6 Alexander Monakov 2018-07-11 19:39:24 UTC
(In reply to H.J. Lu from comment #5)
> When ld sees a common symbol, it will use a non-common definiton
> in a library, .a or .so, to override it.

This is surprising, is it documented somewhere? I don't think the ELF spec suggests something like that needs to happen.

> Do you have a testcase?

No, it would take some time to prepare.
Comment 7 H.J. Lu 2018-07-11 21:16:52 UTC
(In reply to Alexander Monakov from comment #6)
> (In reply to H.J. Lu from comment #5)
> > When ld sees a common symbol, it will use a non-common definiton
> > in a library, .a or .so, to override it.
> 
> This is surprising, is it documented somewhere? I don't think the ELF spec
> suggests something like that needs to happen.

It is to be consistent for common symbol linked against .a or .so.
Comment 8 Alexander Monakov 2018-07-11 23:17:07 UTC
(In reply to H.J. Lu from comment #7)
> It is to be consistent for common symbol linked against .a or .so.

That seems like a really strange reason because without --whole-archive there are other ways to arrive at an apparent "inconsistency", while with --whole-archive there's no need for special treatment as the "consistent" result is achieved automatically.
Comment 9 zenith432 2018-07-12 19:46:25 UTC
It is worth studying what gold is doing, because it's not just skipping the object files in the archives.

If you link with
gcc -flto -save-temps -fuse-ld=gold -o x main.o libfoo.a libbar.a

The res file shows resolutions only for main.o.

Now link with
gcc -flto -save-temps -fuse-ld=gold -u bar -o x main.o libfoo.a libbar.a

The res file shows resolutions for main.o and exactly one of libfoo.a or libbar.a.

Now add definitions as follows
void f1() {} to foo.c
void f2() {} to bar.c
so you can tell them apart.

Now link with
gcc -flto -save-temps -fuse-ld=gold -u f1 -o x main.o libfoo.a libbar.a
gcc -flto -save-temps -fuse-ld=gold -u f2 -o x main.o libfoo.a libbar.a

each time, the resolution file shows gold resolving just one of the .a files which was requested with the -u.

Now link with
gcc -flto -save-temps -fuse-ld=gold -u f1 -u f2 -o x main.o libfoo.a libbar.a
to link all in.
This does give an error, but it's a gold error for multiple defs, not an lto1 error for multiple prevailing defs.  Look at the res file you'll see resolutions for all three input files, but there is just one prevailing def for symbol bar - the other instance of bar gets resolution PREEMPTED_IR.
Comment 10 zenith432 2018-07-13 17:08:47 UTC
Followup on what gold does...

First, it reads the symbol table from the archive (w/o using the plugin) - and if it doesn't need any of the symbols in an LTO member of the archive - it doesn't call the plugin's claim_file_handler on the member.

Second, even if it needs an LTO member from an archive - it first adds all the LTO object's symbols to its own symbol table during the add_symbols callback from the plugin.  For each symbol, it remembers which object file it first was seen in and whether that first object file is claimed by a plugin.
Later, when get_symbols callback is called from the plugin to get resolutions - it sets LDPR_PREVAILING_DEF_IRONLY for symbols it doesn't need, but only if the symbol was first seen in the same object file.  If it was first seen in another object file, it sets the resolution to either LDPR_PREEMPTED_IR or LDPR_PREEMPTED_REG, depending on whether the symbol's first source is claimed by a plugin or not.
This algorithm makes sure each IRONLY symbol only gets a single PREVAILING_DEF.
Comment 11 H.J. Lu 2018-07-13 17:56:28 UTC
(In reply to zenith432 from comment #10)
> Followup on what gold does...

This is a gold bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=23411
Comment 12 zenith432 2018-07-13 18:50:33 UTC
Fair enough, it's a gold bug in the sense that gold's algorithm for selecting a prevailing def among multiple defs has an error.
If an IR symbol has multiple definitions as
LDPK_COMMON
and a single definition as
LDPK_DEF
then the one marked as LDPK_DEF should be resolved as LDPR_PREVAILING_DEF_IRNONLY and the other defintions as PREEMPTED_IR.

However, the limitation of a single prevailing def is documented...
http://gcc.gnu.org/wiki/whopr/driver
In the subsection
The "All Symbols Read" Event
"In the case of a symbol that is defined in more than one IR file, WPA will need to know which definition to use and which definitions to ignore."

Logically, it is the linker's job to make this decision, because the prevailing def depends on the order that object files are given on the command line, on whether the object file has to be included in the link (i.e. it's on the command line) - or it's in a library and is optional.  And also on whether it's a common def or not.  lto1 cannot make this decision by itself.

It is not possible to leave multiple prevailing defs for lto1 just because the linker doesn't need the symbol and considers it discardable - because an IR symbol may be referenced from inside the IR by another part needed in the link.  In that case lto1 will need to generate the multiply defined symbol, and can't decide which one to use - for example because it doesn't know the order of the object files in the libraries given on the linker command line.
Comment 13 H.J. Lu 2018-07-13 19:14:15 UTC
(In reply to zenith432 from comment #12)
> Fair enough, it's a gold bug in the sense that gold's algorithm for
> selecting a prevailing def among multiple defs has an error.
> If an IR symbol has multiple definitions as
> LDPK_COMMON
> and a single definition as
> LDPK_DEF
> then the one marked as LDPK_DEF should be resolved as
> LDPR_PREVAILING_DEF_IRNONLY and the other defintions as PREEMPTED_IR.
> 
> However, the limitation of a single prevailing def is documented...
> http://gcc.gnu.org/wiki/whopr/driver
> In the subsection
> The "All Symbols Read" Event
> "In the case of a symbol that is defined in more than one IR file, WPA will
> need to know which definition to use and which definitions to ignore."
> 
> Logically, it is the linker's job to make this decision, because the
> prevailing def depends on the order that object files are given on the
> command line, on whether the object file has to be included in the link
> (i.e. it's on the command line) - or it's in a library and is optional.  And
> also on whether it's a common def or not.  lto1 cannot make this decision by
> itself.
> 
> It is not possible to leave multiple prevailing defs for lto1 just because
> the linker doesn't need the symbol and considers it discardable - because an
> IR symbol may be referenced from inside the IR by another part needed in the
> link.  In that case lto1 will need to generate the multiply defined symbol,
> and can't decide which one to use - for example because it doesn't know the
> order of the object files in the libraries given on the linker command line.

But the symbol in question won't be USED by lto1 at all.
Comment 14 zenith432 2018-07-14 08:34:07 UTC
(In reply to H.J. Lu from comment #13)
> 
> But the symbol in question won't be USED by lto1 at all.

Ok.  I didn't completely check the logic for resolutions in ld.bfd so didn't understand that it *knows* the symbol won't be used.

If ld knows a symbol in the IR won't be used and wants to trick lto1 into discarding the symbol - it can do so by setting the resolution to LDPR_PREEMPTED_REG.  lto1 has no way of verifying whether the symbol is defined outside the IR or not - so will simply respond to this resolution by discarding the symbol.

There is an example of this in gold in
Pluginobj::get_symbol_resolution_info

>  if (static_cast<size_t>(nsyms) > this->symbols_.size())
>    {
>      // We never decided to include this object. We mark all symbols as
>      // preempted.
>      gold_assert(this->symbols_.size() == 0);
>      for (int i = 0; i < nsyms; i++)
>        syms[i].resolution = LDPR_PREEMPTED_REG;
>      return version > 2 ? LDPS_NO_SYMS : LDPS_OK;
>    }

I did not completely follow the gold code as to why it may decide not to include the object, but if gold decides not to include the object after it's been claimed - this is how it gets all its symbols to be discarded by lto1.

Note that there are cases of multiple defs in the IR of an unused symbol where the linker still has to stop with an error.  For example - if the duplicate def is a regular kind (non-common, non-weak) and the obj files all appear on the command-line (not archive) - this is a duplicate symbol error even if the symbol is unreferenced.  The linker can either print the error itself - or leave multiple prevailing defs for lto1 to print the error :)
Comment 15 joe.harvell 2020-05-26 16:01:37 UTC
I get the following error on on gcc 8.3.0 using the default bfd linker with LTO, but only with binutils 2.27 and 2.34.  This error does not occur with binutils 2.25.  I also get this error with gcc 10.1.0 and binutils 2.34. (It's not possible to build gcc 10.1.0 correctly with binutils 2.25, so this is not tested).  In all cases, there is no error when using the gold linker.

lto1: fatal error: multiple prevailing defs for ‘actual_function_name_redacted’

This function is defined in one .c file and declared with the extern keyword in another .c file.

For the gcc 10.1.0 / binutils 2.34 case, here is what I see using lto-dump:

jharvell@grays obj$ /opt/gcc10.1/bin/lto-dump -symbol=actual_function_name_redacted actual_c_filename_redacted_v1.o 
Symbol: actual_function_name_redacted
actual_function_name_redacted/259 (actual_function_name_redacted) @0x7fedffb7c168
  Type: function definition analyzed
  Visibility: externally_visible public
  References: 
  Referring: 
  Read from file: actual_c_filename_redacted_v1.o
  Availability: available
  Unit id: 1
  Function flags: count:1073741824 (estimated locally)
  Called by: 
  Calls: memcpy/260 (354334800 (estimated locally),0.33 per call) 

jharvell@grays obj$ /opt/gcc10.1/bin/lto-dump -symbol=actual_function_name_redacted actual_c_filename_redacted.o 
Symbol: actual_function_name_redacted
Comment 16 joe.harvell 2020-05-26 16:13:27 UTC
Correction on the previous comment.  I said

This function is defined in one .c file and declared with the extern keyword in another .c file.

But in fact the extern declaration in the second .c file is removed by the pre-processor.  So the symbol only occurs in one .c file.
Comment 17 joe.harvell 2020-05-26 16:46:24 UTC
It looks like this function is never called, by the way.
Comment 18 joe.harvell 2020-05-26 21:41:19 UTC
Sorry for the chatter...but I noticed the link command line had the same .o/.a files multiple times (to satisfy order dependencies between them).  When I removed the duplicates and enclosed them in -Wl,--start-group / -Wl,--end-group everything linked fine with bfd 2.34.  Note that bfd >= 2.27 also works fine if not using LTO.
Comment 19 Andrew Pinski 2021-12-24 11:43:50 UTC
The first example still fails if used with ld from binutils 2.30 and with -fcommon (which is not the default any more).

Moving down to a minor issue as -fno-common is now default since GCC 10.