Bug 60731 - [4.7/4.8/4.9 Regression] dynamic library not getting reinitialized on multiple calls to dlopen()
Status: RESOLVED MOVED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: unknown
Importance: P3 normal
Target Milestone: 4.7.4
Assignee: Jason Merrill
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-01 17:01 UTC by Tim Moloney
Modified: 2015-01-15 15:03 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work: 4.4.7
Known to fail: 4.5.4, 4.8.2, 4.9.0
Last reconfirmed: 2014-04-02 00:00:00


Attachments
Example showing failure to initialize a dynamic library after multiple calls to dlopen(). (756 bytes, application/x-gzip)
2014-04-01 17:01 UTC, Tim Moloney

Description Tim Moloney 2014-04-01 17:01:36 UTC
Created attachment 32518
Example showing failure to initialize a dynamic library after multiple calls to dlopen().

If a dynamic library is loaded with dlopen(), closed with dlclose(), and loaded again, subsequent loads do not re-initialize its static variables when the library has:
1) a class/struct with a constructor,
2) an inline function
3) that contains a static variable.
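The attachment's source is not reproduced here. A minimal sketch of a library with that shape could look like the following. The names `make_static_stay` and `smp` are borrowed from the mangled guard-variable name in Comment 2, and `routine` from the LD_DEBUG trace in Comment 1. `Counter` is hypothetical, and the real example may differ.

```cpp
// static_lib.cc: hypothetical reconstruction of the pattern in the report.
// Only make_static_stay, smp and routine come from the bug's traces.
#include <cstdio>

// 1) A struct with a user-provided constructor (Counter is a made-up name).
struct Counter {
    int count;
    Counter() : count(0) {}
};

// 2) An inline function 3) containing a function-local static.
// The initializer needs a runtime call (new), so the compiler emits a guard
// variable for smp. In a shared object, GCC may give smp and that guard
// STB_GNU_UNIQUE binding.
inline Counter* make_static_stay() {
    static Counter* smp = new Counter();  // runs once per load of the object
    return smp;
}

// Entry point the test driver looks up with dlsym().
extern "C" void routine() {
    Counter* c = make_static_stay();
    ++c->count;
    std::printf("count:%d\n", c->count);
}
```

Within a single load, calling routine() twice prints count:1 and then count:2. In the failing case, the same accumulation happens across dlclose()/dlopen() cycles because the object is never actually unloaded.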
Please run the attached example.
  # tar xf gcc_static_issue.tgz
  # cd gcc_static_issue
  # make
  # ./test_static
Expected behavior (as on RHEL5, g++ 4.1.2):
  Type 'q' to exit or enter to reload/run the DLL
  count:1
  
  count:1
  
  count:1
  
  count:1
  
  count:1
  q
  #
Actual behavior (as on RHEL6 and RHEL7beta, g++ 4.4.7 and 4.8.2, respectively):
  Type 'q' to exit or enter to reload/run the DLL
  count:1
  
  count:2
  
  count:3
  
  count:4
  
  count:5
  q
  #
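The test_static driver is also only in the attachment. Here is a minimal sketch of an equivalent reload loop. It assumes the library is ./static.so and exports an extern "C" function `routine` taking no arguments and returning nothing. The symbol name appears in Comment 1, but the signature is an assumption. This is not the reporter's code.

```cpp
// reload_driver.cc: hypothetical sketch of a dlopen()/dlclose() reload loop.
#include <dlfcn.h>
#include <cstdio>

using routine_fn = void (*)();  // assumed signature of routine()

// Loads `path`, calls its `routine` symbol once, then closes the handle.
// Returns 0 on success, -1 if the library or the symbol cannot be loaded.
int load_run_unload(const char* path) {
    void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    dlerror();  // clear any stale error before dlsym()
    auto fn = reinterpret_cast<routine_fn>(dlsym(handle, "routine"));
    if (!fn) {
        std::fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    fn();             // prints "count:1" each time if the object is really reloaded
    dlclose(handle);  // has no effect if the object was marked DF_1_NODELETE
    return 0;
}

// Mirrors the interactive loop: Enter reloads, 'q' (or EOF) quits.
int run_reload_loop(const char* path) {
    std::puts("Type 'q' to exit or enter to reload/run the DLL");
    for (;;) {
        if (load_run_unload(path) != 0) return -1;
        int c = std::getchar();
        if (c == 'q' || c == EOF) return 0;
    }
}
```

A main() would call run_reload_loop("./static.so"). On glibc versions before 2.34, the program must be linked with -ldl.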
Comment 1 Richard Biener 2014-04-02 09:00:35 UTC
Works up to GCC 4.4, fails since GCC 4.5.  It's not clear what makes the difference here.

Btw, with LD_DEBUG=all I see


     10389:     opening file=./static.so [0]; direct_opencount=2
     10389:
     10389:     symbol=routine;  lookup in file=./static.so [0]
     10389:     binding file ./static.so [0] to ./static.so [0]: normal symbol `routine'
count:2

     10389:     opening file=./static.so [0]; direct_opencount=3
     10389:
     10389:     symbol=routine;  lookup in file=./static.so [0]
     10389:     binding file ./static.so [0] to ./static.so [0]: normal symbol `routine'
count:3

So the dlclose() call does nothing, whereas in the working case:

     10438:
     10438:     file=./static.so [0];  dynamically loaded by ./test_static [0]
     10438:     file=./static.so [0];  generating link map
     10438:       dynamic: 0x00007ffff6ffede0  base: 0x00007ffff6dfd000   size: 0x00000000002020a8
     10438:         entry: 0x00007ffff6dfdb10  phdr: 0x00007ffff6dfd040  phnum:                  7
....
     10438:     calling init: ./static.so
     10438:
     10438:     opening file=./static.so [0]; direct_opencount=1
     10438:
     10438:     symbol=routine;  lookup in file=./static.so [0]
     10438:     binding file ./static.so [0] to ./static.so [0]: normal symbol `routine'
count:1
     10438:
     10438:     calling fini: ./static.so [0]
     10438:
     10438:
     10438:     file=./static.so [0];  destroying link map
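You can make the same observation without LD_DEBUG. The sketch below relies on glibc's RTLD_NOLOAD flag, which makes dlopen() return a handle only if the object is already resident, and never loads it. After dlclose(), it probes the path. A non-NULL result means the object is still mapped, as in the failing trace. The sketch assumes nothing else in the process holds a reference to the library.

```cpp
#include <dlfcn.h>

// Returns true if `path` is currently loaded in this process.
// RTLD_NOLOAD never loads the file; on success it takes a reference,
// so the probe handle is closed again immediately.
bool is_resident(const char* path) {
    void* h = dlopen(path, RTLD_NOW | RTLD_NOLOAD);
    if (!h) return false;
    dlclose(h);  // drop the reference the probe just took
    return true;
}

// Opens and closes `path` once, then reports whether it survived dlclose():
// 1 = still mapped (e.g. marked NODELETE), 0 = unloaded, -1 = dlopen failed.
int survives_dlclose(const char* path) {
    void* h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!h) return -1;
    dlclose(h);
    return is_resident(path) ? 1 : 0;
}
```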
Comment 2 Richard Biener 2014-04-02 09:49:12 UTC
We hit

void
_dl_close (void *_map)
{
  struct link_map *map = _map;

  /* First see whether we can remove the object at all.  */
  if (__builtin_expect (map->l_flags_1 & DF_1_NODELETE, 0))
    {
      assert (map->l_init_called);
      /* Nope.  Do nothing.  */
      return;
    }
  /* ... */
}

The DF_1_NODELETE flag is already set after the first dlopen() call, which sets
it via do_lookup_x for the STB_GNU_UNIQUE symbol _ZGVZ16make_static_stayvE3smp
(the guard variable for make_static_stay()::smp):

                  if (map->l_type == lt_loaded)
                    /* Make sure we don't unload this object by
                       setting the appropriate flag.  */
                    ((struct link_map *) map)->l_flags_1 |= DF_1_NODELETE;

So this points either to a "bad" design of the guard code for initializing
'smp', or to a weakness in the dynamic loader, which doesn't handle unloading
of objects that define any(?) STB_GNU_UNIQUE symbol.  Note that the above is
guarded with

              if ((type_class & ELF_RTYPE_CLASS_COPY) != 0)
                enter (entries, size, new_hash, strtab + sym->st_name, ref,
                       undef_map);
              else
                {
                  enter (entries, size, new_hash, strtab + sym->st_name, sym,
                         map);

                  if (map->l_type == lt_loaded)
                    /* Make sure we don't unload this object by
                       setting the appropriate flag.  */
                    ((struct link_map *) map)->l_flags_1 |= DF_1_NODELETE;
                }

Thus, if the symbol were referenced via a copy relocation, the object would not be marked DF_1_NODELETE and unloading would work.

Jason?
Comment 3 Jason Merrill 2014-04-02 18:30:32 UTC
Right, it was a deliberate choice in ld.so to suppress dlclose of DSOs that use STB_GNU_UNIQUE, which causes problems with some code that relies on reinitialization with dlclose/dlopen.  As Ian says in

http://gcc.gnu.org/ml/gcc-help/2011-05/msg00450.html

This seems excessive; you only need to avoid unloading objects that satisfy symbol references from another DSO.  But I guess checking for that was deemed too slow.

If you're using the gold linker, you can link with --no-gnu-unique to avoid the use of STB_GNU_UNIQUE.

I suppose I should add a compiler flag to turn it off, too...
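To check whether a given DSO is affected, you can look for dynamic symbols with STB_GNU_UNIQUE binding. `readelf -Ws` lists them as UNIQUE. The sketch below performs that check in C++. It assumes a 64-bit little-endian ELF file and glibc's <elf.h>, which defines STB_GNU_UNIQUE. It is an illustration only, not part of GCC or glibc. A check like this could confirm that a rebuilt library no longer contains such symbols.

```cpp
#include <elf.h>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

// Counts .dynsym entries with STB_GNU_UNIQUE binding in a 64-bit
// little-endian ELF file. Returns -1 if the file cannot be read or is not
// such a file. Only .dynsym is scanned: it is the table ld.so consults.
long count_gnu_unique_symbols(const char* path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());

    Elf64_Ehdr eh;
    if (buf.size() < sizeof eh) return -1;
    std::memcpy(&eh, buf.data(), sizeof eh);  // memcpy avoids misaligned reads
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_ident[EI_DATA] != ELFDATA2LSB ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        return -1;

    long count = 0;
    for (std::uint16_t i = 0; i < eh.e_shnum; ++i) {
        std::uint64_t off = eh.e_shoff + std::uint64_t(i) * sizeof(Elf64_Shdr);
        if (off + sizeof(Elf64_Shdr) > buf.size()) return -1;
        Elf64_Shdr sh;
        std::memcpy(&sh, buf.data() + off, sizeof sh);
        if (sh.sh_type != SHT_DYNSYM || sh.sh_entsize != sizeof(Elf64_Sym))
            continue;
        if (sh.sh_offset + sh.sh_size > buf.size()) return -1;
        for (std::uint64_t s = 0; s < sh.sh_size / sizeof(Elf64_Sym); ++s) {
            Elf64_Sym sym;
            std::memcpy(&sym, buf.data() + sh.sh_offset + s * sizeof sym,
                        sizeof sym);
            if (ELF64_ST_BIND(sym.st_info) == STB_GNU_UNIQUE) ++count;
        }
    }
    return count;
}
```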
Comment 4 Richard Biener 2014-04-03 09:48:01 UTC
Thus this is a bug in the dynamic loader as well.  Please file a bug against
glibc on sourceware.org/bugzilla.
Comment 5 Richard Biener 2014-04-03 11:32:34 UTC
And actually it might be considered a non-bug in GCC but a consequence of implementing a requirement.  Jason posted a patch that implements a workaround
for the dynamic linker issue.

Closing as moved - please open a bugreport against glibc.
Comment 6 Tim Moloney 2014-04-03 13:41:25 UTC
I created glibc bug #16805 (https://sourceware.org/bugzilla/show_bug.cgi?id=16805).
Comment 7 Jason Merrill 2014-04-07 13:28:10 UTC
Author: jason
Date: Mon Apr  7 13:27:39 2014
New Revision: 209186

URL: http://gcc.gnu.org/viewcvs?rev=209186&root=gcc&view=rev
Log:
	PR c++/60731
	* common.opt (-fno-gnu-unique): Add.
	* config/elfos.h (USE_GNU_UNIQUE_OBJECT): Check it.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/config/elfos.h
    trunk/gcc/doc/invoke.texi
Comment 8 Jason Merrill 2014-04-07 13:28:16 UTC
Author: jason
Date: Mon Apr  7 13:27:45 2014
New Revision: 209187

URL: http://gcc.gnu.org/viewcvs?rev=209187&root=gcc&view=rev
Log:
	PR c++/60731
	* lib/gcc-dg.exp (dg-build-dso): New.
	(gcc-dg-test-1): Handle dg-do-what "dso".
	* lib/target-supports.exp (add_options_for_dlopen): New.
	(check_effective_target_dlopen): Use it.
	* g++.dg/dso/dlclose1.C: New.
	* g++.dg/dso/dlclose1-dso.cc: New.

Added:
    trunk/gcc/testsuite/g++.dg/dso/
    trunk/gcc/testsuite/g++.dg/dso/dlclose1-dso.cc
    trunk/gcc/testsuite/g++.dg/dso/dlclose1.C
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/gcc-dg.exp
    trunk/gcc/testsuite/lib/target-supports.exp
Comment 9 Ondrej Bilka 2014-04-23 09:09:21 UTC
I started looking into what the purpose of STB_GNU_UNIQUE is, so I have several questions.

First, does the suggestion below really work?
http://gcc.gnu.org/ml/gcc-help/2011-05/msg00450.html

Say you have foo.so with a unique symbol foo and a function

bar *getfoo() {
  return (void *) &foo;
}

which gets loaded and unloaded like this:

void *h = dlopen("foo.so",RTLD_NOW);
bar *p1 = dlsym(h,"getfoo")();
dlclose(h);
foo->baz();
h = dlopen("foo.so",RTLD_NOW);
bar *p2 = dlsym(h,"getfoo")();
dlclose(h);
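Here is a compilable rendering of that sequence. It records whether the second load returns the same address. It assumes foo.so exports an extern "C" function `getfoo()` that returns a pointer to the unique object. `bar` and `getfoo` are hypothetical, as in the question. The sketch intentionally skips the `foo->baz()` step, because using the object after dlclose() is exactly what is in question. It only compares pointer values and never dereferences them.

```cpp
#include <dlfcn.h>

struct bar;                    // opaque, hypothetical type from the question
using getfoo_fn = bar* (*)();  // assumed extern "C" signature of getfoo()

// Loads `path`, fetches the object's address via getfoo(), closes it again.
// Returns nullptr on any failure.
static bar* load_get_close(const char* path) {
    void* h = dlopen(path, RTLD_NOW);
    if (!h) return nullptr;
    auto getfoo = reinterpret_cast<getfoo_fn>(dlsym(h, "getfoo"));
    bar* p = getfoo ? getfoo() : nullptr;
    dlclose(h);  // p may dangle after this if the object was really unloaded
    return p;
}

// 1: both loads returned the same address; 0: different; -1: a load failed.
int same_address_after_reload(const char* path) {
    bar* p1 = load_get_close(path);
    bar* p2 = load_get_close(path);
    if (!p1 || !p2) return -1;
    return p1 == p2 ? 1 : 0;
}
```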

Are p1 and p2 supposed to point to the same object?
Also, is foo->baz(); legal or not? If it is, we cannot call destructors.

One possible fix is to add a zombie state in which we call destructors but do not free the memory, so that we can reinitialize the object at the same address. But I need to know whether calling the destructor is always the intended behaviour.
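The zombie-state idea can be illustrated outside the dynamic loader. The following is a plain C++ model only, not glibc code, and `Unique`/`ZombieSlot` are hypothetical names. The object lives in storage that is never freed. "Unloading" runs its destructor in place. "Reloading" constructs a fresh object at the same address with placement new. As a result, a pointer taken before the unload compares equal to one taken after, while the object's state is reinitialized.

```cpp
#include <new>
#include <string>

// Hypothetical stand-in for a library's unique object.
struct Unique {
    int count = 0;
    std::string tag = "fresh";
};

// Storage that outlives every "unload", as the proposed zombie state would.
class ZombieSlot {
public:
    ZombieSlot() { obj_ = new (&storage_) Unique(); }  // first load
    ~ZombieSlot() { if (live_) obj_->~Unique(); }
    ZombieSlot(const ZombieSlot&) = delete;
    ZombieSlot& operator=(const ZombieSlot&) = delete;

    Unique* get() { return live_ ? obj_ : nullptr; }

    // Models dlclose(): run the destructor but keep the memory.
    void unload() {
        if (live_) { obj_->~Unique(); live_ = false; }
    }

    // Models a later dlopen(): construct a fresh object at the same address.
    Unique* reload() {
        if (!live_) { obj_ = new (&storage_) Unique(); live_ = true; }
        return obj_;
    }

private:
    alignas(Unique) unsigned char storage_[sizeof(Unique)];
    Unique* obj_ = nullptr;
    bool live_ = true;
};
```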
Comment 10 Jason Merrill 2014-05-05 16:04:50 UTC
(In reply to Ondrej Bilka from comment #9)
> First, does the suggestion below really work?
> http://gcc.gnu.org/ml/gcc-help/2011-05/msg00450.html

I don't see why it wouldn't.
 
> void *h = dlopen("foo.so",RTLD_NOW);
> bar *p1 = dlsym(h,"getfoo")();
> dlclose(h);
> foo->baz();
> h = dlopen("foo.so",RTLD_NOW);
> bar *p2 = dlsym(h,"getfoo")();
> dlclose(h);
> 
> Are p1 and p2 supposed to point to the same object?
> Also, is foo->baz(); legal or not? If it is, we cannot call destructors.

I don't think a program can reasonably rely on either of these.
 
> One possible fix is to add a zombie state in which we call destructors but
> do not free the memory, so that we can reinitialize the object at the same
> address. But I need to know whether calling the destructor is always the
> intended behaviour.

That sounds fine to me.
Comment 11 Dave Johansen 2014-06-12 21:55:36 UTC
Can this please be reopened? It was determined in the glibc bugzilla that this is a gcc problem because of the incorrect setting of the unique flag.
Comment 12 Jason Merrill 2014-06-13 16:19:34 UTC
(In reply to Dave Johansen from comment #11)
> Can this please be reopened? It was determined in the glibc bugzilla that
> this is a gcc problem because of the incorrect setting of the unique flag.

The setting is not incorrect, nor is it an optimization; it is necessary to fix the behavior of RTLD_LOCAL with multiple loaded objects depending on the same library, since the glibc developers rejected the other approach that I suggested (https://www.sourceware.org/ml/libc-alpha/2002-05/msg00222.html).

If you don't need this handling, in 4.9 you can use -fno-gnu-unique to disable it.  I'll go ahead and backport that switch to 4.8 as well.
Comment 13 Jason Merrill 2014-06-13 16:40:09 UTC
Author: jason
Date: Fri Jun 13 16:39:37 2014
New Revision: 211648

URL: https://gcc.gnu.org/viewcvs?rev=211648&root=gcc&view=rev
Log:
	PR c++/60731
	* common.opt (-fno-gnu-unique): Add.
	* config/elfos.h (USE_GNU_UNIQUE_OBJECT): Check it.

Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/common.opt
    branches/gcc-4_8-branch/gcc/config/elfos.h
    branches/gcc-4_8-branch/gcc/doc/invoke.texi
Comment 14 Dave Johansen 2015-01-15 15:03:01 UTC
Could you please point me to how I can reproduce the issue with "RTLD_LOCAL with multiple loaded objects depending on the same library"? I would like to see if I can reproduce that issue with clang++ and icpc.
Thanks,
Dave