This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Potential GC problem in preprocessed C++

I've just spend a number of hours tracking a problem that showed up
with a largish preprocessed C++ testcase provided by a customer.  Your
typical memory-corruption problem, except that this one wasn't really:
it was actually a garbage collection problem, but a very unlikely one,
I believe.

It's no longer present in CVS HEAD nor in the 3.0 branch (and
certainly nor in 2.95, if my recollection that we didn't use garbage
collection back then is correct).  However, it's not present because
it was fixed, but because other changes have managed to hide it so
that it can't possibly show up any longer.  I'd like to fix it for
real, so as to prevent it from ever re-surfacing, given how painful it
was to find out the root of the problem, and then to create a reduced
testcase that, with some help of magic, could be used to duplicate the
problem on other machines.

Back in the old days when filenames were GC-able objects, when the
lexer encountered a preprocessor directive such as:

# <line-number> "filename"

it would set input_filename to a GC-able string containing the
filename.  At garbage collection time, it would correctly mark
input_filename, as well as main_input_filename and any other filenames
in the include stack, such that they wouldn't be collected even if no
tree had been created that referenced the filename yet.

But not if the preprocessor directive came right before the closing
bracket of a class definition, and this class had a number of peculiar
characteristics such as:

- having no implicitly-defined members, because these would be
  considered to have been implicitly defined on the line of the
  closing bracket, and thus they'd hold a reference to the filename

- having at least one non-static data member, otherwise (back then) an
  implicit 1-byte data member would be created similarly

- having at least one member function defined inside the class body

The problem with member functions defined inside the class body is
that their bodies are saved while the class body is parsed, and are
only parsed for real after the class has been completely parsed.  In
order to do it, we put the current input_filename aside, and restore
the one in place at the point of the definition of the member

If garbage collection strikes in while the current input_filename is
put aside, given all of the above conditions, there will be no live
reference to input_filename, so it won't be marked, and will be
considered available.

`But this will never happen,' one might say`, since we only call
ggc_collect() in cp/spew.c:begin_parsing_inclass_inline() before
calling feed_input().'  Unfortunately, this is not true.

If the member function defined inside the class body happens to be a
template member function, cp/semantics.c:expand_body() *will* call
ggc_collect() while input_filename is put aside, and if you're unlucky
enough to have it kick in at this very point, it will consider the
original input_filename dead.  Later on, input_filename will be
restored, but then it may point to garbage.  In fact, if you're
unlucky, another tree will have been assigned to the same memory
location, and this tree will be chained to another tree only
referenced from this one.  Then, in the next round of garbage
collection, this tree will be preserved but, if it happens to be
marked as a string, not as a tree, the chain and anything else only
referenced from that tree won't be marked, and will be garbage
collected.  You know the way things go down the hill from there :-)

Anyway, we have two different problems here.  One, I'm not sure
whether it's a real problem, is that ggc_collect() is called when
we're not in the top level, such that it may possibly discard stuff
that is not referenced from any GC roots.  The other is that cp/spew.c
doesn't take any measure to mark filenames, should they be in GCable

I'm well aware the way we handle filenames and strings has changed a
lot since back then.  But still, I'd like to add notes somewhere about
this caveat if someone ever comes up with the idea of making filename
strings GCable again.  It's unlikely, and of little gain since most
often you'll #include a file such that you can get declarations from
it, and declarations would keep the filenames from being collected.
But still, I'd like to fix this potential problem somehow.

I see a few non-exclusive alternatives:

- modify expand_body() such that it doesn't invoke the garbage
  collector after parsing a template function defined inside a class
  body.  It's pointless, since we're going to invoke ggc_collect()
  when we get to the next deferred inline-defined function, or when we
  get to the end of the class definition

- add cp/spew.c:feed as a GC-root, and re-introduce string-marking
  do-nothing functions, that would be modified should strings ever be
  made GCable again, or introduce some test to tell whether a string
  is GCable and use it to tell whether to add the put-aside
  input_filename as a until we get back to it.

- add a note to stringpool.c:ggc_alloc_string() and perhaps to
  wherever filename strings are allocated these days (I couldn't
  figure it out after studying cpp*.c for a short while) such that one
  doesn't forget to do or enable any of the changes above should the
  GCability of filenames in general, and of input_filename in
  particular, change.

Comments?  Preferences?

For the interested ones, here's the source file I used to duplicate
the problem:

# 2 "filename" 1
struct foo {
  int i;
  foo(const foo&);
  foo& operator=(const foo&);
  template <class T> static void f() {}
# 6 "filename"

If input_filename contains a GCable string and the address of the
second occurrence of "filename" is different from that of the first
occurrence, if you set a breakpoint in expand_body(), step into
ggc_collect, set G.allocated_last_gc=0 such that we don't skip GC at
this time, you'll see the in_use_p bit corresponding to feed->filename
will not be marked by the end of ggc_collect().

Unfortunately, I can't think of a reasonable way to turn this into a
testcase, especially into one that would fail hard should the problem
ever be re-introduced.  Suggestions?

Alexandre Oliva   Enjoy Guarana', see
Red Hat GCC Developer                  aoliva@{,}
CS PhD student at IC-Unicamp        oliva@{,}
Free Software Evangelist    *Please* write to mailing lists, not to me

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]