Bug 59195 - C++ demangler handles conversion operator incorrectly
Summary: C++ demangler handles conversion operator incorrectly
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: other (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: ---
Assignee: Cary Coutant
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-19 18:53 UTC by Cary Coutant
Modified: 2015-11-27 14:48 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-11-19 00:00:00


Attachments
Proposed patch (3.18 KB, patch)
2013-11-19 18:56 UTC, Cary Coutant
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Cary Coutant 2013-11-19 18:53:42 UTC
Consider this simple source:

  struct C {
    C(int i_) : i(i_) { }
    int i;
  };

  struct A {
    A() : i(0) { }
    template <typename U>
    explicit operator U();
    int i;
  };

  void foo()
  {
    A ai;
    C c = (C)ai;
  }

The conversion operator in A mangles to "_ZN1AcvT_I1CEEv", which
demangles to:

  A::operator C<C>()

(I think this would be better as "A::operator C()", but let's
leave that aside for now.)

That looks OK, but not if you dig deeper. The demangler is
parsing this name as follows:

  typed name
    qualified name
      name 'A'
      cast
	template
	  template parameter 0
	  template argument list
	    name 'C'
    function type
      argument list
  A::operator C<C>()

It's parsing the "T_" as a <template-template-param>, not a
<template-param>. It should really be parsing it as:

  typed name
    template
      qualified name
	name 'A'
	cast
	  template parameter 0
      template argument list
	name 'C'
    function type
      argument list
  A::operator C<C>()

The template argument list actually belongs with the
<qualified-name>, not with the template parameter.

The basic problem is an ambiguity in the grammar for mangled
names. Consider the following two derivations:

  (1) <nested-name>
      -> <template-prefix> <template-args>
      -> <prefix> <template-unqualified-name> <template-args>
      -> <unqualified-name> <template-unqualified-name>
                                                   <template-args>
      -> <source-name> <template-unqualified-name> <template-args>
      -> <source-name> <operator-name> <template-args>
      -> <source-name> cv <type> <template-args>
      -> <source-name> cv <template-template-param>
                                   <template-args> <template-args>

  (2) <nested-name>
      -> <template-prefix> <template-args>
      -> <prefix> <template-unqualified-name> <template-args>
      -> <unqualified-name> <template-unqualified-name>
                                                   <template-args>
      -> <source-name> <template-unqualified-name> <template-args>
      -> <source-name> <operator-name> <template-args>
      -> <source-name> cv <type> <template-args>
      -> <source-name> cv <template-param> <template-args>

When you get to the "T_", there's no way to know if it's a
<template-param> or a <template-template-param>. The parser in
cp-demangle.c, in cplus_demangle_type, looks ahead to
disambiguate the two. If it sees an 'I', it assumes that it's a
<template-template-param>, and it greedily consumes the
<template-args>.

In this particular case, it's wrong, but it gets the right answer
because of a hack in d_print_cast, which takes the template
arguments under the cast operator, and places them in scope
before resolving the "T_" reference.

It doesn't take much to break that hack, though. Let's throw in a
pointer:

  struct C {
    C(int i_) : i(i_) { }
    int i;
  };

  struct A {
    A() : i(0) { }
    template <typename U>
    explicit operator U*();
    int i;
  };

  void foo()
  {
    A ai;
    C* c = (C*)ai;
  }

Now we get the mangled name "_ZN1AcvPT_I1CEEv", which the
demangler fails on:

  typed name
    qualified name
      name 'A'
      cast
	pointer
	  template
	    template parameter 0
	    template argument list
	      name 'C'
    function type
      argument list
  Failed: _ZN1AcvPT_I1CEEv

This name ought to be parsed as follows:

  typed name
    template
      qualified name
	name 'A'
	cast
	  pointer
	    template parameter 0
      template argument list
	name 'C'
    function type
      argument list
  A::operator C*<C>()

(This incorrect parsing can also lead to some other subtle types
of failures, because substitutions can be numbered in the wrong
order. See the long example that started me on this investigation
at the end.)

Now let's add a real template template parameter:

  template <typename T>
  struct C {
    C(T i_) : i(i_) { }
    T i;
  };

  struct A {
    A() : i(0) { }
    template <template<typename U> class V>
    operator V<int>();
    int i;
  };

  void foo()
  {
    A ai;
    C<int> c = (C<int>)ai;
  }

The conversion operator is now "_ZN1AcvT_IiEI1CEEv", and the
demangler gives us this:

  typed name
    template
      qualified name
	name 'A'
	cast
	  template
	    template parameter 0
	    template argument list
	      builtin type int
      template argument list
	name 'C'
    function type
      argument list
  A::operator int<int><C>()

The derivation is actually correct this time, because we really
do have a <template-template-param>, but the hack in d_print_cast
causes it to substitute the wrong type for "T_".

I'm pretty sure that the ambiguity can be resolved by context.
Note that if we have a <template-template-param>, the
<template-args> that go with it are followed by another set of
<template-args> that go with the <template-prefix> in the second
step of the derivation. (This is only true if we're parsing a
conversion operator that derives from <nested-name>; if it's a
conversion operator that's part of an <expression>, the presence
of an 'I' should always indicate that we're looking at a
<template-template-param>.)

I've got a patch that solves this with a little backtracking. If
we're in a conversion operator that's not part of an
<expression>, and we see an 'I', it checkpoints the d_info
structure, does a trial parse of <template-args>, then looks for
another 'I'. If there isn't one, it backtracks and returns a
<template-param>, leaving the <template-args> for d_prefix to
parse.

When printing a conversion operator, in d_print_cast, if the
operator is derived from a <template>, we need to place that
template in scope so that the <template-param> can look up the
correct type. We can't just add it to the stack of templates,
though, since that's only applicable when printing the type of a
function whose name is a template.

Here's the long example that started me down this path:

  _ZNK7strings8internal8SplitterINS_9delimiter5AnyOfENS_9SkipEmptyEEcvT_ISt6vectorI12basic_stringIcSt11char_traitsIcESaIcEESaISD_EEvEEv

The demangler fails to parse this string, because the "SD_" near
the end is asking for the 15th substitution, but at that point it
has only registered 14 substitutions. That happened because it
greedily consumed the <template-args> that follow "cvT_", and
hadn't yet registered the substitution for the conversion
operator (which derives from <prefix>). Any substitutions that
are contained in the <template-args> will be off by one as a
result.
Comment 1 Cary Coutant 2013-11-19 18:56:01 UTC
Created attachment 31253 [details]
Proposed patch
Comment 2 Cary Coutant 2013-11-22 22:25:52 UTC
Author: ccoutant
Date: Fri Nov 22 22:25:49 2013
New Revision: 205292

URL: http://gcc.gnu.org/viewcvs?rev=205292&root=gcc&view=rev
Log:
Fix demangler to handle conversion operators correctly.

libiberty/
	PR other/59195
	* cp-demangle.c (struct d_info_checkpoint): New struct.
	(struct d_print_info): Add current_template field.
	(d_operator_name): Set flag when processing a conversion
	operator.
	(cplus_demangle_type): When processing <template-args> for
	a conversion operator, backtrack if necessary.
	(d_expression_1): Renamed from d_expression.
	(d_expression): New wrapper around d_expression_1.
	(d_checkpoint): New function.
	(d_backtrack): New function.
	(d_print_init): Initialize current_template.
	(d_print_comp): Set current_template.
	(d_print_cast): Put current_template in scope for
	printing conversion operator name.
	(cplus_demangle_init_info): Initialize is_expression and
	is_conversion.
	* cp-demangle.h (struct d_info): Add is_expression and
	is_conversion fields.
	* testsuite/demangle-expected: New test cases.

Modified:
    trunk/libiberty/ChangeLog
    trunk/libiberty/cp-demangle.c
    trunk/libiberty/cp-demangle.h
    trunk/libiberty/testsuite/demangle-expected
Comment 3 Cary Coutant 2014-02-25 19:58:31 UTC
Fixed at r205292.
Comment 4 Pedro Alves 2014-05-26 14:04:21 UTC
This caused bug 61321.
Comment 5 Pedro Alves 2014-05-26 14:05:04 UTC
Likely bug 61233 too.
Comment 6 Markus Trippelsdorf 2015-11-27 14:48:54 UTC
Author: trippels
Date: Fri Nov 27 14:48:21 2015
New Revision: 231020

URL: https://gcc.gnu.org/viewcvs?rev=231020&root=gcc&view=rev
Log:
PR other/61321 - demangler crash on casts in template parameters

The fix for bug 59195:

 [C++ demangler handles conversion operator incorrectly]
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59195

unfortunately makes the demangler crash due to infinite recursion, in
case of casts in template parameters.

For example, with:

 template<int> struct A {};
 template <typename Y> void function_temp(A<sizeof ((Y)(999))>) {}
 template void function_temp<int>(A<sizeof (int)>);

The 'function_temp<int>' instantiation above mangles to:

  _Z13function_tempIiEv1AIXszcvT_Li999EEE

The demangler parses this as:

typed name
  template
    name 'function_temp'
    template argument list
      builtin type int
  function type
    builtin type void
    argument list
      template                          (*)
        name 'A'
        template argument list
          unary operator
            operator sizeof
            unary operator
              cast
                template parameter 0    (**)
              literal
                builtin type int
                name '999'

And after the fix for 59195, due to:

 static void
 d_print_cast (struct d_print_info *dpi, int options,
	       const struct demangle_component *dc)
 {
 ...
   /* For a cast operator, we need the template parameters from
      the enclosing template in scope for processing the type.  */
   if (dpi->current_template != NULL)
     {
       dpt.next = dpi->templates;
       dpi->templates = &dpt;
       dpt.template_decl = dpi->current_template;
     }

when printing the template argument list of A (what should be "<sizeof
(int)>"), the template parameter 0 (that is, "T_", the '**' above) now
refers to the first parameter of the the template argument list of the
'A' template (the '*' above), exactly what we were already trying to
print.  This leads to infinite recursion, and stack exaustion.  The
template parameter 0 should actually refer to the first parameter of
the 'function_temp' template.

Where it reads "for the cast operator" in the comment in d_print_cast
(above), it's really talking about a conversion operator, like:

  struct A { template <typename U> explicit operator U(); };

We don't want to inject the template parameters from the enclosing
template in scope when processing a cast _expression_, only when
handling a conversion operator.

The problem is that DEMANGLE_COMPONENT_CAST is currently ambiguous,
and means _both_ 'conversion operator' and 'cast expression'.

Fix this by adding a new DEMANGLE_COMPONENT_CONVERSION component type,
which does what DEMANGLE_COMPONENT_CAST does today, and making
DEMANGLE_COMPONENT_CAST just simply print its component subtree.

I think we could instead reuse DEMANGLE_COMPONENT_CAST and in
d_print_comp_inner still do:

 @@ -5001,9 +5013,9 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
        d_print_comp (dpi, options, dc->u.s_extended_operator.name);
        return;

     case DEMANGLE_COMPONENT_CAST:
       d_append_string (dpi, "operator ");
 -     d_print_cast (dpi, options, dc);
 +     d_print_conversion (dpi, options, dc);
       return;

leaving the unary cast case below calling d_print_cast, but seems to
me that spliting the component types makes it easier to reason about
the code.

g++'s testsuite actually generates three symbols that crash the
demangler in the same way.  I've added those as tests in the demangler
testsuite as well.

And then this fixes PR other/61233 too, which happens to be a
demangler crash originally reported to GDB, at:
https://sourceware.org/bugzilla/show_bug.cgi?id=16957

Bootstrapped and regtested on x86_64 Fedora 20.

Also ran this through GDB's testsuite.  GDB will require a small
update to use DEMANGLE_COMPONENT_CONVERSION in one place it's using
DEMANGLE_COMPONENT_CAST in its sources.

libiberty/
2015-11-27  Pedro Alves  <palves@redhat.com>

        PR other/61321
        PR other/61233
        * demangle.h (enum demangle_component_type)
        <DEMANGLE_COMPONENT_CONVERSION>: New value.
        * cp-demangle.c (d_demangle_callback, d_make_comp): Handle
        DEMANGLE_COMPONENT_CONVERSION.
        (is_ctor_dtor_or_conversion): Handle DEMANGLE_COMPONENT_CONVERSION
        instead of DEMANGLE_COMPONENT_CAST.
        (d_operator_name): Return a DEMANGLE_COMPONENT_CONVERSION
        component if handling a conversion.
        (d_count_templates_scopes, d_print_comp_inner): Handle
        DEMANGLE_COMPONENT_CONVERSION.
        (d_print_comp_inner): Handle DEMANGLE_COMPONENT_CONVERSION instead
        of DEMANGLE_COMPONENT_CAST.
        (d_print_cast): Rename as ...
        (d_print_conversion): ... this.  Adjust comments.
        (d_print_cast): Rewrite - simply print the left subcomponent.
        * cp-demint.c (cplus_demangle_fill_component): Handle
        DEMANGLE_COMPONENT_CONVERSION.

        * testsuite/demangle-expected: Add tests.

Modified:
    trunk/include/demangle.h
    trunk/libiberty/ChangeLog
    trunk/libiberty/cp-demangle.c
    trunk/libiberty/cp-demint.c
    trunk/libiberty/testsuite/demangle-expected