[PATCH] c++: error message for dependent template members [PR70417]

Fri Aug 27 21:33:11 GMT 2021

On 8/20/21 12:56 PM, Anthony Sharp via Gcc-patches wrote:
> Hi, hope everyone is well. I have a patch here for issue 70417
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70417). I'm still a GCC
> noob, and this is probably the hardest thing I have ever coded in my
> life, so please forgive any initial mistakes!
> 
> TLDR
> 
> This patch introduces a helpful error message when the user attempts
> to put a template-id after a member access token (., -> or ::) in a
> dependent context without having put the "template" keyword after the
> access token.

Great, thanks!

> CONTEXT
> 
> In C++, when referencing a class member (using ., -> or ::) in a
> dependent context, if that member is a template, the template keyword
> is required after the access token. For example:
> 
> template<class T> void MyDependentTemplate(T t)
> {
>    t.DoSomething<T>(); /* Error - t is dependent. "<" is treated as a
> less-than sign because DoSomething is assumed to be a non-template
> identifier. */
>    t.template DoSomething<T>(); // Good.
> }
> 
> PATCH EXPLANATION
> 
> In order to throw an appropriate error message we need to identify
> and/or create a point in the compiler where the following conditions
> are all simultaneously satisfied:
> 
> A) a class member access token (., ->, ::)
> B) a dependent context
> C) the thing being accessed is a template-id
> D) No template keyword
> 
> A, B and D are relatively straightforward and I won't go into detail
> about how that was achieved. The real challenge is C - figuring out
> whether a name is a template-id.
> 
> Usually, we would look up the name and use that to determine whether
> the name is a template or not. But, we cannot do that. We are in a
> dependent context, so lookup cannot (in theory) find anything at the
> point the access expression is parsed.
> 
> We (maybe) could defer checking until the template is actually
> instantiated. But this is not the approach I have gone for, since this
> to me sounded unnecessarily awkward and clunky. In my mind this also
> seems like a syntax error, and IMO it seems more natural that syntax
> errors should get caught as soon as they are parsed.
> 
> So, the solution I went for was to figure out whether the name was a
> template by looking at the surrounding tokens. The key to this lies in
> being able to differentiate between the start and end of a template
> parameter list (< and >) and greater-than and less-than tokens (and
> perhaps also things like << or >>). If the final > (if we find one at
> all) does indeed mark the end of a class or function template, then it
> will be followed by something like (, ::, or even just ;. On the other
> hand, a greater-than operator would be followed by a name.
> 
> PERFORMANCE IMPACT
> 
> My concern with this approach was that checking this would actually
> slow the compiler down. In the end it seemed the opposite was true. By
> parsing the tokens to determine whether the name looks like a
> template-id, we can cut out a lot of the template-checking code that
> gets run trying to find a template when there is none, making compile
> time generally faster. That being said, I'm not sure if the
> improvement will be that substantial, so I did not feel it necessary
> to introduce the template checking method everywhere where we could
> possibly have run into a template.
> 
> I ran an ABAB test with the following code repeated enough times to
> fill up about 10,000 lines:
> 
> ai = 1;
> bi = 2;
> ci = 3;
> c.x = 4;
> (&c)->x = 5;
> mytemplateclass<Y>::y = 6;
> c.template class_func<Y>();
> (&c)->template class_func<Y>();
> mytemplateclass<Y>::template class_func_static<Y>();
> c2.class_func<myclass>();
> (&c2)->class_func<myclass>();
> myclass::class_func_static<myclass>();
> ai = 6;
> bi = 5;
> ci = 4;
> c.x = 3;
> (&c)->x = 2;
> mytemplateclass<Y>::y = 1;
> c.template class_func<Y>();
> (&c)->template class_func<Y>();
> mytemplateclass<Y>::template class_func_static<Y>();
> c2.class_func<myclass>();
> (&c2)->class_func<myclass>();
> myclass::class_func_static<myclass>();
> 
> It basically just contains a mixture of class access expressions plus
> some regular assignments for good measure. The compiler was compiled
> without any optimisations (and so slower than usual). In hindsight,
> perhaps this test case assumes the worst-ish-case scenario since it
> contains a lot of templates; most of the benefits of this patch occur
> when parsing non-templates.
> 
> The compiler (clean) and the compiler with the patch applied (patched)
> compiled the above code 20 times each in ABAB fashion. On average, the
> clean compiler took 2.26295 seconds and the patched compiler took
> 2.25145 seconds (about 0.508% faster). Whether this improvement
> remains or disappears when the compiler is built with optimisations
> turned on I haven't tested. My main concern was just making sure it
> didn't get any slower.
> 
> AWKWARD TEST CASES
> 
> You will see from the patch that it required the updating of a few
> testcases (as well as one place within the compiler). I believe this
> is because up until now, the compiler allowed the omission of the
> template keyword in some places it shouldn't have. Take decltype34.C
> for example:
> 
> struct impl
> {
>    template <class T> static T create();
> };
> 
> template<class T, class U, class =
> decltype(impl::create<T>()->impl::create<U>())>
> struct tester{};
> 
> GCC currently accepts this code. My patch causes it to be rejected.

> For what it's worth, Clang also rejects this code:
> 
> 1824786093/source.cpp:6:70: error: use 'template' keyword to treat
> 'create' as a dependent template name
> template<class T, class U, class =
> decltype(impl::create<T>()->impl::create<U>())>
> 
>                                        ^ template
> 
> I think Clang is correct since from what I understand it should be
> written as impl::create<T>()->impl::template create<U>(). Here,
> create<T>() is dependent, so it makes sense that we would need
> "template" before the scope operator. Then again, I could well be
> wrong. The rules around dependent template names are pretty crazy.

This is basically core issue 1835, http://wg21.link/cwg1835

This was changed for C++23 by the paper "Declarations and where to find 
them", http://wg21.link/p1787

Previously, e.g. in C++20, 
https://timsong-cpp.github.io/cppwp/n4861/basic.lookup.classref#4 said 
that when we see ->impl, since we can't look up the name in the 
(dependent) object type, we do unqualified lookup and find ::impl, so 
create<U> is not in a dependent scope, and the testcase is fine.

The 1835 change clarifies that we only do unqualified lookup of the 
second impl if the object type is non-dependent, so the second create is 
in a dependent scope, and we need the ::template.

But in either case, whether create<U> is in a dependent scope depends on 
how we resolve impl::, we don't need to remember further back in the 
expression.  So your dependent_expression_p parameter seems like the 
wrong approach.  Note that when we're looking up the name after ->, the 
type of the object expression is in parser->context->object_type.

1835 also makes the following testcase valid, which we don't currently 
accept, but clang does:

template <class T> void f(T t) { t.foo::bar(); }
struct foo { void bar(); };
int main() { f(foo()); }

For C++20 and down, I want to start accepting this testcase, but also 
still accept decltype34.C to avoid breaking existing code.  But that's a 
separate issue; I don't think your patch should affect decltype34.

The cases you fixed in symbol-summary.h are indeed mistakes, but not 
ill-formed, so giving an error on them is wrong.  For example, here is a 
well-formed program that is rejected with your patch:

template <class T, int N> void f(T t) { t.m<N>(0); }
struct A { int m; } a;
int main() { f<A,42>(a); }

Comparing the result of < by > is very unlikely to be what the user 
actually meant, but it is potentially valid.  So we should accept f with 
a warning about the dubious expression.  People can silence the warning 
if they really mean this by parenthesizing the LHS of the < operator.

Another valid program, using ::

int x;
template <class T, int N> void f(T t) { t.m<N>::x; }
struct A { int m; } a;
int main() { f<A,42>(a); }

Now to reviewing the actual patch:

> +   yet).  Return 1 if it does look like a template-id.  Return 2
> +   if not.  */

Let's use -1 for definitely not.

> +  /* Could be a ~ referencing the destructor of a class template.
> +     Temporarily eat it up so it's not in our way.  */
> +  if (next_token->type == CPP_COMPL)
> +    {
> +      cp_lexer_save_tokens (parser->lexer);
> +      cp_lexer_consume_token (parser->lexer);
> +      next_token = cp_lexer_peek_token (parser->lexer);
> +      saved = true;
> +    }

There's no ambiguity in the case of a destructor, as you note in your 
comments in cp_parser_id_expression.  How about returning early if we see ~?

> +  /* Find offset of final ">".  */
> +  for (; depth > 0; i++)
> +    {
> +      switch (cp_lexer_peek_nth_token (parser->lexer, i)->type)
> +	 {
> +	   case CPP_LESS:
> +	     depth++;
> +	     break;
> +	   case CPP_GREATER:
> +	     depth--;
> +	     break;
> +	   case CPP_RSHIFT:
> +	     depth-=2;
> +	     break;
> +	   case CPP_EOF:
> +	   case CPP_SEMICOLON:
> +	     goto no;
> +	   default:
> +	     break;
> +	 }
> +    }

This code doesn't handle skipping matched ()/{}/[] in the 
template-argument-list.  You probably want to involve 
cp_parser_skip_to_end_of_template_parameter_list somehow.

> +no:
> +  if (saved)
> +    cp_lexer_rollback_tokens (parser->lexer);
> +  return 2;
> +
> +yes:
> +  if (saved)
> +    cp_lexer_rollback_tokens (parser->lexer);
> +  return 1;

Now that we're writing C++, I'd prefer to avoid this kind of pattern in 
favor of RAII, such as saved_token_sentinel.  If this is still relevant 
after addressing the above comments.

> +   If DEPENDENT_EXPRESSION_P is true, this is a dependent id-expression
> +   that is not the current instantiation.  */

As mentioned above, let's not add this parameter.

> +      error_at (next_token->location,
> +		"expected \"template\" keyword before dependent template "
> +		"name");

As mentioned above, this should be a warning.

> -	/* If it worked, we're done.  */
> -	if (cp_parser_parse_definitely (parser))
> -	  return id;
> +  if (looks_like_template != 2)
> +    {
> +      /* We don't know yet whether or not this will be a
> +	 template-id.  */

The indentation here is off.

> +			  int looks_like_template)

Let's give these parms a default argument of 0.  And call them 
looks_like_template_id to be clearer.

> -  /* Avoid performing name lookup if there is no possibility of
> -     finding a template-id.  */
> -  if ((token->type != CPP_NAME && token->keyword != RID_OPERATOR)
> -      || (token->type == CPP_NAME
> -	  && !cp_parser_nth_token_starts_template_argument_list_p
> -	       (parser, 2)))
> +  /* Only if we have not already checked whether this looks like a
> +     template.  */
> +  if (looks_like_template == 0)
>      {
> -      cp_parser_error (parser, "expected template-id");
> -      return error_mark_node;
> +      /* Avoid performing name lookup if there is no possibility of
> +	 finding a template-id.  */

You could add the looks_like_template_id check to the existing condition 
so we don't need to reindent the comment or body.

> +++ b/gcc/symbol-summary.h
> @@ -191,7 +191,7 @@ public:
>    template<typename Arg, bool (*f)(const T &, Arg)>
>    void traverse (Arg a) const
>    {
> -    m_map.traverse <f> (a);
> +    m_map.template traverse <f> (a);

I've gone ahead and applied this fix as a separate patch.

Jason