This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [libcc1] add support for C++

From: Alexandre Oliva <aoliva at redhat dot com>
To: Jason Merrill <jason at redhat dot com>
Cc: gcc-patches List <gcc-patches at gcc dot gnu dot org>
Date: Mon, 23 Jan 2017 22:21:55 -0200
Subject: Re: [libcc1] add support for C++
Authentication-results: sourceware.org; auth=none
References: <orbmzem8mt.fsf@livre.home> <5e2deb89-1870-9615-266e-b7dcb1f43185@redhat.com> <oro9yy4rvm.fsf@lxoliva.fsfla.org> <CADzB+2mqJn_ZzMjQ6QqBgSBhae3Jf-8gnYd4r8jDKssAm-F4VQ@mail.gmail.com>
On Jan 23, 2017, Jason Merrill <jason@redhat.com> wrote:

> On Sun, Jan 22, 2017 at 6:06 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Jan 13, 2017, Jason Merrill <jason@redhat.com> wrote:
>> 
>>> On 09/23/2016 08:41 PM, Alexandre Oliva wrote:
>>>> +static tree global_friend_list;
>> 
>>> This should be a hash_set rather than a TREE_LIST.
>> 
>> You sure?  At least in the libcc1 use scenario, this is going to have a
>> single entry.  I didn't even have to make it a list, but I made it one
>> because, well, I could :-)  A hash_set seems excessive, unless you
>> envision other uses for it.

> Well, in general TREE_LISTs are discouraged nowadays in favor of
> hash_sets or VECs.  If it's a single entry, putting it there alone is
> fine too.

Ok, I made it a single pointer, assigned with set_global_friend(), and
dropped remove_from_global_friends_list.

>>>> +  // FIXME: we need a more space-efficient representation for
>>>> +  // oracle_looked_up.
>> 
>>> A hash_set would work for that, too.
>> 
>> I was hoping for an unused bit in IDENTIFIERs, but I guess that will do.
>> I'm a bit concerned about the performance impact of this in all name
>> lookups, though.  Just computing the hash for every lookup seems
>> excessive, compared with testing a bit.  Unless...  I guess making it a
>> hash_set*, NULL unless there is an Oracle, would alleviate this concern,
>> making it cost only when it's in use.

> Sounds good.

As it turns out, this is already guarded by the oracle pointer, so I
made it a static local hash_set.

>>>> +/* Exported wrapper for cp_literal_operator_id.  */
>>>> +
>>>> +tree
>>>> +ansi_litopname (const char *name)
>>>> +{
>>>> +  return cp_literal_operator_id (name);
>>>> +}
>> 
>>> Why not export cp_literal_operator_id itself?
>> 
>> My sense of aesthetics demanded a uniform set of exported names along
>> with ansi_opname and ansi_assopname ;-)

> Heh.  But I'd prefer to unify them in the other direction; use of
> "ansi" is obsolete.

Ok, I exported cp_literal_operator_id, and renamed macros ansi_opname to
cp_operator_id and ansi_assopname to cp_assignment_operator_id.

>>>> +   If the newly-created namespace is to be an inline namespace, after
>>>> +   push_namespace, get the nested namespace decl with
>>>> +   get_current_binding_level, pop back to the enclosing namespace,
>>>> +   call using_namespace with INLINE_P, and then push to the inline
>>>> +   namespace again.  */
>> 
>>> This seems like unnecessary contortions to expect of the caller
>>> (i.e. GDB).  Let's add a new front end function to handle this.
>> 
>> Heh, it's really nothing compared with some other contortions required
>> to navigate binding levels, e.g. to reenter a function scope.
>> 
>> I didn't want to add redundant functionality to the libcc1 API, and we'd
>> need the ability to separate the introduction or reentry in a namespace,
>> from a using directive with attribute strong.  And since inline
>> namespaces use the same implementation as that, I figured it made sense
>> to use the same interface for both.

>> Besides, push_namespace is called a lot more often than using_namespace,
>> so it makes sense to keep it simpler, and leave the extra parameter for
>> the less common operation.
>> 
>> Another argument to keep things as they are is that the inline-ness of a
>> namespace is not a property related with entering the namespace, but
>> rather with how names from it are brought into the other namespace.

> Well, it's a property of the namespace.

Even in the case of 'strong' using directives?  I didn't think so.  That
only requires the using namespace to be an ancestor of the used one, not
necessarily a direct ancestor, so the same namespace could be strongly
used by multiple ancestors, and non-strongly used by other namespaces,
even other ancestors.

> But I guess I won't insist on this.

I suppose introducing a new entry point to make the current namespace
inline would be an option, for convenience, even if we were to keep
strong directives handled separately in plugin_using_namespace.  Would
you prefer that?

>>>> +/* Pop the namespace last entered with push_namespace, or class last
>>>> +   entered with push_class, or function last entered with
>>>> +   push_function, restoring the binding level in effect before the
>>>> +   matching push_* call.  */
>>>> +
>>>> +GCC_METHOD0 (int /* bool */, pop_namespace)
>> 
>>> This should have a different name, perhaps pop_last_entered_scope?
>> 
>> It was introduced very early on, long before the need for exposing
>> function scopes was realized and implemented.  GDB (branch) already had
>> plenty of code using this primitive, so I didn't want to rename it, but
>> if you insist in such spelling changes, I won't mind, and I guess GDB
>> folks won't even.  It's not too late yet ;-)

> Please.

How about just pop_binding_level, to align it with the nomenclature used
in the comments?  (I'd have suggested just pop_scope otherwise)


>>>> +/* Return the NAMESPACE_DECL, TYPE_DECL or FUNCTION_DECL of the
>>>> +   binding level that would be popped by pop_namespace.  */
>>>> +
>>>> +GCC_METHOD0 (gcc_decl, get_current_binding_level)
>> 
>>> Perhaps get_current_binding_level_decl?
>> 
>> Ditto, except I'm not sure GDB already uses this one, so it's probably a
>> much easier sell.

> Please.

Ok.  FYI, I talked to GDB folks, and they didn't oppose spelling changes
at all, so we can go wild cleaning it all up ;-)


>>>> +   The TARGET decl names the qualifying scope (foo:: above) and the
>>>> +   identifier (bar), but that does not mean that only TARGET will be
>>>> +   brought into the current scope: all bindings of TARGET's identifier
>>>> +   in the qualifying scope will be brought in.
>> 
>>> This seems wrong; for namespace-scope using-declarations, only the
>>> declarations in scope at the point of the using-declaration are
>>> imported.  Since DWARF represents each imported declaration
>>> individually, I would think that we would want the plugin to import
>>> them one at a time.
>> 
>> What you say is true, but GCC doesn't offer functionality for that
>> AFAICT, and the caller can always arrange for only the bindings that
>> should be brought in by the using declarations to be defined at the time
>> the using declaration is introduced, so I left it at that so as to
>> minimize the changes to GCC proper.

> Fair enough, but then why pass a decl?  Passing scope and identifier
> would seem cleaner if that's what you mean.

Oh, hysterical raisins IIRC.  'scope' was not so easy to pass.  The API
makes gcc_decl and gcc_type distinct, unrelated types and, although
clients would get gcc_decls for functions and (when needed) namespaces,
they'd only get gcc_types for classes.  Later on, start_class_definition
was split so as to make forward declarations possible, and clients now
necessarily get a gcc_decl for a class.  (IIRC there weren't
get_current_binding_level or type_decl either at that point)

Now that the decl for a class is easily available, it makes some sense
ot change the API to take a decl to specify the scope, but then a string
might not be enough to name a member: operators are kind of a separate
namespace, and we'd have to add a flag or some other means to denote
them.  It can be done, but it feels like a lot of pain for the debatable
benefit of introducing additional possibilities of erroneous uses, such
as passing invalid contexts and whatnot (besides the minor inconvenience
of having to convert classes from decls to types, as DECL_CONTEXT
wants).

OTOH, passing a context explicitly, even if optional, and using a decl
to convey name (sidestepping the operator name encoding issues) and
context (if not explicitly stated) might enable the correct handling of
some cases in which the client might have to issue multiple calls to
accomplish a single using declaration that (indirectly) brings in names
from multiple scopes.  Now, I don't see how the client would get that
information.  Consider:

namespace A {
  int f(int);
}

namespace B {
  void f(void *);
}

namespace C {
  using namespace A;
  using namespace B;
}

namespace D {
  using C::f;
}

There isn't any declaration in namespace C that the libcc1 client could
use to convey the notion that we're bringing in all occurrences of C::f,
in a way that would enable the client to use the current API for this
purpose.  However, the information that the client gets from debug
information is that namespace D uses A::f and B::f, without any
reference to C, so in the end, taking A::f and B::f won't just get the
expected results: it offers the most convenient interface for the
client, considering that what they get from debug info is a reference to
the DIE of the imported declaration.


>>>> For operator functions, set GCC_CP_FLAG_SPECIAL_FUNCTION in
>>>> +   SYM_KIND, in addition to any other applicable flags, and pass as
>>>> +   NAME a string starting with the two-character mangling for operator
>>>> +   name
>> 
>>> Perhaps the compiler side should handle this transparently?
>> 
>> Sorry, I don't understand what you're suggesting.  Can you please
>> elaborate?

> I mean have the compiler recognize that the name is magic and infer
> that it's a special function.

I'd rather not have accidental namespace collisions: there are several
layers with different conventions at play, so I decided to mark such
things explicitly so as to be able to catch misuses.  Consider that we
have GCC's internal identifier representation, mangled identifier
representation, names exported in debug information, and GDB's internal
representation.  When we get a special name, we want to be sure there
was not a failure to convert between one's internal representation and
libcc1's conventions.  That's what motivated me to mark them explicitly.

Adopting the name mangling conventions, rather than some enumeration or
so, is justified by the need for some arbitrary convention, and reusing
an existing one (that is NOT present in debug information names) avoided
reinventing the wheel while achieving the goal of reducing the risk of
leaking unconverted names.

Exposing GCC's internal conventions, or the DWARF naming conventions,
might have worked as arbitrary conventions, but they might not have
served as well the goal of stopping accidental leakage of unconverted
internal representation names to API calls.


>> The clone is The clone is specified by NAME, using the following name
>> mangling conventions

> Without the duplicate "The clone is", I hope? :)

Heh, yeah.  How could this happen in the email?


>>>> +GCC_METHOD5 (gcc_type, new_template_typename_parm,
>> 
>>> s/typename/type/
>> 
>> The purpose behind this choice was to avoid the unwanted "template type"
>> grouping.  I figured "typename" would convey the correct concept, even
>> before the "new_template_*_parm" pattern could be formed.  Hmm...  I
>> wonder if I should have made it new_*_template_parm instead...

> Sure, that would be even better.

Ok, I had a bit of trouble adjusting new_template_template_parm to fit
the new pattern ;-) but it's done.


>>>> +GCC_METHOD2 (gcc_type, new_dependent_typespec,
>> 
>>> "typespec" usually means type-specifier.
>> 
>> Oops ;-)
>> 
>>> How about new_dependent_type_template_id?
>> 
>> Or new_dependent[_template]_specialization?  I think I'd rather use the
>> term 'specialization' because I've used it in other primitives and in
>> the documentation, whereas it would be the only use of 'id'.

> Well, a template-id is the syntactic term for combining a template
> name and some template arguments, which seems like what this function
> is doing.  A template-id names a specialization, but isn't quite the
> same.

Sold ;-) new_dependent_type_template_id it is (at least until I get to
removing the build/new inconsistencies).

>>>> +GCC_METHOD5 (gcc_expr, new_dependent_value_expr,
>> 
>>> I don't think you need "value" here.  Nor in the other *_value_expr names.
>> 
>> That was to distinguish it from type expressions.  I was concerned it
>> could be misused to construct

> I don't know what you mean by "type expressions".

Interesting.  I could have sworn on the GNU Manifesto :-) this was a
term from the C++ standard, to refer to arbitrary constructs that denote
types, including those involving decltype, but...  someone must have
taken it away when I wasn't looking ;-)

So I guess we can do away with it, hoping those interacting with the
libcc1 client won't make the same mistake ;-)

>>>> +/* Build a gcc_expr that denotes the conversion of an expression list
>>>> +   VALUES to TYPE, with ("tl") or without ("cv") braces, or a braced
>>>> +   initializer list of unspecified type (e.g., a component of another
>>>> +   braced initializer list; pass "il" for CONV_OP, and NULL for
>>>> +   TYPE).  */
>> 
>>>> +GCC_METHOD3 (gcc_expr, values_expr,
>> 
>>> Hmm, it seems like you're conflating two things here: the expressions,
>>> and the conversions.  I'd suggest functional_cast_expr, but handling a
>>> plain braced-init-list through the same interface is surprising.
>> 
>> The reason they ended up in the same spot is that I grouped the various
>> kinds of expressions according to the meta-types of the operands, so all
>> of the expressions that took a list of expressions (values), with or
>> without a type, ended up in the same API entry point.

> Hmm, OK.  Maybe expression_list_expr?

I like it.

>>>> +GCC_METHOD4 (gcc_expr, alloc_expr,
>> 
>>> new_expr?
>> 
>> new_ would be confusing, for it was used as a prefix for entry points
>> that introduce new classes, new fields, etc.
>> 
>>> This leads me to notice that some of the entry points start with
>>> "new_", some start with "build_", and some have no prefix.  Is this
>>> intentional?
>> 
>> The 'build_' ones were inherited from the libcc1 API for C.  The major
>> API departure from C's build_decl made me rename it to new_decl, and
>> then I followed the new_ pattern when introducing other names.  IOW, no
>> strong reasons, just ones that offer insights into the development
>> history of the API.  I guess we could change them all to build_, and use
>> new_ for alloc_expr, but I'm hesitant, because of the inconvenience this
>> might cause to the GDB folks that may have already got used to this
>> admittedly inconsistent convention.

> Better to fix an inconsistent API now than leave it to confuse future
> developers, I would think.

*nod*.  On it.

Thanks!

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Follow-Ups:
- Re: [libcc1] add support for C++
  - From: Jason Merrill
References:
- Re: [libcc1] add support for C++
  - From: Jason Merrill
- Re: [libcc1] add support for C++
  - From: Alexandre Oliva
- Re: [libcc1] add support for C++
  - From: Jason Merrill
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]