This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH 1/3] add -Wstruct-not-pod, -Wclass-is-pod, -Wmismatched-tags (PR 61339)
On Mon, Aug 5, 2019 at 5:50 PM Martin Sebor <email@example.com> wrote:
> On 8/5/19 1:25 PM, Jason Merrill wrote:
> > On 8/1/19 7:35 PM, Martin Sebor wrote:
> >> On 8/1/19 12:09 PM, Jason Merrill wrote:
> >>> On 7/22/19 12:34 PM, Martin Sebor wrote:
> >>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00622.html
> >>>> On 7/8/19 3:58 PM, Martin Sebor wrote:
> >>>>> The attached patch implements three new warnings:
> >>>>> * -Wstruct-not-pod triggers for struct definitions that are not
> >>>>> POD structs,
> >>>>> * -Wclass-is-pod triggers for class definitions that satisfy
> >>>>> the requirements on POD structs, and
> >>>>> * -Wmismatched-tags that triggers for class and struct declarations
> >>>>> with class-key that doesn't match either their definition or
> >>>>> the first declaration (if no definition is provided).
> >>>>> The implementation of -Wclass-is-pod and -Wstruct-not-pod is fairly
> >>>>> straightforward but the -Wmismatched-tags solution is slightly
> >>>>> unusual.
> >>>>> It collects struct and class declarations first and diagnoses
> >>>>> mismatches
> >>>>> only after the whole tramslation unit has been processed. This is so
> >>>>> that the definition of a class can guide which declarations to
> >>>>> diagnose
> >>>>> no matter which come first.
> >>> As Jonathan and I were saying in the connected discussion, the *pod
> >>> warnings enforce questionable coding guidelines, so I'd rather not
> >>> have them in the compiler.
> >> What specifically do you consider questionable?
> >> When defining a new class one has to decide whether to use 'struct'
> >> and when to use 'class'. Since there's no difference, adopting any
> >> convention to help make a consistent choice seems like a natural
> >> thing to do. You could say (as I think someone in this thread did):
> >> "make it 'class' when it has a ctor or dtor." Someone else might
> >> add: "or assignment." Someone else still might go even further
> >> and include "private members or bases or virtual functions." Since
> >> all the type definitions end up equivalent regardless of the class-key,
> >> all these choices seem equally reasonable. But all of them are also
> >> arbitrary. The only difference is that by being based on a common
> >> (and at least historically, widely relied on) property, the POD
> >> convention is less arbitrary. That's probably why it's apparently
> >> quite popular (as evident from the Stack Exchange thread).
> > Yes, all of them are somewhat arbitrary. This one seems like a poor
> > choice, for the reasons Jon and I gave in the other thread. IMO
> > 'aggregate' would be a better property to use.
> > Which Stack Exchange thread do you mean?
> The one I mentioned here:
> i.e. the most popular answer to the question
> When should you use a class vs a struct in C++?
> I would recommend using structs as plain-old-data structures
> without any class-like features, and using classes as aggregate
> data structures with private data and member functions.
> My goal with the warning is to provide a way to help enforce
> the convention we know is in use, both in GCC (at least until
> it's removed) and in other projects. I'm not interested in
> inventing something new even if it's "better" (in some sense)
> than the convention we know is in use.
We don't know that this is in use as a strict division, anywhere. All
the answers at that URL are using the term pretty loosely. The top
answer says, "I would recommend using structs as plain-old-data
structures without any class-like features, and using classes as
aggregate data structures with private data and member functions."
A struct containing a std::string is not using any class-like
features, but is not a POD. I oppose a coding convention that
requires the user to change that to a class with public: at the top.
And so I oppose introducing warnings to enforce such a convention.
> That said, as the C++ standard itself says, PODs (or in recent
> revisions, trivial standard-layout types) are most commonly
> associated with interoperability with C (and other languages).
> Aggregates can have data members of types that don't exist in
> C (references, pointers to members). So a convention that uses
> the keyword 'struct' to reflect the common interoperability
> aspect or that wants to convey that a C++ struct looks and feels
> like one would expect of a C struct, will want to be based on
> the concept of POD, not that of an aggregate.
Note that the mention of pointers to members was removed from the POD
definition in C++03. And then the term itself was deemed not a useful
To me, the important C-like look and feel of a 'struct' is that it is
intended for users to access the data directly, rather than through
member functions. A struct with a string qualifies. It is possible
for a class to be intended for use that way but still not qualify as
an aggregate, e.g. because it has a constructor only to allow
parenthesized initialization (as I've seen in GCC), but aggregate is a
better approximation than POD for this distinction.
> >>> -Wmismatched-tags is useful to have, given the MSVC/clang situation,
> >>> but I wonder about memory consumption from all the record keeping.
> >>> Do you have any overhead measurements?
> >> I did think about the overhead but not knowing if the patch would
> >> even be considered I didn't spend time optimizing it.
> >> In an instrumented build of GCC with the patch I just did I have
> >> collected the following stats for the number of elements in
> >> the rec2loc hash table (for 998 files):
> >> rec2loc elements locvec elements
> >> min: 0 0
> >> max: 2,785 3,303
> >> mean: 537 1,043
> >> median: 526 1,080
> >> The locvec data are based on totals for the whole hash table, so
> >> in the worst case the hash_map has 2,785 elements, and the sum of
> >> all locvec lengths in all those elements is 3,303. Each rec2loc
> >> element takes up 16 bytes, plus the size of the locvec data.
> >> If my math is right (which doesn't happen too often) in the worst
> >> case the bookkeeping overhead is 43KB for the hash_map plus on
> >> the order of 140KB for the vectors (just doubling the number of
> >> elements to account for capacity = 2 X size, times 24 for
> >> the flag_func_loc_t element type). So 183K in the worst case
> >> in GCC.
> > 183k cumulative over all the GCC sources doesn't sound excessive, but...
> >> There are a few ways to reduce that if it seems excessive.
> >> One is by avoiding some waste in flag_func_loc_t which is
> >> pair<tag_types, pair<bool, pair<tree, location_t>>>
> >> tree could come first and tag_types and the bool could share
> >> space. That should bring it down to 16 in LP64, for about
> >> 30% off the 183K.
> >> Beyond that, we could change the algorithm to discard records
> >> for prior declarations after the first definition has been seen
> >> (and any mismatches diagnosed).
> >> We could also only collect one record for each definition
> >> in system headers rather than one for every declaration and
> >> reference.
> > ...these all sound worthwhile.
> Okay, I'll look into it.
> To be clear: it was 183K maximum for a single compilation, not
> aggregate for all of them.