[PATCH] add -Wmismatched-tags (PR 61339)

Jason Merrill jason@redhat.com
Wed Dec 4 23:37:00 GMT 2019


On 12/3/19 4:49 PM, Martin Sebor wrote:
> On 8/5/19 4:30 PM, Jason Merrill wrote:
>> On Mon, Aug 5, 2019 at 5:50 PM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> On 8/5/19 1:25 PM, Jason Merrill wrote:
>>>> On 8/1/19 7:35 PM, Martin Sebor wrote:
>>>>> On 8/1/19 12:09 PM, Jason Merrill wrote:
>>>>>> On 7/22/19 12:34 PM, Martin Sebor wrote:
>>>>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00622.html
> ...
>>>>>> -Wmismatched-tags is useful to have, given the MSVC/clang situation,
>>>>>> but I wonder about memory consumption from all the record keeping.
>>>>>> Do you have any overhead measurements?
>>>>>
>>>>> I did think about the overhead but not knowing if the patch would
>>>>> even be considered I didn't spend time optimizing it.
>>>>>
>>>>> In an instrumented build of GCC with the patch I just did I have
>>>>> collected the following stats for the number of elements in
>>>>> the rec2loc hash table (for 998 files):
>>>>>
>>>>>     rec2loc elements   locvec elements
>>>>>     min:           0                 0
>>>>>     max:       2,785             3,303
>>>>>     mean:        537             1,043
>>>>>     median:      526             1,080
>>>>>
>>>>> The locvec data are based on totals for the whole hash table, so
>>>>> in the worst case the hash_map has 2,785 elements, and the sum of
>>>>> all locvec lengths in all those elements is 3,303.  Each rec2loc
>>>>> element takes up 16 bytes, plus the size of the locvec data.
>>>>>
>>>>> If my math is right (which doesn't happen too often) in the worst
>>>>> case the bookkeeping overhead is 43KB for the hash_map plus on
>>>>> the order of 140KB for the vectors (just doubling the number of
>>>>> elements to account for capacity = 2 X size, times 24 for
>>>>> the flag_func_loc_t element type).  So 183K in the worst case
>>>>> in GCC.
>>>>
>>>> 183k cumulative over all the GCC sources doesn't sound excessive, 
>>>> but...
>>>>
>>>>> There are a few ways to reduce that if it seems excessive.
>>>>>
>>>>> One is by avoiding some waste in flag_func_loc_t which is
>>>>>
>>>>>     pair<tag_types, pair<bool, pair<tree, location_t>>>
>>>>>
>>>>> tree could come first and tag_types and the bool could share
>>>>> space.  That should bring it down to 16 in LP64, for about
>>>>> 30% off the 183K.
>>>>>
>>>>> Beyond that, we could change the algorithm to discard records
>>>>> for prior declarations after the first definition has been seen
>>>>> (and any mismatches diagnosed).
>>>>>
>>>>> We could also only collect one record for each definition
>>>>> in system headers rather than one for every declaration and
>>>>> reference.
>>>>
>>>> ...these all sound worthwhile.
>>>
>>> Okay, I'll look into it.
>>>
>>> To be clear: it was 183K maximum for a single compilation, not
>>> aggregate for all of them.
>>
>> Aha.  Thanks.
> 
> Attached is a revised patch that implements just -Wmismatched-tags
> without the other two warnings (-Wclass-not-pod and -Wstruct-is-pod).
> It also implements the optimizations mentioned above.  To make it
> easier to do the tag cleanup by simply dropping the class-key when
> it isn't necessary (suggested during the review of the cleanup patch)
> I added another warning: -Wredundant-tags to point out instances
> where the class-key or enum-key can safely be dropped. Both warnings
> are off by default.
> 
> With the optimizations in place, the biggest space overhead of
> using the option I measured in a GCC build was 990 elements of
> the record_to_locs_t hash table, plus 2756 elements of the locvec
> vector.  In LP64, each record_to_locs_t element type is 16 bytes
> and each element of the locvec vector is 24 bytes, so the maximum
> space overhead is on the order of 80K.  The average overhead per
> GCC translation unit was about 30K.
> 
> The patch depends on fixes for a few bugs in GCC hash_tables (PR
> 92761 and 92762).  I will post those separately.
> 
> Martin
> 
> PS Independently of this patch I will propose updating the GCC
> Coding Conventions to remove the guideline to use the struct
> class-key for PODs and class for non-PODs:
>    https://gcc.gnu.org/codingconventions.html#Class_Use

> +class rec_decl_loc_t

Let's use "class" instead of "record" generally in this patch.

> +/* A mapping between a TYPE_DECL for a class and the rec_decl_loc_t
> +   description above.  */
> +typedef hash_map<tree, rec_decl_loc_t> record_to_locs_t;

You might want to use hash_map<tree_decl_hash, rec_decl_loc_t> so you 
get the decl-specific hash function rather than the generic pointer hash.

It's hard to distinguish between this type and the previous one by name; 
this one should probably have "map" in its name.

> +static GTY (()) record_to_locs_t *rec2loc;
...
> +    rec2loc = new record_to_locs_t ();

If this isn't GC-allocated, marking it with GTY(()) seems wrong.  How do 
you imagine this warning interacting with PCH?

> +  /* A prior declaration of TYPE_DECL has been seen.  */

The rest of this function from this point on seems like it would be 
better as a member function of rec_decl_loc_t.

> +static void
> +diag_mismatched_tags (tree type_decl, rec_decl_loc_t &recloc)

This could also be a member function.

> +  if (rdl->idxdef < UINT_MAX && rdl->def_class_key == class_key)

Maybe != rather than < ?

> +@item -Wmismatched-tags @r{(C++ and Objective-C++ only)}

The documentation of this flag should mention that the main reason 
someone might want to use it is for compatibility with MSVC++, where 
struct and class are improperly mangled differently.

Jason



More information about the Gcc-patches mailing list