This is the mail archive of the mailing list for the libstdc++ project.

Re: [PATCH v3 headers] variable uglification

Hi Ralf,
> Hi Paolo, all,
> I noticed that C++ header uglification seems to be a slow manual
> process.  That looks suboptimal, and shouldn't be the case, it
> should be mostly automatic and quick (in terms of developer time).
Well, I agree, but it seems to me that the issue is normally rather
academic, because people working on the C++ runtime full time *know from
the outset* that uglification is required and *never* write un-uglified
names first and then fix each name in a second pass. Thus I see efforts
in this area more as a diagnostic, pre-commit tool for patches from
occasional contributors and similar situations. In other words, perfect
accuracy is not required. When you are happy with a script (I can see
you already clearly understand all the quirks and special cases), feel
free to contribute / commit it right away. Thanks!
> My first idea was to just hunt the headers for suspicious words
> (avoiding the use of the preprocessor to catch all possible code):
> cd libstdc++-v3
> set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \
>             -prune -o -type f -print`
> shift
> perl -e '
>   my %words;
>   undef $/;             # slurp whole file at once, for multiline match
>   while (<>) {
>     s/\/\/[^\n]*//g;    # good enough for C++ comments
>     s/\/\*.*?\*\///gs;  # good enough for C comments
>     foreach my $word (split /\b/) {
>       $words{$word}++
>         if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/; # ignore _words
>     }
>   }
>   foreach my $word (keys %words) {
>     print "$words{$word}\t$word\n";
>   }' "$@" |
> sort -k1n | less
> which already shows lots of potential issues below ext/, but also false
> positives from preprocessor statements, string literals, arithmetic
> constant prefixes and suffixes.  Also, I still have to generate a list
> of keywords and C++ API words to exclude, but glancing over them is
> fairly easy.
> My next idea would be to instantiate as much code as possible and
> extract debugging symbols, maybe that can be an easy second attack
> vector.
> Another idea would be a testsuite addition poisoning one- and
> two-character identifiers not part of the API?  Might be too dangerous
> in the presence of broken system headers.
Yes, I agree.
> Who designed C++ classes inside <random> with multiple one-character
> member names in the API by the way?
Physicists? ;) I'm actually rather serious: the specifications were
largely worked out by physicists at Fermilab together with Jens Maurer,
and somehow nobody has noticed so far. That's unfortunate, and I agree it
can be a problem. I'm afraid it's a bit too late to change it, but if you
really think it can be a *serious* problem, you can send a DR to the
current LWG Chair, Alisdair Meredith; I'll personally take care of
following its progress at the next meetings.
> Anyway, I found a couple of instances with the above approach, patch
> below survived bootstrap and regtest on x86_64-unknown-linux-gnu.  OK?

Thanks again,
