This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH v3 headers] variable uglification
- From: Paolo Carlini <paolo dot carlini at oracle dot com>
- To: Ralf Wildenhues <Ralf dot Wildenhues at gmx dot de>, gcc-patches at gcc dot gnu dot org, libstdc++ at gcc dot gnu dot org
- Date: Sun, 19 Sep 2010 11:10:10 +0200
- Subject: Re: [PATCH v3 headers] variable uglification
- References: <20100919082136.GE5435@gmx.de>
Hi Ralf,
> Hi Paolo, all,
>
> I noticed that C++ header uglification seems to be a slow manual
> process. That looks suboptimal, and shouldn't be the case; it
> should be mostly automatic and quick (in terms of developer time).
>
Well, I agree, but it seems to me that the issue is normally rather
academic: people working on the C++ runtime full time *know from the
outset* that uglification is required, and *never* write un-uglified
names first and then fix each name in a second pass. Thus I see efforts
in this area more as a diagnostic, pre-commit tool for patches from
occasional contributors. In other words, perfect accuracy is not
required. When you are happy with a script (I see you already clearly
understand all the quirks and special cases), feel free to contribute /
commit it right away. Thanks!
> My first idea was to just hunt the headers for suspicious words
> (avoiding the use of the preprocessor to catch all possible code):
>
>   cd libstdc++-v3
>   set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \
>                       -prune -o -type f -print`
>   shift
>   perl -e '
>     my %words;
>     undef $/;                 # slurp whole file at once, for multiline match
>     while (<>) {
>       s/\/\/[^\n]*//g;        # good enough for C++ comments
>       s/\/\*.*?\*\///gs;      # good enough for C comments
>       foreach my $word (split /\b/) {
>         $words{$word}++
>           if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/;  # ignore _words
>       }
>     }
>     foreach my $word (keys %words) {
>       print "$words{$word}\t$word\n";
>     }' "$@" |
>   sort -k1n | less
>
> which already shows lots of potential issues below ext/, but also false
> positives from preprocessor statements, string literals, arithmetic
> constant prefixes and suffixes. Also, I still have to generate a list
> of keywords and C++ API words to exclude, but glancing over them is
> fairly easy.
>
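Most of the false positives listed above could probably be filtered out
before counting. Here is a minimal Python sketch of that, under a few
assumptions: the names `suspicious_words` and `EXCLUDE` are made up for
illustration, the exclusion list is deliberately tiny (a real one would
need every C++ keyword plus the public library API), and raw string
literals are not handled. It drops comments, string and character
literals, and whole preprocessor lines, and it swallows numeric
constants as single tokens so their prefixes and suffixes are never
counted.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete exclusion list.
EXCLUDE = {
    "template", "typename", "class", "struct", "const", "return",
    "int", "char", "void", "bool", "std", "size_t",
}

# Comments and literals in one alternation, so that whichever construct
# starts first in the text wins (e.g. "//" inside a string stays a string).
_SKIP = re.compile(
    r"//[^\n]*"                 # C++ comment
    r"|/\*.*?\*/"               # C comment
    r'|"(?:\\.|[^"\\\n])*"'     # string literal
    r"|'(?:\\.|[^'\\\n])*'",    # character literal
    re.S,
)

# A preprocessor line, including backslash-newline continuations.
_PREPROC = re.compile(r"^[ \t]*#(?:\\\n|[^\n])*", re.M)

# Either a numeric constant (with any prefix/suffix) or an identifier.
_TOKEN = re.compile(
    r"\.?[0-9](?:[eEpP][+-]|[A-Za-z0-9_.])*"
    r"|[A-Za-z_][A-Za-z0-9_]*"
)


def suspicious_words(source, exclude=EXCLUDE):
    """Count identifiers that start with a letter and are not excluded."""
    text = _SKIP.sub(" ", source)
    text = _PREPROC.sub(" ", text)
    counts = Counter()
    for match in _TOKEN.finditer(text):
        word = match.group(0)
        # Numeric constants start with a digit or '.', uglified names
        # start with '_'; both are skipped here.
        if word[0].isalpha() and word not in exclude:
            counts[word] += 1
    return counts
```

Dropping preprocessor lines wholesale also hides un-uglified names that
appear inside macro definitions, so this trades some coverage for less
noise.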
> My next idea would be to instantiate as much code as possible and
> extract debugging symbols, maybe that can be an easy second attack
> vector.
>
> Another idea would be a testsuite addition poisoning one- and
> two-character identifiers not part of the API? Might be too dangerous
> in the presence of broken system headers.
>
Yes, I agree.
> Who designed C++ classes inside <random> with multiple one-character
> member names in the API by the way?
>
Physicists? ;) I'm rather serious actually: the specifications were
largely worked out by physicists at Fermilab plus Jens Maurer, and
somehow nobody has noticed so far. Bad, I agree that it can be a
problem. I'm afraid it's a bit too late to change it, but if you really
think it can be a *serious* problem, you can send a DR to the current
LWG Chair, Alisdair Meredith (wg21@alisdairm.net). I'll personally take
care of following its progress at the next meetings.
> Anyway, I found a couple of instances with the above approach, patch
> below survived bootstrap and regtest on x86_64-unknown-linux-gnu. OK?
>
Sure.
Thanks again,
Paolo.