This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH v3 headers] variable uglification

From: Paolo Carlini <paolo dot carlini at oracle dot com>
To: Ralf Wildenhues <Ralf dot Wildenhues at gmx dot de>, gcc-patches at gcc dot gnu dot org, libstdc++ at gcc dot gnu dot org
Date: Sun, 19 Sep 2010 11:10:10 +0200
Subject: Re: [PATCH v3 headers] variable uglification
References: <20100919082136.GE5435@gmx.de>

Hi Ralf,
> Hi Paolo, all,
>
> I noticed that C++ header uglification seems to be a slow manual
> process.  That looks suboptimal; it should be mostly automatic and
> quick (in terms of developer time).
>   
Well, I agree, but it seems to me that the issue is normally rather
academic, because people working on the C++ runtime full time *know from
the outset* that uglification is required and *never* write un-uglified
names first and then fix each name in a second pass. Thus I see efforts
in this area more as a diagnostic, pre-commit tool for patches from
occasional contributors and similar situations. In other words, perfect
accuracy is not required. When you are happy with a script (I see you
already clearly understand all the quirks and special cases), feel free
to contribute / commit it right away. Thanks!
> My first idea was to just hunt the headers for suspicious words
> (avoiding the use of the preprocessor to catch all possible code):
>
> cd libstdc++-v3
> set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \
>             -prune -o -type f -print`
> shift
> perl -e '
>   my %words;
>   undef $/;             # slurp whole file at once, for multiline match
>   while (<>) {
>     s/\/\/[^\n]*//g;    # good enough for C++ comments
>     s/\/\*.*?\*\///gs;  # good enough for C comments
>     foreach my $word (split /\b/) {
>       $words{$word}++
>         if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/; # ignore _words
>     }
>   }
>   foreach my $word (keys %words) {
>     print "$words{$word}\t$word\n";
>   }' "$@" |
> sort -k1n | less
>
> which already shows lots of potential issues below ext/, but also false
> positives from preprocessor statements, string literals, arithmetic
> constant prefixes and suffixes.  Also, I still have to generate a list
> of keywords and C++ API words to exclude, but glancing over them is
> fairly easy.
>
> My next idea would be to instantiate as much code as possible and
> extract debugging symbols, maybe that can be an easy second attack
> vector.
>
> Another idea would be a testsuite addition poisoning one- and
> two-character identifiers not part of the API?  Might be too dangerous
> in the presence of broken system headers.
>   
Yes, I agree.
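
The false positives mentioned above (preprocessor statements, string
literals, numeric constant prefixes and suffixes) and the missing
exclusion list could be handled roughly as in the sketch below. It is a
rough Python re-take of the same word-counting idea, not the script
from the patch. The function name suspicious_words and the EXCLUDE set
are hypothetical, and EXCLUDE is deliberately incomplete. Raw string
literals and digit separators are not handled.

```python
import re
import sys
from collections import Counter

# Hypothetical, deliberately incomplete exclusion list: a real run would need
# all C++ keywords plus the public API names.
EXCLUDE = frozenset({
    "template", "typename", "class", "struct", "namespace", "std",
    "int", "char", "void", "const", "return", "if", "else",
})

# Comments and literals are matched in a single pass, so that "//" inside a
# string or a quote inside a comment does not confuse the scanner.
_SKIP = re.compile(
    r'//[^\n]*'               # C++ comment
    r'|/\*.*?\*/'             # C comment
    r'|"(?:\\.|[^"\\\n])*"'   # string literal
    r"|'(?:\\.|[^'\\\n])*'",  # character literal
    re.S,
)


def suspicious_words(source, exclude=EXCLUDE):
    """Count identifiers in source that do not start with an underscore."""
    text = _SKIP.sub(" ", source)
    # Drop whole preprocessor lines, including backslash continuations.
    # Trade-off: names used inside macro bodies are no longer checked.
    text = re.sub(r"^[ \t]*#(?:[^\n\\]|\\.)*", " ", text, flags=re.M | re.S)
    # Drop numeric constants together with prefixes/suffixes (0x1fu, 10UL).
    text = re.sub(r"\b\d[\w.]*", " ", text)
    # \b cannot fall between "_" and a letter, so _words are never matched.
    words = re.findall(r"\b[A-Za-z][A-Za-z0-9_]*\b", text)
    return Counter(w for w in words if w not in exclude)


if __name__ == "__main__":
    counts = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts.update(suspicious_words(handle.read()))
    # Rarest words first, like "sort -k1n" in the original pipeline.
    for word, count in sorted(counts.items(), key=lambda item: item[1]):
        print(f"{count}\t{word}")
```

It can be fed the same file list as the Perl one-liner, for example by
passing "$@" as arguments after the set/shift step.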
> Who designed C++ classes inside <random> with multiple one-character
> member names in the API by the way?
>   
Physicists? ;) I'm rather serious actually: the specifications were
largely worked out by physicists at Fermilab plus Jens Maurer, and
somehow nobody has noticed so far. Bad, I agree that can be a problem.
I'm afraid it's a bit too late to change it, but if you really think it
can be a *serious* problem, you can send a DR to the current LWG Chair,
Alisdair Meredith (wg21@alisdairm.net). I'll personally take care of
following its progress at the next meetings.
> Anyway, I found a couple of instances with the above approach, patch
> below survived bootstrap and regtest on x86_64-unknown-linux-gnu.  OK?
>   
Sure.

Thanks again,
Paolo.

Follow-Ups:
- Re: [PATCH v3 headers] variable uglification
  - From: Ralf Wildenhues

References:
- [PATCH v3 headers] variable uglification
  - From: Ralf Wildenhues
