[PATCH] Support the new ("v0") mangling scheme in rust-demangle.

Eduard-Mihai Burtescu eddyb@lyken.rs
Sun Nov 1 11:57:37 GMT 2020


Reading the diff patch, the v0 changes look great. I wouldn't be too worried
about the "printable character" aspect, there are similar Unicode-related
issues elsewhere, e.g. the "non-control ASCII" comment in decode_legacy_escape
(I suppose we could make it also go through the "print a non-control ASCII
character or some escape sequence" logic you added, if you think that helps).

However, I'm not sure about the legacy changes. Or rather, the .llvm. one, it's
not really Rust-specific, it's only in the rustc-demangle crate for convenience,
but C++ code compiled with Clang could run into the same problem - ideally,
stripping that suffix could be done uniformly in cplus-dem.c, but I decided
against making that change myself, for now.

I'm especially not comfortable removing the fast path, since that was the
condition under which I was able to make Rust demangling be attempted first,
before C++, in order to implement the Rust legacy demangling standalone,
separately from C++ demangling, so that it could be together with the v0 one.

It should be possible to keep the fast path if stripping .llvm.* suffixes is
done before either Rust or C++ demangling is attempted, but even if that would
be nice to have, IMO it should be a separate patch and not block v0 demangling.

As for the dataset, it doesn't include .llvm. because it's collected by rustc
itself, before LLVM had any chance to add any suffixes. This was done in order
to have several different mangling formats dumped at once for every symbol.
(It also contains symbols for e.g. functions that LLVM would've inlined away)

I can test the patch and upload the dataset tomorrow, but if you want to get
something committed sooner (is there a deadline for the next release?), feel
free to land the v0 changes (snprintf + const values) without the legacy ones.

Thanks,
- Eddy B.

On Sun, Nov 1, 2020, at 11:18, Nikhil Benesch wrote:
> 
> 
> On 10/29/20 12:16 AM, Nikhil Benesch wrote:
> > On Wed, Oct 28, 2020 at 7:53 PM Eduard-Mihai Burtescu <eddyb@lyken.rs> wrote:
> >> I agree that landing the extra changes later on top should be fine anyway,
> >> though when I make the sprintf -> snprintf change, I could provide the extra
> >> changes as well (either as a combined patch, or as a second tiny patch).
> >>
> >> Sadly I don't think I can get to either until next week, hope that's okay.
> > 
> > I can make the sprintf -> snprintf change as early as tomorrow.
> > 
> > I suspect I can also make the substantive const generics change, based
> > on the Rust implementation, though that's less of a promise.
> 
> Attached is an updated patch with both of the aforementioned changes. 
> The demangling of constants matches the demangling in rustc-demangle 
> library as best as I could manage. The strategy of demangling constant 
> chars via `format!("{:?}", char)` in Rust makes life hard for us in C, 
> because there is no easy way to replicate Rust's debug formatting for 
> chars. (Unless GCC has a Unicode library handy somewhere.)
> 
> In the course of this work I noticed several inconsistencies in how 
> rustc-demangle and libiberty were demangling some legacy Rust symbols. I 
> fixed those and added test cases, though the fixes required removing 
> some of the fast checks for whether a given symbol is a valid Rust symbol.
> 
> For ease of review, eddyb, I also attached the diff from your last diff 
> to mine. Since I don't have your collection of symbols I generated my 
> own by running nm on the materialized binary from
> https://github.com/MaterializeInc/materialize.
> 
> Let me know how I can help. I'm happy to address review feedback myself, 
> or I'm happy to hand things back over to you, eddyb.
> 
> Nikhil
> 
> Attachments:
> * rust-demangle-diff.patch
> * rust-demangle.patch


More information about the Gcc-patches mailing list