__enc_traits and Unicode strings and related stuff

Fri Feb 16 13:56:00 GMT 2001

> -----Original Message-----
> From: Benjamin Kosnik [ mailto:bkoz@redhat.com ]
> Sent: Friday, February 16, 2001 12:49 PM

> > I've been instantiating all of the _CharT related stuff using uint16_t
> > as the character type, using a class of my own design for the _stateT
> > type, and custom specializations for char_traits and codecvt stuff.
> 
> ...then you must know this is an issue that's pretty near and dear to
> me. Can you elaborate here? Did you see the __enc_traits stuff? It's
> specifically designed to work with this encoding, plus others, using
> the approach you are using. I'm assuming you're working on unicode
> string... I would love to know your approach. If you post details,
> just start another thread with a new subject...

Good guess about the Unicode.

And yes, I've studied your __enc_traits stuff pretty closely. I like your
approach, but because your implementation is specific to the GNU libstdc++,
and because my employer is in the business of building cross-platform
products (not to mention a stdlib), I needed to recreate a version that
would sit on top of any compliant stdlib implementation.

One of the main differences between your solution and mine is that I've been
building my stuff on top of the IBM ICU library instead of iconv. I too use
my own class as the _stateT argument, and instances of this class are used
to specify what type of conversion is to be performed.
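
For illustration only, a state class of this kind might look like the
following minimal sketch. Every name here (conv_state, internal_encoding,
and so on) is hypothetical; this is neither the class described above nor
__enc_traits, and it makes no ICU or iconv calls. It only shows the idea
that the state object, rather than the template arguments, names the
conversion.

```cpp
#include <string>

// Hypothetical state type: it records which conversion is wanted.
// A real implementation would also own a converter handle and any
// shift state; both are left out of this sketch.
class conv_state {
public:
    // Default construction yields an "unconfigured" state, which is
    // what stream code creates when it value-initializes a state.
    conv_state() {}

    conv_state(const std::string& internal, const std::string& external)
        : internal_(internal), external_(external) {}

    const std::string& internal_encoding() const { return internal_; }
    const std::string& external_encoding() const { return external_; }

    // True once both ends of the conversion have been named.
    bool configured() const {
        return !internal_.empty() && !external_.empty();
    }

    // True if both states describe the same conversion.
    bool same_conversion(const conv_state& other) const {
        return internal_ == other.internal_ && external_ == other.external_;
    }

private:
    std::string internal_;  // encoding name of the in-memory form
    std::string external_;  // encoding name of the external form
};
```

A state built as conv_state("UCS-2", "UTF-8") would then ask for a
conversion between those two encodings, while a default-constructed one
carries no request at all.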

Many of my classes are actually specializations of existing stdlib templates
that I place in the std namespace. This is explicitly allowed by the
standard as long as your full or partial specialization depends on a type
that is defined outside of the std namespace (that way your specialization
can't conflict with ones provided by the base library).
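
As a sketch of what such a specialization can look like, here is
std::char_traits specialized for a user-defined character type. The type
my::uchar is hypothetical and is used instead of a bare uint16_t so that
the specialization clearly depends on a type from outside std. The choice
of int_type and of the eof() value are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>   // primary std::char_traits template

namespace my {
// Hypothetical user-defined character type: one 16-bit code unit.
struct uchar {
    std::uint16_t value;
};
}

namespace std {
// Full specialization that depends on the user-defined type my::uchar.
template <>
struct char_traits<my::uchar> {
    typedef my::uchar      char_type;
    typedef std::uint32_t  int_type;   // wider than char_type, so eof() fits
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(const char_type& a, const char_type& b) { return a.value == b.value; }
    static bool lt(const char_type& a, const char_type& b) { return a.value < b.value; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    // Counts code units up to, but not including, the zero terminator.
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n].value != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;
    }
    // uchar is trivially copyable, so the byte-wise copies are safe.
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memmove(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) d[i] = c;
        return d;
    }

    static int_type eof() { return 0xFFFFFFFFu; }  // outside the 16-bit range
    static int_type not_eof(int_type c) { return c == eof() ? 0 : c; }
    static char_type to_char_type(int_type c) {
        char_type r = { static_cast<std::uint16_t>(c) };
        return r;
    }
    static int_type to_int_type(char_type c) { return c.value; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
};
}

namespace my {
// String type built on the specialization above.
typedef std::basic_string<uchar> ustring;
}
```

A codecvt specialization keyed on a user-defined state class follows the
same rule, since it is the state type that comes from outside std.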

The biggest problem that I ran into is that the design specified by the
standard assumes that the type of conversion (and therefore the source and
destination encoding) can be controlled by the types used as the template
arguments. While codecvt_byname can be used to identify a specific external
encoding (using the [lang[_terr[.encoding]]] locale id), that information is
only known within the codecvt instance. This presents a problem if the
actual conversion is controlled by the state object passed in via the in(),
out(), and length() methods (as is the case in my implementation).
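
To make the standard's model concrete, the sketch below uses only the
stock codecvt<wchar_t, char, mbstate_t> facet. The helper name widen is
hypothetical. Note where the conversion is chosen: by the facet type and
the locale it is fetched from. The mbstate_t passed to in() carries only
shift state and says nothing about which encodings are involved.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: widen a narrow string using the codecvt facet
// of the given locale.
std::wstring widen(const std::string& in, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (in.empty()) return std::wstring();

    std::mbstate_t state = std::mbstate_t();  // shift state only
    std::wstring out(in.size(), L'\0');       // at most one wchar_t per byte
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result r = cvt.in(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r == std::codecvt_base::noconv) {
        // The facet reports that no conversion is needed.
        return std::wstring(in.begin(), in.end());
    }
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("conversion failed or input was incomplete");
    }
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling widen("abc", std::locale::classic()) goes through the classic
locale's facet. Nothing in the state argument could redirect it to a
different encoding, which is exactly the limitation described above.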

To work around this problem, I had to make my codecvt specialization act as
an implicit state object factory. Whenever a state object is passed in, the
codecvt instance will check to make sure that the state object is
representative of the type of conversion that is to be performed.
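
The check can be pictured roughly as follows. This is a simplified
stand-in, not the actual codecvt specialization: checking_converter,
conv_state and ensure_state are hypothetical names, the converter_open
flag stands in for a real converter handle, and the policy of silently
re-initializing a mismatched state is an assumption made for the sketch.

```cpp
#include <string>

// Hypothetical state object: names the conversion it was set up for.
struct conv_state {
    std::string internal_encoding;
    std::string external_encoding;
    bool converter_open;  // stands in for an opened converter handle

    conv_state() : converter_open(false) {}
};

// Stand-in for the codecvt specialization. It knows which conversion it
// represents and makes sure any state handed to it agrees.
class checking_converter {
public:
    checking_converter(const std::string& internal, const std::string& external)
        : internal_(internal), external_(external) {}

    // Called at the top of in(), out() and length() in this sketch.
    // Returns true if the state had to be (re)initialized.
    bool ensure_state(conv_state& st) const {
        if (st.converter_open &&
            st.internal_encoding == internal_ &&
            st.external_encoding == external_) {
            return false;  // state already matches this conversion
        }
        // Assumed policy: act as a factory and set the state up here.
        st.internal_encoding = internal_;
        st.external_encoding = external_;
        st.converter_open = true;
        return true;
    }

private:
    std::string internal_;
    std::string external_;
};
```

With this shape, a default-constructed state created by stream code is
configured on first use, and a state left over from a different
conversion is caught before any characters are converted.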

Anyway, that's pretty much the gist of what I've been doing...

-g.b.