This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: lazy facet instantiation


Hi Howard,

> I've just glanced around our codecvt code, and obviously Benjamin is 
> better equipped to answer this question.  I'm not familiar enough 
> with iconv use (for example) to say.

Yes. Note that, at the moment, we are not using iconv at all for the
GNU/Linux locale model (the most powerful one we offer). In fact,
extending and updating the so-called "ieee_1003.1-2001" model, which is
meant to use iconv, is an open project. That would also easily allow for
*non-trivial* char <-> char mappings, which we currently don't provide at
all (this topic, too, has already been discussed a bit with Martin). If
you believe this is interesting, we can file one more enhancement PR, so
as not to forget...

> However I have in the past written several codecvt's that assume a 
> char for the external type, and are templated on the internal type.  
> E.g.:
>
> template <class _InternT>  // _InternT can be either a 16 bit or 32 bit scalar
> class __utf8
>     :  public ... // ultimately from std::codecvt<_InternT, char, std::mbstate_t>
> {
> public:
>      typedef _InternT intern_type;
>      typedef char     extern_type;
>      ...
> };
>
> For __utf8<16_bit_scalar> it is implicitly assumed that the upper 16 
> bits are always zero, and thus do_max_length() returns 3 (for 
> example).  For __utf8<32_bit_scalar> it would return 6.  For any 
> other instantiation it could fail at compile time.
>
> For simplistic codecvt's that simply copy all bytes, or lop off the 
> high bytes (for example), the logic for handling different sized 
> internal scalars is fairly simple.
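
As a rough illustration of the scheme quoted above, here is a minimal
sketch of how the worst-case length could be selected from the width of
the internal type, with any other width rejected at compile time. The
name utf8_max_length is made up for this example (it is neither in
libstdc++ nor in Howard's code), and 8 bit bytes are assumed; the values
3 and 6 are simply the ones given in the quoted text.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: worst-case number of UTF-8
// bytes produced for one internal character of type InternT.
// Assumes 8 bit bytes, so sizeof == 2 means 16 bits and sizeof == 4 means 32.
template <class InternT>
constexpr int utf8_max_length()
{
    // Any other width fails here, at compile time.
    static_assert(sizeof(InternT) == 2 || sizeof(InternT) == 4,
                  "internal type must be a 16 bit or 32 bit scalar");
    // 16 bit values need at most 3 bytes; the 6 for 32 bit values is the
    // figure from the quoted message.
    return sizeof(InternT) == 2 ? 3 : 6;
}
```

A do_max_length() override in such a facet could then just return
utf8_max_length<intern_type>().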

Indeed, this is all pretty straightforward. The only issue is the
portability of the external format, which ends up depending on
endianness, for instance. I don't think the current locale model of the
standard takes this approach to the external representation. But if
users want that, we can provide it with a very limited amount of work.
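
To make the endianness point concrete, here is a small sketch (the
function names are invented for this example and are not part of any
library). A converter that simply copies the bytes of each internal
character writes them in host order, so the same text produces different
external bytes on little-endian and big-endian machines. Writing the
bytes in a fixed order avoids that. A 32 bit internal type and 8 bit
bytes are assumed.

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation as is: the resulting byte order is
// whatever the host uses, so the external format is not portable.
inline void put_native(std::uint32_t c, unsigned char* out)
{
    std::memcpy(out, &c, sizeof c);
}

// Byte number i (0 = most significant) of c, independent of the host.
constexpr unsigned char big_endian_byte(std::uint32_t c, int i)
{
    return static_cast<unsigned char>(c >> (8 * (3 - i)));
}

// Writes c most significant byte first on every host.
inline void put_big_endian(std::uint32_t c, unsigned char* out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = big_endian_byte(c, i);
}
```

With put_big_endian, the value 0x01020304 is always written as the bytes
01 02 03 04; with put_native the order depends on the machine.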

> I originally got started in this area by having to support two sizes 
> of wchar_t:  16 bit and 32 bit.  And it just generalized from there.  
> I guess wchar_t for gcc is always 32 bits?

Well, not really, it depends on -fshort-wchar. It is true, however, that
*on GNU systems* wchar_t is always 32 bits (see the glibc docs, for
example (*)) and we assume that for the GNU locale model.
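
For code that needs to know which case it is in, the width can be
checked at compile time. This is only a sketch: the names wchar_bits and
wchar_is_32_bit are invented here and are not something libstdc++
defines.

```cpp
#include <climits>

// Width of wchar_t, in bits, for the current compilation. With GCC this
// changes when -fshort-wchar is given.
constexpr int wchar_bits = static_cast<int>(sizeof(wchar_t) * CHAR_BIT);

// True when the assumption made for the GNU locale model holds.
constexpr bool wchar_is_32_bit = (wchar_bits == 32);
```

A facet that relies on the 32 bit assumption could guard itself with
static_assert(wchar_is_32_bit, "...") instead of failing later at run
time.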

Paolo.

(*) AFAIK, most of glibc doesn't work at all together with -fshort-wchar...

