This is the mail archive of the libstdc++@sources.redhat.com mailing list for the libstdc++ project.



Re: character encoding (locale)


> 
> I suggest "Standard C++ IOStreams and Locales" by Angelika Langer and 
> Klaus Kreft, for more info. This is a great book.
> 
> > I watched the sources and your documentation quite a while. As I still
> > have no copy of the standard (don't ask) there is one question I have.
> > Can a program in any way determined by the standard get the character
> > encoding of a file in a just opened stream. I don't mean the locales
> > in ios or the connected streambuf. I mean how do you get the encoding
> > of a plain text file.
> 
> I suspect you mean "deduce" the encoding. And the answer is that we can 
> use the locale set by the "C" library as the C++ default. So, if you set
> the C locale correctly, the C++ locale should default to being the 
> correct thing.
> 
> You can always set the locale explicitly, before you read in a file, with
> the imbue member function.
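
For readers of the archive, the imbue suggestion above could look roughly like the sketch below. The helper name read_first_line and the file path are hypothetical, made up for illustration; only std::wifstream, imbue, and std::locale come from the standard library. The locale is imbued before the file is opened, so its codecvt facet is in effect from the first byte read.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper (not part of libstdc++): open `path`, imbue `loc`
// before any input happens, and return the first line as wide characters.
// Returns an empty string if the file cannot be opened.
std::wstring read_first_line(const char* path, const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);   // set the locale first; its codecvt facet does the byte-to-wchar_t conversion
    in.open(path);
    std::wstring line;
    if (in) {
        std::getline(in, line);
    }
    return line;
}
```

A call such as read_first_line("doc.txt", std::locale("")) would use the locale named by the environment; constructing std::locale from a name can throw std::runtime_error if that locale is not available on the system.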

No, no, that was clear to me. My problem is that I am trying to write a DOM
document builder (parser) as an overloaded right-shift operator (for documentation
purposes, as I said in http://gcc.gnu.org/ml/libstdc++/2000-12/msg00005.html),
but an XML file can have any encoding. So the problem is to determine whether
the file you just opened is in UTF-8, UTF-16, or some other encoding (as you can
get the file from anywhere) and then set the right locale with imbue.
But as you didn't even think of that possibility, I assume it is not part
of the standard and I have to parse and guess, or hope the encoding is
given in the file, or try several locales.
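
One common way to do the "parse and guess" step is to look for a byte order mark in the first bytes of the file; the XML 1.0 recommendation describes this kind of autodetection in a non-normative appendix. The following is only a minimal sketch of that idea, not something the C++ standard or libstdc++ provides. The function name and the returned strings are made up for illustration, and UTF-32 as well as files without a byte order mark are deliberately not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guess an encoding from a byte order mark (BOM)
// in the first `n` bytes at `p`. Returns "UTF-8", "UTF-16BE", "UTF-16LE",
// or an empty string when no known BOM is present.
// Assumption: UTF-32 is out of scope, so FF FE 00 00 is reported as UTF-16LE.
std::string guess_encoding_from_bom(const unsigned char* p, std::size_t n) {
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        return "UTF-8";
    }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        return "UTF-16BE";
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        return "UTF-16LE";
    }
    return "";  // no BOM: fall back to the XML declaration or a default
}
```

An empty result means the caller still has to fall back on the encoding named in the XML declaration, if there is one, or on a default such as UTF-8 before choosing the locale to imbue.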

> If you have to deal with specific encodings, you might want to look 
> at the documentation under 22_locale for codecvt. If you use 
> __enc_traits specializations you can specify specific encodings.
> 
> -benjamin

It's good to know how to construct your own encoding, but there is quite a
big set of encodings in glibc2.2, and I don't know old Maya, which I think
is the only one missing.

Thank you for your help. I will get the literature you suggested and a copy of
the standard (I don't have a credit card to pay, so I will ask friends). I will
continue coding, and soon the first bug reports for wstring will arrive here.

Jan

