This is the mail archive of the mailing list for the libstdc++ project.


Re: character encoding (locale)

> On Tue, 12 Dec 2000, Jan Schukat wrote:
> >
> > Can a program, in any way determined by the standard, get the character
> > encoding of a file in a just-opened stream? I don't mean the locales in
> > ios or the connected streambuf. I mean, how do you get the encoding of
> > a plain text file?
>
> In Unix there is no such thing as a plain text file: all files are
> simply seen as a consecutive stream of bytes. Sure, programs can
> interpret the data in a file as text, and conventions have developed,
> such as the LF character (0x0a) delimiting end-of-line when text is
> represented in a multibyte character encoding, but that is not supported
> by the OS or its philosophy. The C++ standard reflects this worldview
> and explicitly supports only byte-oriented operations on a file. While
> the IOStreams library supports wide character representation in stream
> buffers (such as wfstream), the underlying transport is byte-oriented.
>
> The standard addresses character data representation (multibyte (char)
> vs. wide characters (wchar_t)) and file transport (bytes), but the
> interpretation of data (character encoding) is not a part of the
> standard. To that end, the C++ standard library, and as far as I can
> tell the g++-v3, offers no built-in support for particular
> interpretations of data from files. The Unix OS certainly provides no
> way to identify the data in a file at any level (conventional file
> "magic" notwithstanding). The standard library does not provide this
> functionality either. All the standard provides is a way of having the
> data converted between the transport (file I/O) and formatting layers so
> that regardless of how it's stored by the OS, it's seen as the character
> representation of your choice by your program. You still have to make
> that choice explicitly, and that's what locales are all about.
>
> Stephen M. Webb
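
To make the last point of the quoted reply concrete, here is a minimal
sketch of choosing the conversion explicitly. The helper name
read_first_line is hypothetical; it assumes C++11 or later and only
uses the standard wifstream, imbue, and getline facilities. The locale's
codecvt facet converts the file's bytes into wchar_t; nothing in the
file itself selects that conversion.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper (not part of any library): reads the first line of
// `path` as wide characters. The bytes in the file are converted by the
// codecvt facet of `loc`, so the caller decides how they are interpreted.
std::wstring read_first_line(const std::string& path, const std::locale& loc)
{
    std::wifstream in;
    in.imbue(loc);   // make the encoding choice before any data is read
    in.open(path);   // transport is still plain bytes underneath

    std::wstring line;
    std::getline(in, line);   // leaves `line` empty if the open failed
    return line;
}
```

Calling it with std::locale::classic() treats the file as plain "C"
locale text. A named locale such as std::locale("en_US.UTF-8") would
select a different conversion, but whether that name exists depends on
the platform, and the constructor throws std::runtime_error if it does
not.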

Exactly what I wanted to know (and what I feared). So I have to do it the
usual hard way. Maybe the special attributes of some file systems, or
protocol headers, can help. Thank you for the help.
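
One common form of the "hard way" is to look for a byte-order mark at
the start of the data. The sketch below is only an illustration under
that assumption: the function name sniff_bom and the returned labels are
made up for this example, most files carry no BOM at all, and a match is
a hint rather than proof of the encoding.

```cpp
#include <string>

// Hypothetical helper: guesses an encoding from a leading byte-order mark.
// Returns "unknown" when no BOM is found, which is the common case.
std::string sniff_bom(const std::string& bytes)
{
    auto starts_with = [&bytes](const std::string& prefix) {
        return bytes.compare(0, prefix.size(), prefix) == 0;
    };

    // The 4-byte marks are tested first, because the UTF-32LE mark begins
    // with the same two bytes as the UTF-16LE mark. (FF FE 00 00 could
    // also be UTF-16LE text starting with U+0000; this sketch ignores that.)
    if (starts_with(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (starts_with(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (starts_with("\xEF\xBB\xBF")) return "UTF-8";
    if (starts_with("\xFF\xFE")) return "UTF-16LE";
    if (starts_with("\xFE\xFF")) return "UTF-16BE";
    return "unknown";
}
```

The input would be the first few bytes read from the file in binary
mode; the result could then guide which locale to imbue on the stream.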

Jan Schukat
