This is the mail archive of the libstdc++@sources.redhat.com mailing list for the libstdc++ project.



Re: character encoding (locale)


> On Tue, 12 Dec 2000, Jan Schukat wrote:
> >
> > Can a program, in any way defined by the standard, get the character
> > encoding of a file in a just-opened stream? I don't mean the locales in
> > ios or the connected streambuf. I mean: how do you get the encoding of a
> > plain text file?
> 
> In Unix there is no such thing as a plain text file: all files are simply
> seen as a consecutive stream of bytes. Oh, sure, programs can interpret
> the data in a file as text, and a convention has developed that the LF
> character (0x0a) delimits end-of-line in such a file when representing
> text using a multibyte character encoding, but that's not supported by
> the OS or its philosophy. The C++ standard reflects this worldview, and
> explicitly supports only byte-oriented operations on a file. While the
> IOStreams library supports wide character representation in stream
> buffers (such as the one behind wfstream), the underlying transport is
> byte-oriented.
> 
> The standard addresses character data representation (multibyte (char)
> vs. wide characters (wchar_t)) and file transport (bytes), but the
> interpretation of data (character encoding) is not a part of the
> standard. Accordingly, the C++ standard library, and as far as I can tell
> g++-v3, offers no built-in support for particular interpretations of data
> from files. The Unix OS certainly provides no way to identify the data in
> a file at any level (conventional file "magic" notwithstanding), and the
> standard library does not provide this functionality either. All the
> standard provides is a way of having the data converted between the
> transport (file I/O) and formatting layers so that, regardless of how
> it's stored by the OS, it's seen by your program as the character
> representation of your choice. You still have to make that choice
> explicitly, and that's what locales are all about.
> 
> 
> Stephen M. Webb
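
A minimal sketch of what making that choice explicitly can look like: the
program, not the file, declares that the bytes are UTF-8 and installs a
conversion facet before opening the stream. It assumes a C++11 or later
library, which postdates this thread, and std::codecvt_utf8 is deprecated
since C++17, so treat it as one illustrative facet rather than a
recommendation. The helper name read_as_utf8 is hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: read a whole file as wide characters, ASSUMING its
// bytes are UTF-8. Nothing here detects the encoding; the caller chose it.
std::wstring read_as_utf8(const std::string& path) {
    std::wifstream in;
    // Install the conversion facet before opening, so the file buffer uses
    // it from the very first byte. The locale takes ownership of the facet.
    in.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    in.open(path.c_str(), std::ios::in | std::ios::binary);

    // The transport stays byte-oriented; the facet turns bytes into wchar_t.
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

If the file is not actually UTF-8, the conversion fails or yields wrong
characters, which is exactly the point made above: the stream cannot tell.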

Exactly what I wanted to know (and what I feared). So I have to do it the
usual hard way. Maybe the special attributes of some file systems, or
protocol headers, can help. Thank you for the help.
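
One common form of the hard way is to look for a byte-order mark at the
start of the file. The sketch below is a heuristic only: the helper names
guess_from_bom and sniff_file are hypothetical, and a file without a mark
yields "unknown" while still being free to use any encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// True when s begins with the len bytes at mark (zero bytes are allowed).
static bool has_prefix(const std::string& s, const char* mark, std::size_t len) {
    return s.size() >= len && s.compare(0, len, mark, len) == 0;
}

// Hypothetical helper: name the encoding suggested by a leading byte-order mark.
std::string guess_from_bom(const std::string& head) {
    // Test the 4-byte marks first: the UTF-32LE mark begins with the UTF-16LE mark.
    if (has_prefix(head, "\x00\x00\xFE\xFF", 4)) return "UTF-32BE";
    if (has_prefix(head, "\xFF\xFE\x00\x00", 4)) return "UTF-32LE";
    if (has_prefix(head, "\xEF\xBB\xBF", 3))     return "UTF-8";
    if (has_prefix(head, "\xFE\xFF", 2))         return "UTF-16BE";
    if (has_prefix(head, "\xFF\xFE", 2))         return "UTF-16LE";
    return "unknown";  // no mark found; the encoding is still undetermined
}

// Hypothetical helper: read at most four raw bytes of a file and classify them.
std::string sniff_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    char buf[4] = {0, 0, 0, 0};
    in.read(buf, sizeof buf);
    return guess_from_bom(std::string(buf, static_cast<std::size_t>(in.gcount())));
}
```

A UTF-16LE file whose first character after the mark is U+0000 would be
reported as UTF-32LE, which is one reason the result remains a guess.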

Jan Schukat

