This is the mail archive of the libstdc++@sourceware.cygnus.com mailing list for the libstdc++ project.
On May 18, 1999 5:57 PM, Edwards, Phil [SMTP:pedwards@ball.com] wrote:
>
> + basic_string<T>::operator +=(T c);
> + This operation will only work for ascii chars which are one
> + byte long in UTF-8.
>
> Why? The very first paragraph of the strings clause states:
>
> # This clause describes components for manipulating sequences of
> # "characters," where characters may be of any POD (3.9) type. In this
> # clause such types are called charlike types, and objects of char
> # like types are called charlike objects or simply "characters."
>
> As long as whatever you pick for T (when instantiating basic_string) is a
> POD type, then op+= is defined to work.

There is simply no correct POD type for UTF-8, since a UTF-8 character is
of varying length (in bytes). If you use something like a POD as wide as
the longest UTF-8 sequence, c_str() no longer yields a UTF-8 string ...

>
> + basic_string<T>::reference basic_string<T>::operator[](size_type pos);
> + This operation is meaningless for anything but a pure ascii string ...
>
> Why?

I was talking about the UTF-8 case, and assuming the only basic_string<>
instantiation that makes sense for UTF-8: one whose POD type is a byte.
With that instantiation, if a character is two or more bytes long,
operator[] never gives you the entire character. ASCII chars work because
they have the amazing property of being one byte long in the UTF-8
encoding. I don't mean it is impossible to use basic_string<> to operate
on UTF-8; I just mean it is dangerous and bug-prone.

>
> + You can get any byte in a character and it would be a unit
> + of storage not necessarily a character.
>
> basic_string<T>::reference is of type T&, whatever that may mean. It does
> not have to be a single byte in size.

No, you're right. But since no POD can represent a UTF-8 character (with
its varying length), you have to use byte-based storage such as a
basic_string of bytes.

>
> + Finally, adding support for a UTF-8 string is far beyond the
> + scope of a
> + standard C++ library.
>
> We are in agreement there. But there's no reason why it couldn't be done as
> an extension, or even (shameless plug) a HOWTO.

Because of intellectual property laws, all I can do is try to share some
experiences and thoughts on the matter. (With no time and no budget, of
course!)

>
>
> Luck++;
> Phil

++Glück (in German)
Chris
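
To make the byte-versus-character point concrete, here is a minimal sketch, assuming a C++11 compiler and a plain std::string used as byte storage for UTF-8. The helpers utf8_sequence_length, utf8_code_points and utf8_append are hypothetical names invented for this illustration; they are not part of libstdc++ or of the standard, and they do no validation of malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
// lead byte `b`, or 0 if `b` is a continuation byte or not a valid lead byte.
std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII, one byte
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // 10xxxxxx or invalid
}

// Hypothetical helper: count characters (code points) by counting every
// byte that is not a continuation byte. Assumes well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}

// Hypothetical helper: append one code point as UTF-8. A single
// operator+=(char) cannot do this for anything beyond ASCII, so the
// code point is split into several bytes. No range or surrogate checks.
void utf8_append(std::string& s, char32_t cp) {
    if (cp < 0x80) {
        s += static_cast<char>(cp);
    } else if (cp < 0x800) {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Builds "Glück" and checks that size() and operator[] see bytes,
// not characters.
void demo_gluck() {
    std::string s = "Gl";
    utf8_append(s, 0xFC);  // U+00FC, two bytes in UTF-8: 0xC3 0xBC
    s += "ck";
    assert(s.size() == 6);             // six bytes of storage
    assert(utf8_code_points(s) == 5);  // five characters
    // s[2] is only the lead byte of the two-byte sequence.
    assert(static_cast<unsigned char>(s[2]) == 0xC3);
    assert(utf8_sequence_length(static_cast<unsigned char>(s[2])) == 2);
}
```

Calling demo_gluck() from main exercises the checks. The sketch keeps std::string as raw byte storage and puts all character-level logic in free functions, which is one way to read the "extension or HOWTO" suggestion above. It is not a claim about how such support would or should be implemented.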