This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: UTF-8, UTF-16 and UTF-32


Hello Scott,

I guess that ASCII would be char, UTF-8 would be unsigned char, UTF-16 would be wchar_t and UTF-32 would be long wchar_t. But it is more appropriate just to have the three sizes of strings, i.e. 8-bit, 16-bit and 32-bit, and the ability to have const 16-bit strings. For strchr on 16-bit strings, the signatures would be:

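#include <cstddef>   // NULL

// Non-const overload: returns a writable pointer to the first match.
wchar_t* strchr(wchar_t *string, wchar_t chr){
   while(*string != L'\0' && *string != chr) ++string;
   // Also matches the terminator itself when chr is L'\0'.
   if(*string == chr) return string;
   return NULL;
}

// Const overload (C++ only): the same search over a read-only string.
const wchar_t* strchr(const wchar_t *string, wchar_t chr){
   while(*string != L'\0' && *string != chr) ++string;
   if(*string == chr) return string;
   return NULL;
}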
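
The functions above compare one 16-bit unit at a time, so they can only find characters that fit in a single unit. In UTF-16, a character above U+FFFF is stored as a pair of units (a surrogate pair), which is one reason the signature matters. Below is a minimal sketch of a search that takes this into account. It is an illustration only, under these assumptions: the name utf16_chr is hypothetical and not part of any library, the C++11 types char16_t and char32_t stand in for the 16-bit and 32-bit character types discussed here, and the input is well-formed, NUL-terminated UTF-16.

```cpp
#include <cassert>

// Hypothetical helper (not a standard function): find the first occurrence
// of the code point `cp` in a NUL-terminated UTF-16 string. The target is a
// char32_t because a code point above U+FFFF does not fit in one 16-bit unit.
const char16_t* utf16_chr(const char16_t* s, char32_t cp) {
    // Reject values that are not valid Unicode scalar values:
    // anything above U+10FFFF, and the surrogate range itself.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return nullptr;

    if (cp < 0x10000) {
        // The target fits in one unit, so a unit-by-unit scan is enough.
        const char16_t unit = static_cast<char16_t>(cp);
        for (;; ++s) {
            if (*s == unit) return s;   // also finds the terminator when cp == 0
            if (*s == 0) return nullptr;
        }
    }

    // The target needs two units: compute its surrogate pair.
    const char32_t v = cp - 0x10000;
    const char16_t hi = static_cast<char16_t>(0xD800 + (v >> 10));
    const char16_t lo = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    for (; *s != 0; ++s) {
        // s[1] is safe to read: s[0] is not the terminator, so s[1] is
        // at worst the terminator itself.
        if (s[0] == hi && s[1] == lo) return s;
    }
    return nullptr;
}

// Usage example: U+1F600 occupies two 16-bit units in the array below.
void utf16_chr_example() {
    const char16_t text[] = u"a\U0001F600b";
    assert(utf16_chr(text, U'a') == text);
    assert(utf16_chr(text, 0x1F600) == text + 1);
    assert(utf16_chr(text, U'b') == text + 3);
}
```

The design choice in this sketch is to pass the character being searched for as a 32-bit value while the string itself stays 16-bit; a non-const overload could be added in the same way as above.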

Cheers,
Dallas.
http://www.ekkySoftware.com/

----- Original Message -----
From: "me22" <me22.ca@gmail.com>
To: "Dallas Clarke" <DClarke@unwired.com.au>
Cc: "Eljay Love-Jensen" <eljay@adobe.com>; "GCC-help" <gcc-help@gcc.gnu.org>
Sent: Saturday, August 23, 2008 12:12 PM
Subject: Re: UTF-8, UTF-16 and UTF-32



On Fri, Aug 22, 2008 at 21:37, Dallas Clarke <DClarke@unwired.com.au> wrote:

Standardise: - sizeof(char) = 1; sizeof(wchar_t) = 2; and sizeof(long wchar_t) = 4.


Do you mean "standardize char as UTF-8, wchar_t as UTF-16, and long wchar_t as UTF-32"? Because that's not what you said, even if (on POSIX, but not necessarily C or C++) the sizes would be appropriate.

Implement all the string functions: - strcmp(); mbscmp(); wcscmp(); and lcscmp().


How exactly do you plan on implementing strchr for UTF-16? Specifically, what would its signature be?

~ Scott


