UTF-8, UTF-16 and UTF-32

Dallas Clarke DClarke@unwired.com.au
Sat Aug 23 02:45:00 GMT 2008


Hello Scott,

I guess that ASCII would be char, UTF-8 would be unsigned char, UTF-16 would
be wchar_t and UTF-32 would be long wchar_t. But it is more appropriate just
to have the three sizes of strings, i.e. 8-bit, 16-bit and 32-bit, and the
ability to have const 16-bit strings. For strchr on 16-bit strings, the
signatures would be:

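#include <cstddef>  // NULL

// Mutable overload: returns a pointer to the first chr in string, or NULL.
wchar_t* strchr(wchar_t *string, wchar_t chr){
    while(*string != L'\0' && *string != chr) ++string;
    // Also matches the terminator when chr is L'\0', as the narrow strchr does.
    if(*string == chr) return string;
    return NULL;
}

// Const overload: the same search over a read-only string.
const wchar_t* strchr(const wchar_t *string, wchar_t chr){
    while(*string != L'\0' && *string != chr) ++string;
    if(*string == chr) return string;
    return NULL;
}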
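
One caveat with a 16-bit chr argument: a UTF-16 code point outside the Basic
Multilingual Plane is stored as two 16-bit units (a surrogate pair), so a
single 16-bit value cannot name it. A minimal sketch of a code-point-aware
search follows. It assumes a C++11 compiler with char16_t and char32_t
rather than wchar_t and long wchar_t, and the name utf16_chr is hypothetical,
not an existing library function.

```cpp
#include <cstddef>

// Hypothetical helper (not a standard function): find the first occurrence
// of code point cp in a NUL-terminated UTF-16 string, or return nullptr.
const char16_t* utf16_chr(const char16_t* s, char32_t cp) {
    // Reject values that are not Unicode scalar values:
    // beyond U+10FFFF, or inside the surrogate range U+D800..U+DFFF.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return nullptr;

    if (cp < 0x10000) {
        // BMP code point: exactly one 16-bit unit.
        const char16_t unit = static_cast<char16_t>(cp);
        for (;; ++s) {
            if (*s == unit) return s;        // also finds the terminator if cp == 0
            if (*s == u'\0') return nullptr;
        }
    }

    // Supplementary code point: encode it as a high/low surrogate pair.
    const char32_t v = cp - 0x10000;
    const char16_t hi = static_cast<char16_t>(0xD800 + (v >> 10));
    const char16_t lo = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    for (; *s != u'\0'; ++s) {
        // Reading s[1] is safe: s[0] is not the terminator here.
        if (s[0] == hi && s[1] == lo) return s;
    }
    return nullptr;
}
```

Taking the character as a 32-bit value keeps the signature the same for
every code point; the return value still points into the 16-bit string, at
the first unit of the match.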

Cheers,
Dallas.
http://www.ekkySoftware.com/

----- Original Message ----- 
From: "me22" <me22.ca@gmail.com>
To: "Dallas Clarke" <DClarke@unwired.com.au>
Cc: "Eljay Love-Jensen" <eljay@adobe.com>; "GCC-help" <gcc-help@gcc.gnu.org>
Sent: Saturday, August 23, 2008 12:12 PM
Subject: Re: UTF-8, UTF-16 and UTF-32


> On Fri, Aug 22, 2008 at 21:37, Dallas Clarke <DClarke@unwired.com.au> 
> wrote:
>>
>> Standardise: - sizeof(char) = 1; sizeof(wchar_t) = 2; and sizeof(long
>> wchar_t) = 4.
>>
>
> Do you mean "standardize char as UTF-8, wchar_t as UTF-16, and long
> wchar_t as UTF-32"?  Because that's not what you said, even if (on
> POSIX, but not necessarily C or C++) the sizes would be appropriate.
>
>> Implement all the string functions: - strcmp(); mbscmp(); wcscmp(); and
>> lcscmp().
>>
>
> How exactly do you plan on implementing strchr for UTF-16?
> Specifically, what would its signature be?
>
> ~ Scott
> 


