This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
UTF-8, UTF-16 and UTF-32
- From: "Dallas Clarke" <DClarke at unwired dot com dot au>
- To: <gcc-help at gcc dot gnu dot org>
- Date: Thu, 21 Aug 2008 14:43:19 +1000
- Subject: UTF-8, UTF-16 and UTF-32
Hello GCC,
Now that I have had time to pull myself off the ceiling, I realise the
problem is that Unix/GCC supports both UTF-8 and UTF-32, while Windows
supports UTF-8 and UTF-16. The solution is for both Unix and Windows to
support all three Unicode formats.
I have had to spend the last several days writing the UTF-16 string
functions from scratch, and I realise that with a bit of common sense
everything can work out okay. Hopefully, with quick action to move wchar_t
to 2 bytes and to create another type for 4-byte strings, we can see a lot
of problems solved. Maybe UTF-16 strings could use the L"My String"
notation and UTF-32 strings the LL"My String" notation.
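For anyone wondering what such hand-written UTF-16 string functions might
look like, here is a minimal sketch in C. It is not the code described
above; the names utf16_unit, utf16_len_units and utf16_len_codepoints are
made up for illustration. It uses uint16_t rather than wchar_t, on the
assumption that the point is to get the same 2-byte unit on both Unix and
Windows regardless of how wide wchar_t is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type name for this sketch: one 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/* Count 16-bit code units before the terminating zero (like strlen). */
size_t utf16_len_units(const utf16_unit *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Count code points: a high surrogate (0xD800-0xDBFF) followed by a
 * low surrogate (0xDC00-0xDFFF) is one code point, not two. */
size_t utf16_len_codepoints(const utf16_unit *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; s[i] != 0; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++; /* skip the low half of the surrogate pair */
        count++;
    }
    return count;
}
```

The two counts differ only when the string contains characters outside the
Basic Multilingual Plane, which is the case a 2-byte type has to handle
with surrogate pairs and a 4-byte type does not.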
I hope your steering committee can see that there will be lots of UTF-16
text files out there, and that a lot of code will have to be written to
process those files. While UTF-8 will not support many non-Latin-based
languages, UTF-32 will not support many non-human-based languages, i.e. no
signal system is fault free.
Thanks,
Dallas
http://www.ekkySoftware.com/