This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
Re: Filenames with accented characters
João Garcia wrote:
> please do not take what I am about to write as a personal attack. Be
> sure that I am only trying to help (although I may be failing to do so...).
Rest assured that I am not taking it as a personal attack
and I do not see anything in your mail that looks like it.
(So I'm confused by what you're saying here...)
> my posts are misinterpreted (my fault, I apologize for that)... And I
> wonder if this is the reason for something so simple as character
> conversions on filenames to take many months to get solved in gcj/mingw
It could simply be sloth, oversight or plain "not my problem".
One must persist and nag till one gets a satisfactory (either
way) reply.
>>This is, as usual, an ugly beast. The primary
>>issue here is that Windows NT based OSs (NT4/2K/XP)
>>have the notion of both a System Locale and a
>>User Locale, which are almost, *but not quite*,
>>of the same status.
>>
> For what that matters in character conversions, you must be referring to:
> 1 - CP_ACP (for "old" win32 applications).
> 2 - CP_OEMCP (for "ancient" DOS applications).
No, I was referring to what you can modify using
Control Panel -> Regional Settings -> General. (This is
the User Locale; "Set default..." there sets the
System Locale. This is as seen on Win2K.)
> If your compiler honors the runtime values of CP_ACP and CP_OEMCP, you
> can use them (it does not seem to be the case for gcc). Otherwise please
> use GetACP() or GetOEMCP().
I did not want to work with code pages and wanted to stick
to UTF-8. (I didn't understand how the compiler can honour
the *runtime* values of CP_ACP, etc. - it must be for the runtime
to bother with, in which case it is MSVCRT for MinGW.)
> The native character support in the Win32 NT-branch is based on the
> wchar_t C/C++ type using UTF-16 (LE) encoding. This includes the API
> W-functions...
Agreed.
> (And AFAIK gcj Java-Strings are also based on the wchar_t C/C++ type using
> UTF-16 (LE) encoding).
Not too sure - it could be UTF-16 (BE) on Solaris
for example.
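
To make the byte-order point concrete, here is a minimal Java sketch. It
assumes nothing about gcj internals; it only shows that the same UTF-16
code unit serializes to different byte sequences depending on
endianness. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrder {
    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00E3"; // U+00E3, "a" with tilde, as in "Jo\u00E3o"
        // Little-endian: low byte first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // E3 00
        // Big-endian: high byte first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 E3
    }
}
```
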
>>Specifically, console applications can only
>>display those glyphs that are supported by
>>the character set of the System Locale, irrespective
>>of what the User Locale is set to. GUI applications
>>fortunately do not share this problem.
>>
>
> The rxvt terminal emulation used with Msys seems to have a "bug" and
> seems to use CP_ACP (it should use CP_OEMCP to be equivalent to the
> windows console).
rxvt is really a GUI application, though it looks like a command
prompt, so it would not suffer from the problem that our
console application was facing.
>>In spite of the above, applications must still honour
>>the User Locale and list the above as a known
>>limitation of the OS itself.
>>
>
> Does this situation result from an implementation choice or from a
> *real* implementation limitation?
The latter. See:
http://www.microsoft.com/globaldev/reference/localetable.mspx#systemlocale
http://www.microsoft.com/globaldev/reference/localetable.mspx#UserLocale
for the difference and the reason for the problem.
> I have to call your attention to some facts here... again, please do
> not take this personally. ;-)
> 1 - wchar_t is NOT an encoding. It is a C/C++ type (I am sure you are
> aware of this, but please do not compare it to UTF-16 because the latter
> is an encoding -- we can implement UTF-8 encoding using wchar_t as
> supporting type, although this is not a great idea...).
Yes, but I would like to point out that wchar_t is a 4-byte
type on Solaris 8 and a 2-byte type on WinNT4/2K/XP. For
a character value to be stored into these bytes, you *have*
to use some encoding.
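
A small Java sketch of why the width of the storage type matters. Java's
char is a 16-bit UTF-16 code unit, so it behaves like the 2-byte wchar_t
case; a code point outside the Basic Multilingual Plane then needs two
units, whereas a 4-byte type could hold it in one. The class and method
names are hypothetical.

```java
public class CodeUnits {
    // Number of 16-bit UTF-16 code units, i.e. slots in a 2-byte wide type.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. slots in a 4-byte wide type
    // that stores one full code point per element.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String accented = "\u00E3";                           // U+00E3, inside the BMP
        String clef = new String(Character.toChars(0x1D11E)); // U+1D11E, outside the BMP

        System.out.println(utf16Units(accented) + " " + codePoints(accented)); // 1 1
        System.out.println(utf16Units(clef) + " " + codePoints(clef));         // 2 1
    }
}
```
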
> 2 - all windows API W-functions that I can recall use wchar_t at some
> point (this must include WriteConsoleW())...
True.
> It is also a fact that you should be able to convert directly from
> char-type UTF-8 to (and from) wchar_t-type UTF-16 using W-functions (or
> your own functions). You should not need to know CP_ACP or CP_OEMCP to
> do this in the win32 NT-branch!!! Search the win32 documentation at MS
> and you should find the W-functions you need (but IMHO there is nothing
> wrong with your previous strategy -- it can even be useful in some cases).
Maybe, but when you are working in a corporate environment,
"a solution" is usually chosen over "the best solution",
which takes way too much effort. :-)
(Dilbertesque, eh?)
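
For reference, the direct UTF-8 to UTF-16 conversion described in the
quoted text needs no code page at all. The sketch below shows the idea
in Java, where a String already holds UTF-16 code units. On Win32 the
analogous calls would be MultiByteToWideChar and WideCharToMultiByte
with CP_UTF8, but that part is not shown here. The class and method
names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decode UTF-8 bytes into a String (UTF-16 code units internally).
    static String decodeUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encode a String back into UTF-8 bytes.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Jo\u00E3o" in UTF-8: U+00E3 is encoded as the two bytes C3 A3.
        byte[] utf8 = { 'J', 'o', (byte) 0xC3, (byte) 0xA3, 'o' };

        String name = decodeUtf8(utf8);
        System.out.println(name.length());                         // 4
        System.out.println(name.equals("Jo\u00E3o"));              // true
        System.out.println(Arrays.equals(utf8, encodeUtf8(name))); // true
    }
}
```

Neither direction consults the system or user locale, which is the point
of sticking to UTF-8 for the byte-oriented side.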
> But Mohan has the "Binary Power" in this matter as far as I know, so
> it's Mohan's rules... ;-)
Well, you're free to download the source, configure and build
it as well - all the instructions are out there. :-)
> Unless someone with "Source-code Power" (I don't know who s/he might be)
If you submit a patch that fixes a problem and the patch
is approved after review, someone or other is usually
nice enough to commit it. (*All* my patches have been
checked in like this.)
Ranjit.
--
Ranjit Mathew Email: rmathew AT hotmail DOT com
Bangalore, INDIA. Web: http://ranjitmathew.tripod.com/