This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Filenames with accented characters


João Garcia wrote:
> please do not take what I am about to write as a personal attack. Be 
> sure that I am only trying to help (although I may be failing to do so...).

Rest assured that I am not taking it as a personal attack
and I do not see anything in your mail that looks like it.
(So I'm confused by what you're saying here...)


> my posts are misinterpreted (my fault, I apologize for that)... And I 
> wonder if this is the reason for something so simple as character 
> conversions on filenames to take many months to get solved in gcj/mingw

It could simply be sloth, oversight or plain "not my problem".
One must persist and nag until one gets a definite reply,
one way or the other.


>>This is, as usual, an ugly beast. The primary
>>issue here is that Windows NT based OSs (NT4/2K/XP)
>>have the notion of both a System Locale and a
>>User Locale, which are almost, *but not quite*,
>>of the same status.
>>
> For what that matters in character conversions, you must be referring to:
> 1 - CP_ACP (for "old" win32 applications).
> 2 - CP_OEMCP (for "ancient" DOS applications).

No, I was referring to what you can modify using
Control Panel -> Regional Settings -> General. The setting
there is the User Locale; "Set default..." on the same tab
sets the System Locale. (This is as seen on Win2K.)


> If your compiler honors the runtime values of CP_ACP and CP_OEMCP, you 
> can use them (it does not seem to be the case for gcc). Otherwise please 
> use GetACP() or GetOEMCP().

I did not want to work with code pages and wanted to stick
to UTF-8. (I didn't understand how the compiler can honour
the *runtime* values of CP_ACP, etc. - it must be for the runtime
to bother with, in which case it is MSVCRT for MinGW.)
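
To illustrate why mixing UTF-8 with a code page goes wrong for
accented filenames, here is a minimal Java sketch. It is only an
illustration, not code from gcj or MinGW: the class and method names
are made up, and ISO-8859-1 merely stands in for a Western "ANSI"
code page (it maps the two bytes involved here the same way
windows-1252 does).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of libgcj.
class FilenameBytesDemo {
    // Encode the name as UTF-8, then (wrongly) decode the bytes
    // as if they were in a single-byte code page.
    static String misread(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes with the charset they were written in.
    static String roundTrip(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Jo\u00e3o.txt"; // U+00E3 is a-tilde
        // The a-tilde takes two bytes in UTF-8, so the misread
        // name has one character more than the original.
        System.out.println(name.length());
        System.out.println(misread(name).length());
        System.out.println(roundTrip(name).equals(name));
    }
}
```

The misread name contains U+00C3 followed by U+00A3 where the single
U+00E3 should be, which is the usual garbling seen when UTF-8 bytes
are handed to an interface that expects a code page.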


> The native character support in the Win32 NT-branch is based on the 
> wchar_t C/C++ type using UTF-16 (LE) encoding. This includes the API 
> W-functions...

Agreed.


> (And AFAIK gcj Java-Strings are also based in wchar_t C/C++ type using 
> UTF-16 (LE) encoding).

Not too sure - it could be UTF-16 (BE) on Solaris
for example.
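
The LE/BE distinction only concerns how each 16-bit unit is laid out
in memory or on the wire. A small sketch in plain Java (the class and
helper names are hypothetical) that shows the two byte orders for the
same character and reports the byte order of the machine it runs on:

```java
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of libgcj.
class Utf16ByteOrderDemo {
    // Format bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u00e3"; // a-tilde, a single UTF-16 code unit
        // The explicit LE/BE charsets write no byte order mark.
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE)));
        // Byte order of the machine running this code.
        System.out.println("native:   " + ByteOrder.nativeOrder());
    }
}
```

The code unit value is 0x00E3 in both cases; only the order of its
two bytes differs.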


>>Specifically, console applications can only
>>display those glyphs that are supported by
>>the character set of the System Locale, irrespective
>>of what the User Locale is set to. GUI applications
>>fortunately do not share this problem.
>>
>
> The rxvt terminal emulation used with Msys seems to have a "bug" and 
> seems to use CP_ACP (it should use CP_OEMCP to be equivalent to the 
> windows console).

rxvt is really a GUI application, though it looks like a command
prompt, so it would not suffer from the problem that our
console application was facing.


>>In spite of the above, applications must still honour
>>the User Locale and list the above as a known
>>limitation of the OS itself.
>>
>
> Does this situation result from an implementation choice or from a 
> *real* implementation limitation?

The latter. See:

http://www.microsoft.com/globaldev/reference/localetable.mspx#systemlocale
http://www.microsoft.com/globaldev/reference/localetable.mspx#UserLocale

for the difference and the reason for the problem.


> I have to call your attention to some facts here... again, please do 
> not take this personally. ;-)
> 1 - wchar_t is NOT an encoding. It is a C/C++ type (I am sure you are 
> aware of this, but please do not compare it to UTF-16 because the latter 
> is an encoding -- we can implement UTF-8 encoding using wchar_t as 
> supporting type, although this is not a great idea...).

Yes, but I would like to point out that wchar_t is a 4 byte
type on Solaris 8 and is a 2 byte type on WinNT4/2K/XP. For
a character value to be stored into these bytes, you *have*
to use some encoding.
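
The width of the unit decides how many units a character needs. As a
rough Java sketch (hypothetical class and method names; Java's char
plays the role of a 2-byte unit and an int code point that of a
4-byte unit), a character outside the Basic Multilingual Plane takes
two 16-bit units but only one 32-bit unit:

```java
// Hypothetical demo class; not part of libgcj.
class CodeUnitDemo {
    // Number of 16-bit units (Java chars) in the string.
    static int units16(String s) {
        return s.length();
    }

    // Number of 32-bit units (whole code points) in the string.
    static int units32(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the BMP.
        String clef = new String(Character.toChars(0x1D11E));
        for (char c : clef.toCharArray()) {
            System.out.printf("%04X ", (int) c); // the surrogate pair
        }
        System.out.println();
        System.out.println(units16(clef) + " unit(s) of 16 bits");
        System.out.println(units32(clef) + " unit(s) of 32 bits");
    }
}
```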


> 2 - all windows API W-functions that I can recall use wchar_t at some 
> point (this must include WriteConsoleW())...

True.


> It is also a fact that you should be able to convert directly from 
> char-type UTF-8 to (and from) wchar_t-type UTF-16 using W-functions (or 
> your own functions). You should not need to know CP_ACP or CP_OEMCP to 
> do this in the win32 NT-branch!!! Search the win32 documentation at MS 
> and you should find the W-functions you need (but IMHO there is nothing 
> wrong with your previous strategy -- it can even be useful in some cases).

Maybe, but when you are working in a corporate environment,
"a solution" is usually chosen over "the best solution",
which takes way too much effort. :-)
(Dilbertesque, eh?)
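
For reference, the direct UTF-8 <-> UTF-16 conversion discussed above
involves no code page at all. On Win32 it would go through
MultiByteToWideChar() and WideCharToMultiByte() with CP_UTF8; the
sketch below shows the same idea in portable Java instead. The class
and method names are hypothetical, and rejecting malformed input
rather than replacing it is my own choice here, not something the
thread prescribes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of libgcj.
class Utf8Utf16Demo {
    // Decode UTF-8 bytes into UTF-16 code units, rejecting bad input.
    static char[] utf8ToUtf16(byte[] utf8) {
        try {
            CharBuffer out = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8));
            char[] units = new char[out.remaining()];
            out.get(units);
            return units;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not valid UTF-8", e);
        }
    }

    // Encode UTF-16 code units back into UTF-8 bytes.
    static byte[] utf16ToUtf8(char[] units) {
        return new String(units).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x4A, 0x6F, (byte) 0xC3, (byte) 0xA3, 0x6F}; // "Joao" with a-tilde
        char[] units = utf8ToUtf16(utf8);
        System.out.println(units.length + " UTF-16 units");
        System.out.println(utf16ToUtf8(units).length + " UTF-8 bytes");
    }
}
```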


> But Mohan has the "Binary Power" in this matter as far as I know, so 
> it's Mohan's rules... ;-)

Well, you're free to download the source, configure and build
it as well - all the instructions are out there. :-)


> Unless someone with "Source-code Power" (I don't know who s/he might be) 

If you submit a patch that fixes a problem and the patch
is approved after review, someone or other is usually
nice enough to commit it. (*All* my patches have been
checked in like this.)

Ranjit.

-- 
Ranjit Mathew          Email: rmathew AT hotmail DOT com

Bangalore, INDIA.      Web: http://ranjitmathew.tripod.com/

