This is the mail archive of the mailing list for the GCC project.

Re: contrib/gcc_update in non-english locales

On 04/22/2010 09:21 AM, Dave Korn wrote:
On 22/04/2010 06:17, Basile Starynkevitch wrote:
On 04/21/2010 06:25 PM, Basile Starynkevitch wrote:

The contrib/gcc_update script does not seem to work in non-english
locales (e.g. in a French UTF8 locale on Linux).

You never told us what the problem actually is

But I know. :-) See the links that I posted when I read the reverse patch from Basile.

There are two issues:

1) collation. [A-Z] and [a-z] each match both uppercase and lowercase letters (except "a" and "Z" respectively) because the collation order is aAbBcCdD...zZ. In some locales a range might even match a sequence of more than one letter, e.g. ch or ll in Spanish. This is the most common issue.
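As an illustration of the collation point, here is a minimal sketch of two ways a script can keep a letter range from picking up the wrong case. The helper names is_ascii_upper and is_upper_class are made up for this example and do not come from contrib/gcc_update.

```shell
# Hypothetical helper: succeeds only if $1 is made up solely of the
# ASCII letters A through Z.
is_ascii_upper() {
  # LC_ALL=C is set for this one grep only, so [A-Z] is evaluated in
  # byte order (0x41..0x5a) whatever the user's locale is.
  printf '%s\n' "$1" | LC_ALL=C grep -q '^[A-Z][A-Z]*$'
}

# Hypothetical helper: same idea, but using a POSIX character class
# instead of a range, so collation order does not enter into it.
is_upper_class() {
  printf '%s\n' "$1" | grep -q '^[[:upper:]][[:upper:]]*$'
}

is_ascii_upper "GCC" && echo "GCC is all uppercase"
is_ascii_upper "Gcc" || echo "Gcc is not all uppercase"
```

The first form changes the locale for a single command; the second leaves the locale alone and relies on the class being defined per locale.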

2) encoding. In a UTF-8 locale, "." matches only valid UTF-8 characters and, again, can match more than one byte. Furthermore, the meaning of [à] is different in ASCII and UTF-8: in UTF-8 it matches more than one byte, and only in the specific order 0xc3-0xa0, while in ASCII it matches just one "half" of the UTF-8-encoded character at a time, and so matches an unpaired 0xa0 too (invalid UTF-8, but valid Latin-1, for example). This is rarely a problem.
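To make the byte-versus-character distinction concrete, here is a small sketch that looks at the UTF-8 encoding of "à" from the C locale's point of view. The variable names are chosen for this example only, and the sed result assumes a sed that treats every byte as a character in the C locale (as GNU sed does).

```shell
# U+00E0 (à) encoded in UTF-8 is the two bytes 0xc3 0xa0,
# written here as octal escapes so the script itself stays ASCII.
a_grave=$(printf '\303\240')

# The byte count does not depend on the locale.
byte_count=$(printf '%s' "$a_grave" | wc -c | tr -d ' ')

# In the C locale "." matches one byte at a time, so sed performs
# one substitution per byte of the encoded character.
c_dots=$(printf '%s\n' "$a_grave" | LC_ALL=C sed 's/./x/g')

echo "bytes: $byte_count"
echo "C-locale dots: $c_dots"
```

In a UTF-8 locale the same sed command would be expected to make a single substitution, but that depends on the locale being installed, so it is not shown here.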


... that if it's the encoding, that patch won't solve the problem on systems where the default encoding for the C locale is UTF-8 anyway; it would need to be something like this:


No, C.ASCII is only supported by Cygwin as far as I know. It works under glibc only by virtue of being an unknown locale (so that you end up using the default, which is C), but it's not portable.

C.UTF-8 in particular does not work under glibc (and making it the default in Cygwin was a veeeeery bad idea).
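For reference, a portable way to pin a whole script to the plain C locale might look like the sketch below. This is not the patch under discussion; the function effective_collate is a hypothetical helper that only mirrors the POSIX lookup order (LC_ALL, then LC_COLLATE, then LANG) to show which setting wins.

```shell
# Pin the whole script to the plain C locale. Assignment and export
# are kept separate for the benefit of very old Bourne shells.
LC_ALL=C
export LC_ALL

# Hypothetical helper: report which locale name governs collation,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# Empty values count as unset.
effective_collate() {
  echo "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

echo "collation locale: $(effective_collate)"
```

Using plain "C" rather than C.ASCII or C.UTF-8 avoids depending on a locale name that only some systems recognise.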

