Bug 58628 - Incorrect std::isalpha results with UTF-8 locale on illumos
Summary: Incorrect std::isalpha results with UTF-8 locale on illumos
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.4.7
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2013-10-05 08:39 UTC by Alexander Pyhalov
Modified: 2022-09-16 18:16 UTC
CC List: 3 users

See Also:
Known to work:
Known to fail:
Last reconfirmed: 2022-02-24 00:00:00

ctype test (272 bytes, text/x-c++src)
2013-10-05 08:40 UTC, Alexander Pyhalov

Description Alexander Pyhalov 2013-10-05 08:39:09 UTC
The following example, when compiled with gcc 3.4, 4.4 or 4.7 and run under the en_US.UTF-8 locale on illumos, reports that the char with code 196 is alphabetic, even though that byte on its own is not a valid UTF-8 character.

$ env LANG=en_US.UTF-8 ./test_ctype
letter is �
(int)letter is 196

If this program is compiled with the Sun Studio compiler (CC), the results are:
$ env LANG=en_US.UTF-8 ./test_ctype_CC
letter is �
(int)letter is 196

If I compile this program on Linux or FreeBSD, the results are correct.

Related OpenIndiana bug report: 
Discussion on illumos-dev:
Comment 1 Alexander Pyhalov 2013-10-05 08:40:41 UTC
Created attachment 30958
ctype test
Comment 2 Eric Gallager 2022-02-24 18:28:56 UTC
3.4, 4.4, and 4.7 are pretty old at this point; does this still happen with newer versions of GCC?
Comment 3 Alexander Pyhalov 2022-09-16 18:11:29 UTC
I still see this behavior with gcc version 10.4.0.
Comment 4 Alexander Pyhalov 2022-09-16 18:16:08 UTC
If it helps, the last comment from illumos-gate bug report says

"From what I can tell ctype<wchar_t>::_M_initialize_ctype() in gcc-5.1.0/libstdc++-v3/config/locale/generic/ctype_members.cc:248 is basically just calling btowc(i) for all 0 <= i <= 255 and storing the result. If std::locale::classic() is called before the setlocale() call in the test program, things happen to work, but apparently the initialization uses whatever the current locale is."