config/locale/generic/ctype_members.cc initializes _M_bit like so:

    for (size_t __i = 0; __i <= 15; ++__i)
      {
        _M_bit[__i] = static_cast<mask>(1 << __i);
        _M_wmask[__i] = _M_convert_to_wmask(_M_bit[__i]);
      }

This assumes that ctype_base::mask is at least 16 bits. Each element of _M_bit has a single bit set, and is compared to the defined mask constants in _M_convert_to_wmask:

    ctype<wchar_t>::__wmask_type
    ctype<wchar_t>::_M_convert_to_wmask(const mask __m) const throw()
    {
      __wmask_type __ret;
      switch (__m)
        {
        // ...
        case xdigit:
          __ret = wctype("xdigit");
          break;
        case alnum:
          __ret = wctype("alnum");
          break;
        case graph:
          __ret = wctype("graph");
          break;
        default:
          __ret = __wmask_type();
        }
      return __ret;
    };

If any of the mask constants has more than one bit set it will never be matched. For example, on NetBSD xdigit can never be matched for ctype<wchar_t> because config/os/bsd/netbsd/ctype_base.h defines the following:

    typedef unsigned char mask;
    static const mask xdigit = _N | _X;

As a result, this valid program aborts:

    #include <locale>
    #include <assert.h>

    class gnu_ctype: public std::ctype<wchar_t> { };

    int main()
    {
      gnu_ctype gctype;
      assert(gctype.is(std::ctype_base::xdigit, L'a'));
    }
Maybe I'm misunderstanding the tone of your PR, and certainly Benjamin knows better than I do because he invented this stuff, but I don't think the blame should be on the generic configuration per se. In other words, it isn't supposed to be fully generic and cover all possible situations. I think the general idea was to provide something covering a good range of cases and then make it easy to add new ones covering the special needs of the various targets. I think that, for now at least, should be done for netbsd too.
Sorry for the tone. It might be that this is only really broken on netbsd, and maybe using the ieee model there would fix the test failure. I need to investigate further.
I see the actual problem now, it's not as bad as I initially thought (it was late at night when I started debugging this!)

The code in generic/ctype_members.cc assumes that each ctype_base::mask constant will either be a single bit, or be the bitwise-or of other ctype_base::mask constants, which works because e.g. if alnum is upper|lower|digit, then do_is(alnum, L'c') will match on lower.

But netbsd defines ctype_base::xdigit as _N|_X where _N is ctype_base::digit but _X corresponds to [A-Fa-f] and is not used in any other ctype_base::mask constant, so the wide characters [A-Fa-f] cannot be matched by xdigit.

newlib similarly uses _X|_N for xdigit, but has --enable-clocale=newlib so doesn't use the generic code.

bionic uses _X|_N but I don't think it supports wchar_t anyway.

I believe vxworks and qnx would fail to match is(ctype::space, L' ') and is(ctype::print, L' ') for similar reasons, if they support wchar_t.
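To make the failure mode concrete, here is a rough standalone model of the bit-by-bit lookup described above. It is only a sketch and not the libstdc++ code: the namespace, the wclass enum, in_class, model_is and the bit values are all made up for illustration, and the character tests assume an ASCII-compatible wide encoding.

```cpp
#include <cassert>

namespace sketch {

typedef unsigned char mask;

// Illustrative bit values, not copied from any target's ctype_base.h.
const mask upper  = 0x01;
const mask lower  = 0x02;
const mask digit  = 0x04;                   // plays the role of _N
const mask hexbit = 0x40;                   // plays the role of _X, not a ctype mask constant
const mask alnum  = upper | lower | digit;
const mask xdigit = digit | hexbit;

// Hypothetical stand-in for the wctype_t values returned by wctype().
enum wclass { w_none, w_upper, w_lower, w_digit, w_alnum, w_xdigit };

// Exact-match conversion, in the style of the generic _M_convert_to_wmask.
wclass convert_to_wmask(mask m) {
  switch (m) {
    case upper:  return w_upper;
    case lower:  return w_lower;
    case digit:  return w_digit;
    case alnum:  return w_alnum;
    case xdigit: return w_xdigit;
    default:     return w_none;             // hexbit on its own ends up here
  }
}

// Hypothetical stand-in for iswctype(), ASCII ranges only.
bool in_class(wchar_t c, wclass w) {
  const bool up = c >= L'A' && c <= L'Z';
  const bool lo = c >= L'a' && c <= L'z';
  const bool dg = c >= L'0' && c <= L'9';
  const bool hx = (c >= L'A' && c <= L'F') || (c >= L'a' && c <= L'f');
  switch (w) {
    case w_upper:  return up;
    case w_lower:  return lo;
    case w_digit:  return dg;
    case w_alnum:  return up || lo || dg;
    case w_xdigit: return dg || hx;
    default:       return false;
  }
}

// Probe the mask one bit at a time; each single bit is converted on its own.
bool model_is(mask m, wchar_t c) {
  for (int i = 0; i < 8; ++i) {
    const mask bit = static_cast<mask>(1 << i);
    if ((m & bit) && in_class(c, convert_to_wmask(bit)))
      return true;
  }
  return false;
}

void self_check() {
  assert(model_is(alnum, L'c'));    // matched through the lower bit
  assert(model_is(xdigit, L'7'));   // matched through the digit bit
  assert(!model_is(xdigit, L'a'));  // hexbit converts to w_none, so no match
}

}  // namespace sketch
```

In this model alnum works because every bit in it is itself a mask constant, while xdigit only matches the decimal digits: the bit standing in for _X never converts to a usable class, which is the same shape as the netbsd failure.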
In config/locale/newlib/ctype_members.cc:

        default:
          // Different from the generic version, xdigit and print in
          // newlib are defined as bitwise-OR result of bitmasks:
          //   xdigit = _X | _N;
          //   print = _P | _U | _L | _N | _B;
          // in which _X and _B don't correspond to any ctype mask.
          // In order to get the wmask correctly converted when __m is
          // equal to _X or _B, the two cases are specifically handled
          // here.
          if (__m & xdigit)
            __ret = wctype("xdigit");
          else if (__m & print)
            __ret = wctype("print");
          else
            __ret = __wmask_type();
        }

This doesn't only apply to the masks defined in config/os/newlib/ctype_base.h, but also netbsd and openbsd and bionic.

Something similar is also needed for qnx, as config/os/qnx/qnx6.1/ctype_base.h uses several bitmasks that do not correspond to any ctype, e.g. _XA, _SP, _XS, _XB.
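For illustration, a small standalone model of how a newlib-style default: fallback changes the outcome. This is a sketch under assumptions and not the actual patch: the namespace, the wclass enum, in_class, convert_with_fallback, model_is and the bit values are invented for the example, and the character tests assume an ASCII-compatible wide encoding.

```cpp
#include <cassert>

namespace sketch {

typedef unsigned char mask;

// Illustrative bit values, not copied from any target's ctype_base.h.
const mask upper    = 0x01;
const mask lower    = 0x02;
const mask digit    = 0x04;   // plays the role of _N
const mask punct    = 0x10;   // plays the role of _P
const mask hexbit   = 0x40;   // plays the role of _X
const mask blankbit = 0x80;   // plays the role of _B
const mask xdigit   = hexbit | digit;
const mask print    = punct | upper | lower | digit | blankbit;

// Hypothetical stand-in for the wctype_t values returned by wctype().
enum wclass { w_none, w_upper, w_lower, w_digit, w_punct, w_xdigit, w_print };

// Exact matches first, then the fallback for bits that are not mask constants.
wclass convert_with_fallback(mask m) {
  switch (m) {
    case upper:  return w_upper;
    case lower:  return w_lower;
    case digit:  return w_digit;
    case punct:  return w_punct;
    case xdigit: return w_xdigit;
    case print:  return w_print;
    default:
      if (m & xdigit) return w_xdigit;   // reached for hexbit alone
      if (m & print)  return w_print;    // reached for blankbit alone
      return w_none;
  }
}

// Hypothetical stand-in for iswctype(), ASCII ranges only.
bool in_class(wchar_t c, wclass w) {
  const bool up = c >= L'A' && c <= L'Z';
  const bool lo = c >= L'a' && c <= L'z';
  const bool dg = c >= L'0' && c <= L'9';
  const bool hx = (c >= L'A' && c <= L'F') || (c >= L'a' && c <= L'f');
  const bool pr = c >= L' ' && c <= L'~';
  switch (w) {
    case w_upper:  return up;
    case w_lower:  return lo;
    case w_digit:  return dg;
    case w_punct:  return pr && c != L' ' && !up && !lo && !dg;
    case w_xdigit: return dg || hx;
    case w_print:  return pr;
    default:       return false;
  }
}

// Probe the mask one bit at a time, converting each single bit on its own.
bool model_is(mask m, wchar_t c) {
  for (int i = 0; i < 8; ++i) {
    const mask bit = static_cast<mask>(1 << i);
    if ((m & bit) && in_class(c, convert_with_fallback(bit)))
      return true;
  }
  return false;
}

void self_check() {
  assert(model_is(xdigit, L'a'));    // hexbit now converts to the xdigit class
  assert(!model_is(xdigit, L'g'));   // non-hex letters are still rejected
  assert(model_is(print, L' '));     // blankbit now converts to the print class
  assert(!model_is(print, L'\n'));   // control characters are still rejected
}

}  // namespace sketch
```

The fallback only fires for single bits that have no exact case label, so the ordinary single-bit masks keep their exact matches. A target like qnx with more such bits would presumably need one fallback branch per composite mask that contains them.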