ctype<wchar_t>::do_is(mask m, wchar_t c) assumes that m is equal to one of the values in the ctype_base::mask enumeration. This causes do_is to fail when more than one bit is set in m.
Created attachment 4531: Test case
Created attachment 4544: Test case. Fixed bug in original test case.
Related to bug 11844. I can confirm this on the mainline (20030823).
Working on it, since it is related to 11844.
This should be fixed with the current ctype<wchar_t>::do_is code fix, as it is one of the things tested for.... However, I'm still getting errors in this testcase. I believe it to be incorrect... -benjamin
Benjamin Kosnik wrote:
> This should be fixed with the current ctype<wchar_t>::do_is code fix, as it is
> one of the things tested for.... However, I'm still getting errors in this
> testcase. I believe it to be incorrect...

By my reading of the standard, this version of do_is should return true if any of the bits apply to the character, but the current implementation seems to return true only if all the bits apply.
Pétur, could you please detail a little more your reading? In 22.2.1.1.2 I don't see any "any" ;), only an "&" in p2...
Paolo Carlini wrote:
> Pétur, could you please detail a little more your reading? In 22.2.1.1.2
> I don't see any "any" ;), only an "&" in p2...

This is the relevant text:

> Returns: The first form returns the result of the expression
> (M & m) != 0; i.e., true if the character has the characteristics
> specified. The second form returns high.

(M & m) != 0 is true if any bit is set in both M and m. This is also consistent with the specialization for char. The current implementation returns true if (M & m) == m && m != 0. The sentence "true if the character has the characteristics specified", read on its own, suggests the behavior of the current implementation. I suspect that it should read "true if the character has any of the characteristics specified".
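To make the two readings concrete, here is a minimal sketch that compares them on plain bit masks. The bit values and the names Mask, kUpper, kAlpha, kDigit, kAlnum, is_any and is_all are hypothetical and chosen only for illustration; real ctype_base::mask values are implementation-defined.

```cpp
#include <cstdint>

// Hypothetical bit layout for illustration only; real ctype_base::mask
// values are implementation-defined.
using Mask = std::uint16_t;
constexpr Mask kUpper = 1 << 0;
constexpr Mask kAlpha = 1 << 1;
constexpr Mask kDigit = 1 << 2;
constexpr Mask kAlnum = kAlpha | kDigit;  // mirrors alnum == alpha | digit

// "Any" reading: the expression quoted from 22.2.1.1.2.
constexpr bool is_any(Mask M, Mask m) { return (M & m) != 0; }

// "All" reading: the behavior attributed to the current implementation.
constexpr bool is_all(Mask M, Mask m) { return (M & m) == m && m != 0; }

// Classification of an uppercase letter such as 'A': upper and alpha.
constexpr Mask kMaskOfA = kUpper | kAlpha;

// With a multi-bit query such as alnum, the two readings disagree.
static_assert(is_any(kMaskOfA, kAlnum), "any-of: 'A' is alnum");
static_assert(!is_all(kMaskOfA, kAlnum), "all-of: 'A' is not alpha AND digit");

// With a single-bit query they agree.
static_assert(is_any(kMaskOfA, kAlpha) && is_all(kMaskOfA, kAlpha),
              "single bit: both readings agree");
```

The two readings only diverge when m has more than one bit set, which matches the original report.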
Ok, thanks, now I see, please tell me if I'm wrong... This is the rationale: a character *cannot* belong simultaneously to two different values of the ctype_base enum: for instance cannot be, at the same time, uppercase AND lowercase. That's why do_is must be intended to mean "any".
Paolo Carlini wrote:
> This is the rationale: a character *cannot* belong simultaneously to
> two different values of the ctype_base enum:

It can, for example alpha and upper.

> That's why do_is must be intended to mean "any".

The only use for this feature that I see is to be able to use alnum and graph. ct.is(alnum, c) should return true if either ct.is(digit, c) or ct.is(alpha, c) is true. I can't think of any character for which both are true.
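The alnum expectation described above can be checked directly against the library. This is a small sketch, assuming the classic "C" locale; the helper names is_alnum_wide and is_alpha_or_digit_wide are made up for this example and are not part of the attached test case.

```cpp
#include <locale>

// Hypothetical helper: ct.is(alnum, c) in the classic "C" locale.
inline bool is_alnum_wide(wchar_t c) {
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(std::locale::classic());
    return ct.is(std::ctype_base::alnum, c);
}

// Hypothetical helper: the same question asked one bit at a time.
inline bool is_alpha_or_digit_wide(wchar_t c) {
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(std::locale::classic());
    return ct.is(std::ctype_base::alpha, c) ||
           ct.is(std::ctype_base::digit, c);
}
```

Under the "any" reading the two helpers should agree for every character. Under the "all" reading, is_alnum_wide would be false for a letter such as L'a', because it is alpha but not digit.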
Pétur wrote:
>> This is the rationale: a character *cannot* belong simultaneously to
>> two different values of the ctype_base enum:
>
> It can, for example alpha and upper.

Yes, sorry.

>> That's why do_is must be intended to mean "any".
>
> The only use for this feature that I see is to be able to use
> alnum and graph. ct.is(alnum, c) should return true if either
> ct.is(digit, c) or ct.is(alpha, c) is true. I can't think of
> any character for which both are true.

Ok. This is the point at the root of my (too general) intuition. Now... If I understand correctly, you see an inconsistency in the standard between 22.2.1.1.2 (which we already correctly implement as per your previous message) and the definitions of alnum == alpha | digit and graph == alnum | punct, right?
Ok, finally I see (I hope ;) !

1- Currently we do *not* implement (M & m) != 0 correctly, because in the loop we incorrectly use "&=" instead of "|=" (with __ret initialized to false).
2- However, we *seem* to implement the standard, because the standard says, with imprecise wording, "true if the character has the characteristics specified" instead of the more correct "true if the character has any of the characteristics specified", which is consistent with the logical definition.

I'm testing a patch along the lines of 1- above...
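The difference between "&=" and "|=" in such a per-bit loop can be sketched as follows. This is not the libstdc++ source: the bit layout, the names Mask, test_one_bit, do_is_all and do_is_any, and the use of the std::isw* functions as the per-bit classifier are all assumptions made for illustration.

```cpp
#include <cwctype>

// Hypothetical bit layout; real ctype_base::mask values differ.
typedef unsigned Mask;
const Mask kUpper = 1u << 0;
const Mask kLower = 1u << 1;
const Mask kAlpha = 1u << 2;
const Mask kDigit = 1u << 3;
const unsigned kBitCount = 4;

// Hypothetical stand-in for the per-bit classification call.
inline bool test_one_bit(Mask bit, wchar_t c) {
    const std::wint_t wc = static_cast<std::wint_t>(c);
    switch (bit) {
        case kUpper: return std::iswupper(wc) != 0;
        case kLower: return std::iswlower(wc) != 0;
        case kAlpha: return std::iswalpha(wc) != 0;
        case kDigit: return std::iswdigit(wc) != 0;
        default:     return false;
    }
}

// "&=" variant: true only if every requested bit applies,
// i.e. (M & m) == m && m != 0.
inline bool do_is_all(Mask m, wchar_t c) {
    bool ret = true;
    bool matched = false;  // guards against m == 0
    for (unsigned i = 0; i < kBitCount; ++i) {
        const Mask bit = 1u << i;
        if (m & bit) {
            matched = true;
            ret &= test_one_bit(bit, c);
        }
    }
    return ret && matched;
}

// "|=" variant with ret initialized to false: true if any
// requested bit applies, i.e. (M & m) != 0.
inline bool do_is_any(Mask m, wchar_t c) {
    bool ret = false;
    for (unsigned i = 0; i < kBitCount; ++i) {
        const Mask bit = 1u << i;
        if (m & bit)
            ret |= test_one_bit(bit, c);
    }
    return ret;
}
```

For a query such as kAlpha | kDigit on L'a', do_is_any returns true while do_is_all returns false, which is the discrepancy discussed in this report.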
Subject: Bug 11740

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: paolo@gcc.gnu.org 2003-10-06 22:32:59

Modified files:
    libstdc++-v3 : ChangeLog
    libstdc++-v3/config/locale/gnu: ctype_members.cc
    libstdc++-v3/config/locale/generic: ctype_members.cc
Added files:
    libstdc++-v3/testsuite/22_locale/ctype/is/wchar_t: 11740.cc

Log message:
2003-10-06 Paolo Carlini <pcarlini@unitus.it>

    PR libstdc++/11740
    * config/locale/gnu/ctype_members.cc (ctype<wchar_t>::do_is):
    Fix to actually return (M & m) != 0 as per 22.2.1.1.2.
    * config/locale/generic/ctype_members.cc: Same.
    * testsuite/22_locale/ctype/is/wchar_t/11740.cc: New.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&r1=1.1997&r2=1.1998
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/gnu/ctype_members.cc.diff?cvsroot=gcc&r1=1.11&r2=1.12
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/generic/ctype_members.cc.diff?cvsroot=gcc&r1=1.5&r2=1.6
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/22_locale/ctype/is/wchar_t/11740.cc.diff?cvsroot=gcc&r1=NONE&r2=1.1
Fixed in 3.4.
Subject: Bug 11740

CVSROOT: /cvs/gcc
Module name: gcc
Branch: gcc-3_3-branch
Changes by: paolo@gcc.gnu.org 2003-10-07 08:40:59

Modified files:
    libstdc++-v3 : ChangeLog
    libstdc++-v3/config/locale/generic: ctype_members.cc
    libstdc++-v3/config/locale/gnu: ctype_members.cc

Log message:
2003-10-07 Paolo Carlini <pcarlini@unitus.it>

    PR libstdc++/11740
    * config/locale/gnu/ctype_members.cc (ctype<wchar_t>::do_is):
    Fix to actually return (M & m) != 0 as per 22.2.1.1.2.
    * config/locale/generic/ctype_members.cc: Same.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.1464.2.150&r2=1.1464.2.151
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/generic/ctype_members.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.2.20.1&r2=1.2.20.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/gnu/ctype_members.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.6.4.2&r2=1.6.4.3
Fixed for 3.3.2 too!
Subject: Bug 11740

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: green@gcc.gnu.org 2005-02-07 20:34:18

Modified files:
    libjava : ChangeLog
    libjava/gnu/java/nio/charset: ISO_8859_1.java Provider.java
                                  US_ASCII.java UTF_16.java
                                  UTF_16BE.java UTF_16LE.java
                                  UTF_8.java

Log message:
2005-02-07 Robert Schuster <thebohemian@gmx.net>

    * gnu/java/nio/charset/ISO_8859_1.java,
    gnu/java/nio/charset/US_ASCII.java,
    gnu/java/nio/charset/UTF_16.java,
    gnu/java/nio/charset/UTF_16_LE.java,
    gnu/java/nio/charset/UTF_16_BE.java,
    gnu/java/nio/charset/UTF_8.java: Fixed canonical names and
    aliases according to
    "http://www.iana.org/assignments/character-sets",
    "http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html"
    and
    "http://oss.software.ibm.com/cgi-bin/icu/convexp?s=ALL".
    * gnu/java/nio/charset/Provider.java: Made charset lookup
    case-insensitive which fixes bug #11740.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/ChangeLog.diff?cvsroot=gcc&r1=1.3307&r2=1.3308
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/ISO_8859_1.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/Provider.java.diff?cvsroot=gcc&r1=1.1&r2=1.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/US_ASCII.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_16.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_16BE.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_16LE.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_8.java.diff?cvsroot=gcc&r1=1.2&r2=1.3