Bug 11740 - ctype<wchar_t>::do_is(mask, wchar_t) doesn't handle multiple bits in mask.
Summary: ctype<wchar_t>::do_is(mask, wchar_t) doesn't handle multiple bits in mask.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 3.4.0
Importance: P2 normal
Target Milestone: 3.3.2
Assignee: Paolo Carlini
Reported: 2003-07-31 09:04 UTC by Pétur Runólfsson
Modified: 2003-10-07 08:48 UTC
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Last reconfirmed: 2003-08-23 17:15:07


Attachments
Test case (587 bytes, text/plain)
2003-07-31 09:04 UTC, Pétur Runólfsson
Test case (601 bytes, text/plain)
2003-08-01 09:00 UTC, Pétur Runólfsson

Description Pétur Runólfsson 2003-07-31 09:04:22 UTC
ctype<wchar_t>::do_is(mask m, wchar_t c) assumes that m is equal to one
of the values in the ctype_base::mask enumeration. This causes do_is to
fail when more than one bit is set in m.
Comment 1 Pétur Runólfsson 2003-07-31 09:04:57 UTC
Created attachment 4531
Test case
Comment 2 Pétur Runólfsson 2003-08-01 09:00:49 UTC
Created attachment 4544
Test case

Fixed bug in original test case
Comment 3 Andrew Pinski 2003-08-23 17:15:07 UTC
Related to bug 11844.
I can confirm this on the mainline (20030823).
Comment 4 Paolo Carlini 2003-10-01 21:28:37 UTC
Working on it since related to 11844.
Comment 5 Benjamin Kosnik 2003-10-06 16:44:08 UTC
This should be fixed with the current ctype<wchar_t>::do_is code fix, as it is
one of the things tested for.... However, I'm still getting errors in this
testcase. I believe it to be incorrect...

-benjamin
Comment 6 Pétur Runólfsson 2003-10-06 16:50:58 UTC
Benjamin Kosnik wrote:
> This should be fixed with the current ctype<wchar_t>::do_is code fix, as it is
> one of the things tested for.... However, I'm still getting errors in this
> testcase. I believe it to be incorrect...

By my reading of the standard, this version of do_is should return true if
any of the bits apply to the character, but the current implementation
seems to return true only if all the bits apply.
Comment 7 Paolo Carlini 2003-10-06 18:38:46 UTC
Pétur, could you please explain your reading in a little more detail? In
22.2.1.1.2 I don't see any "any" ;), only an "&" in p2...
Comment 8 Pétur Runólfsson 2003-10-06 18:53:10 UTC
Paolo Carlini wrote:
> Pétur, could you please explain your reading in a little more detail? In
> 22.2.1.1.2 I don't see any "any" ;), only an "&" in p2...

This is the relevant text:
> Returns: The first form returns the result of the expression
> (M & m) != 0; i.e., true if the character has the characteristics
> specified. The second form returns high. 

(M & m) != 0 is true if any bit is set in both M and m. This is
also consistent with the specialization for char.
The current implementation returns true if (M & m) == m && m != 0.

The sentence "true if the character has the characteristics specified"
implies the current implementation. I suspect that it should read
"true if the character has any of the characteristics specified".
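
To make the two readings concrete, here is a minimal sketch that compares them on plain bit masks. The mask type, the bit values and the names is_any and is_all are hypothetical stand-ins chosen for illustration; they are not the library's definitions.

```cpp
// Hypothetical stand-ins for ctype_base::mask bits; the values are
// illustrative only and do not match any real implementation.
typedef unsigned int mask;
const mask upper = 1u << 0;
const mask lower = 1u << 1;
const mask alpha = 1u << 2;
const mask digit = 1u << 3;

// "Any" reading: the expression from the Returns clause quoted above.
// M is the set of classes the character has, m is the mask being asked about.
bool is_any(mask M, mask m)
{
    return (M & m) != 0;
}

// "All" reading: the behaviour the comment attributes to the implementation
// at the time of the report.
bool is_all(mask M, mask m)
{
    return (M & m) == m && m != 0;
}
```

For a lowercase letter, with M = lower | alpha, asking about alpha | digit gives true under the "any" reading and false under the "all" reading. The two only agree when m has a single bit set or when every requested bit is present in M.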
Comment 9 Paolo Carlini 2003-10-06 18:56:37 UTC
Ok, thanks, now I see, please tell me if I'm wrong...
This is the rationale: a character *cannot* belong simultaneously to two different
values of the ctype_base enum: for instance, it cannot be uppercase AND lowercase
at the same time. That's why do_is must be intended to mean "any".
Comment 10 Pétur Runólfsson 2003-10-06 19:03:45 UTC
Paolo Carlini wrote:
> This is the rationale: a character *cannot* belong simultaneously to
> two different values of the ctype_base enum:

It can, for example alpha and upper.

> That's why do_is must be intended to mean "any".

The only use for this feature that I see is to be able to use
alnum and graph. ct.is(alnum, c) should return true if either
ct.is(digit, c) or ct.is(alpha, c) is true. I can't think of
any character for which both are true.
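
The expectation about alnum can be checked directly against the library. The following is a minimal sketch, assuming the classic "C" locale and restricting the check to the ASCII range; the helper name alnum_matches_alpha_or_digit is hypothetical and is not the attached test case.

```cpp
#include <locale>

// Hypothetical helper: returns true if, for every ASCII character,
// is(alnum, c) agrees with is(alpha, c) || is(digit, c) in the classic locale.
bool alnum_matches_alpha_or_digit()
{
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(std::locale::classic());

    for (wchar_t c = 0; c < 128; ++c) {
        // Query with the combined mask (more than one bit set).
        const bool combined = ct.is(std::ctype_base::alnum, c);
        // Query each constituent class separately.
        const bool either = ct.is(std::ctype_base::alpha, c)
                         || ct.is(std::ctype_base::digit, c);
        if (combined != either)
            return false;
    }
    return true;
}
```

An implementation that requires all bits of the mask to match would be expected to fail this check for letters and digits alike, since no character is both alpha and digit.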

Comment 11 Paolo Carlini 2003-10-06 19:13:16 UTC
Pétur wrote:
>> This is the rationale: a character *cannot* belong simultaneously to
>> two different values of the ctype_base enum:
>
>It can, for example alpha and upper.

Yes, sorry.

>> That's why do_is must be intended to mean "any".
>
>The only use for this feature that I see is to be able to use
>alnum and graph. ct.is(alnum, c) should return true if either
>ct.is(digit, c) or ct.is(alpha, c) is true. I can't think of
>any character for which both are true.

Ok. This is the point at the root of my (too general) intuition.

Now... If I understand correctly, you see an inconsistency in the standard
between 22.2.1.1.2 (which we already correctly implement as per your
previous message) and the definitions of alnum == alpha | digit and
graph == alnum | punct, right?
Comment 12 Paolo Carlini 2003-10-06 19:43:03 UTC
Ok, finally I see (I hope ;) !

1- Currently we do *not* implement (M & m) != 0 correctly, because the loop
   incorrectly uses "&=" where it should use "|=" (with __ret initialized
   to false).
2- However, we *seem* to implement the standard because the standard says,
   with imprecise wording, "true if the character has the characteristics
   specified" instead of the more precise wording, consistent with the logical
   definition, "true if the character has any of the characteristics specified"

I'm testing a patch along the lines of 1- above...
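
The difference described in point 1 can be sketched as two loops over the mask bits. This is an illustration only, not the libstdc++ source: the mask values, the ASCII-only has_class helper and the function names are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical mask bits, for illustration only.
typedef unsigned int mask;
const mask alpha = 1u << 0;
const mask digit = 1u << 1;
const mask space = 1u << 2;
const std::size_t mask_bits = 3;

// Hypothetical single-class test, ASCII only. A real implementation would
// delegate to the platform's wide-character classification instead.
bool has_class(mask bit, wchar_t c)
{
    switch (bit) {
    case alpha: return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z');
    case digit: return c >= L'0' && c <= L'9';
    case space: return c == L' ' || (c >= L'\t' && c <= L'\r');
    default:    return false;
    }
}

// Accumulating with "&=": true only if every requested class matches.
bool is_with_and(mask m, wchar_t c)
{
    bool ret = true;
    for (std::size_t i = 0; i < mask_bits; ++i) {
        const mask bit = 1u << i;
        if (m & bit)
            ret &= has_class(bit, c);
    }
    return ret;
}

// Accumulating with "|=" from false: true if any requested class matches,
// which corresponds to (M & m) != 0.
bool is_with_or(mask m, wchar_t c)
{
    bool ret = false;
    for (std::size_t i = 0; i < mask_bits; ++i) {
        const mask bit = 1u << i;
        if (m & bit)
            ret |= has_class(bit, c);
    }
    return ret;
}
```

With the mask alpha | digit and the character L'a', the "&=" loop yields false while the "|=" loop yields true. For a single-bit mask the two loops agree, which may be why the problem only shows up with combined masks.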
Comment 13 CVS Commits 2003-10-06 22:33:03 UTC
Subject: Bug 11740

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	paolo@gcc.gnu.org	2003-10-06 22:32:59

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/config/locale/gnu: ctype_members.cc 
	libstdc++-v3/config/locale/generic: ctype_members.cc 
Added files:
	libstdc++-v3/testsuite/22_locale/ctype/is/wchar_t: 11740.cc 

Log message:
	2003-10-06  Paolo Carlini  <pcarlini@unitus.it>
	
	PR libstdc++/11740
	* config/locale/gnu/ctype_members.cc (ctype<wchar_t>::do_is):
	Fix to actually return (M & m) != 0 as per 22.2.1.1.2.
	* config/locale/generic/ctype_members.cc: Same.
	* testsuite/22_locale/ctype/is/wchar_t/11740.cc: New.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&r1=1.1997&r2=1.1998
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/gnu/ctype_members.cc.diff?cvsroot=gcc&r1=1.11&r2=1.12
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/generic/ctype_members.cc.diff?cvsroot=gcc&r1=1.5&r2=1.6
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/22_locale/ctype/is/wchar_t/11740.cc.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 14 Andrew Pinski 2003-10-06 22:39:18 UTC
Fixed in 3.4.
Comment 15 CVS Commits 2003-10-07 08:41:03 UTC
Subject: Bug 11740

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_3-branch
Changes by:	paolo@gcc.gnu.org	2003-10-07 08:40:59

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/config/locale/generic: ctype_members.cc 
	libstdc++-v3/config/locale/gnu: ctype_members.cc 

Log message:
	2003-10-07  Paolo Carlini  <pcarlini@unitus.it>
	
	PR libstdc++/11740
	* config/locale/gnu/ctype_members.cc (ctype<wchar_t>::do_is):
	Fix to actually return (M & m) != 0 as per 22.2.1.1.2.
	* config/locale/generic/ctype_members.cc: Same.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.1464.2.150&r2=1.1464.2.151
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/generic/ctype_members.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.2.20.1&r2=1.2.20.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/gnu/ctype_members.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.6.4.2&r2=1.6.4.3

Comment 16 Paolo Carlini 2003-10-07 08:48:15 UTC
Fixed for 3.3.2 too!
Comment 17 CVS Commits 2005-02-07 20:34:37 UTC
Subject: Bug 11740

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	green@gcc.gnu.org	2005-02-07 20:34:18

Modified files:
	libjava        : ChangeLog 
	libjava/gnu/java/nio/charset: ISO_8859_1.java Provider.java 
	                              US_ASCII.java UTF_16.java 
	                              UTF_16BE.java UTF_16LE.java 
	                              UTF_8.java 

Log message:
	2005-02-07  Robert Schuster  <thebohemian@gmx.net>
	
	* gnu/java/nio/charset/ISO_8859_1.java,
	gnu/java/nio/charset/US_ASCII.java,
	gnu/java/nio/charset/UTF_16.java,
	gnu/java/nio/charset/UTF_16_LE.java,
	gnu/java/nio/charset/UTF_16_BE.java,
	gnu/java/nio/charset/UTF_8.java: Fixed canonical names
	and aliases according to
	"http://www.iana.org/assignments/character-sets",
	"http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html"
	and "http://oss.software.ibm.com/cgi-bin/icu/convexp?s=ALL".
	* gnu/java/nio/charset/Provider.java: Made charset lookup
	case-insensitive which fixes bug #11740.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/ChangeLog.diff?cvsroot=gcc&r1=1.3307&r2=1.3308
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/ISO_8859_1.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/Provider.java.diff?cvsroot=gcc&r1=1.1&r2=1.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/US_ASCII.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_16.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_16BE.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_16LE.java.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/gnu/java/nio/charset/UTF_8.java.diff?cvsroot=gcc&r1=1.2&r2=1.3