This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libstdc++/71500] regex::icase only works on first character in a range
- From: "timshen at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 11 Jun 2016 22:27:52 +0000
- Auto-submitted: auto-generated
- References: <bug-71500-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71500
--- Comment #6 from Tim Shen <timshen at gcc dot gnu.org> ---
(In reply to mwd from comment #5)
> All of the ECMAScript engines I have found work with this, and the
> ECMAScript specs seem to imply that this should work as well.
I think you are right.
According to ECMA-262, 6th edition (ECMAScript 2015) [21.2.2.8.1]:
  5. If invert is false, then
     a. If there does not exist a member a of set A such that
        Canonicalize(a) is cc, return failure.
So when creating the CharSet A, ignorecase is not taken into consideration at
all; rather, when matching a character against a range, both the character and
the candidate from the range are canonicalized and then compared.
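To make the quoted step concrete, here is a rough sketch of that matching
rule, restricted to plain ASCII chars. The names canonicalize, char_range and
set_matches are made up for illustration and are not libstdc++ internals; the
case folding goes through std::toupper, which is exactly the kind of
locale-dependent call that the real implementation has to be careful about.

```cpp
#include <cassert>
#include <cctype>
#include <utility>
#include <vector>

// Hypothetical stand-in for the spec's Canonicalize(): fold to upper case
// when ignore_case is set, otherwise return the character unchanged.
inline char canonicalize(char ch, bool ignore_case) {
    if (!ignore_case) return ch;
    return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
}

// Inclusive character range, e.g. {'a', 'f'} for [a-f].
using char_range = std::pair<char, char>;

// True if some member a of the set satisfies
// Canonicalize(a) == Canonicalize(c). The set itself is stored as written;
// case is only considered at match time.
inline bool set_matches(const std::vector<char_range>& ranges, char c,
                        bool ignore_case) {
    const char cc = canonicalize(c, ignore_case);
    for (const auto& r : ranges) {
        assert(r.first <= r.second);
        // Iterate with int so the loop terminates even if r.second is CHAR_MAX.
        for (int a = r.first; a <= r.second; ++a) {
            if (canonicalize(static_cast<char>(a), ignore_case) == cc)
                return true;
        }
    }
    return false;
}
```

With a set holding the single range T-f, set_matches accepts 'u' under
ignore_case (because 'U' is a member) and still rejects 'g', since neither 'g'
nor 'G' lies in the range. Walking every member is linear in the range size,
so this is only meant to mirror the spec wording, not to be efficient.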
Ideally I would implement this as follows (which is similar to your
suggestion):
  auto toggle_case(auto c) {
    if (isupper(c)) return tolower(c);
    if (islower(c)) return toupper(c);
    return c;
  }
  // range.first and range.second are both inclusive
  if (range.first <= c && c <= range.second) { ... }
  else if (icase && (range.first <= toggle_case(c) &&
                     toggle_case(c) <= range.second)) { ... }
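For reference, a self-contained version of the idea above might look like the
following. This is only a sketch: in_range is a hypothetical helper, the range
is assumed to be inclusive on both ends, and the <cctype> functions are used
directly, which sidesteps the collate question raised below.

```cpp
#include <cassert>
#include <cctype>

// Flip the case of a letter; other characters pass through unchanged.
inline char toggle_case(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    if (std::isupper(u)) return static_cast<char>(std::tolower(u));
    if (std::islower(u)) return static_cast<char>(std::toupper(u));
    return c;
}

// Hypothetical helper: does c fall inside the inclusive range [first, last],
// either as written or, when icase is set, with its case flipped?
inline bool in_range(char first, char last, char c, bool icase) {
    assert(first <= last);
    if (first <= c && c <= last) return true;
    if (!icase) return false;
    const char t = toggle_case(c);
    return first <= t && t <= last;
}
```

For example, in_range('T', 'f', 'u', true) holds because 'U' is inside T-f,
while in_range('T', 'f', 'g', true) does not, since neither 'g' nor 'G' is.
Unlike walking every member of the range, this costs two comparisons per
range.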
The problem is that I'm not sure how to implement this, since when collate is
off, locale-related functions like tolower and toupper should not be called;
when collate is on, everything should go through regex_traits::transform,
which doesn't care about icase.
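The mismatch can be seen with std::regex_traits<char> directly. In the sketch
below, same_after_nocase, sort_key and check_traits_assumptions are
illustrative names; translate_nocase and transform are the real traits
members. The check assumes the global locale is still the default "C" locale.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Case-insensitive equality as the traits class sees it.
inline bool same_after_nocase(char a, char b) {
    std::regex_traits<char> traits;
    return traits.translate_nocase(a) == traits.translate_nocase(b);
}

// Collation key for a string; note that transform has no icase parameter.
inline std::string sort_key(const std::string& s) {
    std::regex_traits<char> traits;
    return traits.transform(s.begin(), s.end());
}

// Assumes the default "C" locale: 'a' and 'A' compare equal after
// translate_nocase, but their collation keys still differ.
inline void check_traits_assumptions() {
    assert(same_after_nocase('a', 'A'));
    assert(sort_key("a") != sort_key("A"));
}
```

So a range check built on transform alone has no way to learn that icase was
requested, unless the caller folds case before computing the key.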
libc++ 3.8.0 accepts the regex, but prints:
aaa : Nope
AAA : Nope
fff : Nope
FFF : Nope
ttt : Nope
TTT : Nope
uuu : Nope
UUU : Nope
ggg : Nope
GGG : Nope
which is similar to what I did previously.
I think this is a design flaw in regex_traits, and the Boost behavior is
slightly better (although all of us are broken :P).