[Bug libstdc++/63776] [C++11] Regex collate matching not working
redi at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Oct 3 10:48:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63776
Jonathan Wakely <redi at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID
--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Tim Shen from comment #8)
> I don't think std::regex_match<BiIter, Alloc, char, RegexTraits> should care
> about decoding a char string to wchar_t string and call
> std::regex_match<AnotherBiIter, AnotherAlloc, wchar_t,
> std::regex_traits<wchar_t>>, leaving user defined RegexTraits potentially
> unused.
I agree.
> Instead, user can manually decode the utf-8 string (I'm sad we don't have a
> standard char iterator adaptor which converts a utf-8 char iterator to
> char32_t iterator) and call std::regex_match<..., wchar_t, ...>.
Agreed.
> These are my understanding, so it's surely possible that I may miss
> something.
>
> Thoughts?
Having looked through this again, I think you're right.
So this reduced test case is not expected to pass:
#include <regex>
#include <cassert>

int main()
{
  std::locale::global(std::locale("en_US.UTF-8"));
  std::string s = "joão méroço";
  std::regex r{"[[:alpha:]]{4} [[:alpha:]]{6}"};
  assert( regex_match(s, r) );
}
But this is (assuming wchar_t uses a Unicode encoding):
#include <regex>
#include <cassert>

int main()
{
  std::locale::global(std::locale("en_US.UTF-8"));
  // Wide string and wide regex: one wchar_t per character.
  std::wstring s = L"joão méroço";
  std::wregex r{L"[[:alpha:]]{4} [[:alpha:]]{6}"};
  assert( regex_match(s, r) );
}
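
For a UTF-8 std::string that is only available at run time, the manual
decoding step discussed above could look roughly like the sketch below. The
names utf8_to_wide and wide_regex_match are hypothetical helpers, not part of
libstdc++ or the standard library, and the sketch assumes wchar_t is wide
enough to hold a Unicode code point (true on GNU/Linux, not on platforms with a
16-bit wchar_t). It is an illustration under those assumptions, not a
recommended or complete decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a library API): decode UTF-8 into one wchar_t per
// code point. Rejects truncated sequences and bad lead/continuation bytes;
// does not check for overlong encodings or surrogate code points.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the six payload bits of each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += len;
    }
    return out;
}

// Hypothetical helper: decode the subject, then match it with std::wregex.
// Whether [[:alpha:]] accepts non-ASCII letters still depends on the locale
// the regex picks up (the global locale at the time it is constructed).
bool wide_regex_match(const std::string& utf8_text, const std::wstring& pattern)
{
    const std::wstring text = utf8_to_wide(utf8_text);
    const std::wregex re(pattern);
    return std::regex_match(text, re);
}
```

With this, the UTF-8 bytes of "joão" (five bytes) decode to four wide
characters, so a wide pattern such as L".{4}" matches the decoded string while
L".{5}" does not. Matching [[:alpha:]] against the decoded text still needs a
suitable global locale, as in the test case above.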