Bug 61227 - [C++11] Regex [\w] does not work
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.9.0
Importance: P3 normal
Target Milestone: 4.9.1
Assignee: Not yet assigned to anyone
Reported: 2014-05-19 10:07 UTC by Lyberta
Modified: 2014-06-03 17:28 UTC
Last reconfirmed: 2014-05-19 00:00:00


Attachments
Code sample (231 bytes, text/plain)
2014-05-19 10:07 UTC, Lyberta

Description Lyberta 2014-05-19 10:07:58 UTC
Created attachment 32819
Code sample

The attached code produces std::regex_error in g++ 4.9.0.

Here's what the debugger says:
Debugger name and version: GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1.1)
In __cxa_throw () (/usr/lib/x86_64-linux-gnu/libstdc++.so.6)
#2  0x0000000000410380 in std::__detail::_Compiler<std::regex_traits<char> >::_M_expression_term<false, false> (this=0x7fffffffe3f0, __matcher=...) at /usr/include/c++/4.9/bits/regex_compiler.tcc:455
/usr/include/c++/4.9/bits/regex_compiler.tcc:455:13755:beg:0x410380
At /usr/include/c++/4.9/bits/regex_compiler.tcc:455
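
The failure can also be observed without a debugger by catching std::regex_error where the regex is constructed. A minimal sketch, assuming a hypothetical helper named compiles (it is not part of the attached code sample):

```cpp
#include <regex>
#include <string>

// Hypothetical helper, for illustration only: reports whether `pattern`
// can be compiled as a std::regex with the default (ECMAScript) grammar.
bool compiles(const std::string& pattern)
{
  try {
    const std::regex re{ pattern };  // throws std::regex_error on a rejected pattern
    (void)re;                        // only the construction matters here
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}
```

With an affected libstdc++, compiles(R"([\w])") would be expected to return false; with a fixed library it should return true.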
Comment 1 Jonathan Wakely 2014-05-19 12:23:41 UTC
Only your first and last regexes give an error. It would be helpful if you said what you expect them to do.


The sequence \w is interpreted as [_[:alnum:]] but is rejected inside a bracket expression.

Reduced:

#include <regex>
int main()
{
  std::regex{ R"([\w])" };
}

For this to match a string such as "w" or "\\", the pattern should be R"([\\w])".

To use the special \w class you can use [_[:alnum:]] as a workaround.
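
A minimal sketch of that workaround, assuming a hypothetical helper named is_word (the name and the trailing + quantifier are illustrative and not taken from the attachment):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is non-empty and consists only of
// "word" characters, spelled as [_[:alnum:]] instead of [\w].
bool is_word(const std::string& s)
{
  static const std::regex word{ R"([_[:alnum:]]+)" };
  return std::regex_match(s, word);  // the whole string must match
}
```

For example, is_word("foo_bar1") holds, while is_word("foo-bar") does not because of the hyphen.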


Tim, could you take a look at this please?

I don't think the C++ standard is clear, but Perl does interpret [\w] as equivalent to just \w so I think we should do the same.
Comment 2 Andreas Schwab 2014-05-19 12:38:20 UTC
The C++ standard refers to ECMA-262 which defines [\w] as Perl does.
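
Under that reading, [\w] should accept exactly what \w accepts, and should not match a literal backslash. A small sketch for checking this against a given library; the function names are hypothetical, and on an unfixed GCC 4.9.0 the first call would be expected to throw std::regex_error instead of returning:

```cpp
#include <regex>
#include <string>

// Hypothetical check: does the bracketed form [\w] match the whole string?
bool matches_bracketed_w(const std::string& s)
{
  static const std::regex bracketed{ R"([\w])" };
  return std::regex_match(s, bracketed);
}

// Hypothetical check: do the bare and bracketed spellings agree on `s`?
bool same_as_bare_w(const std::string& s)
{
  static const std::regex bare{ R"(\w)" };
  return std::regex_match(s, bare) == matches_bracketed_w(s);
}
```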
Comment 3 Jonathan Wakely 2014-05-19 13:40:43 UTC
So it does, thanks, Andreas. I read C++11 [re.grammar]/7 as saying those classes are part of the changes to the ECMAScript spec.
Comment 4 Lyberta 2014-05-19 14:34:56 UTC
The first regex is used to find illegal characters in symbol names in my project. The last regex is used to tokenize command-line arguments. Those regexes work in Visual Studio 2012/2013 and in clang with libc++.
Comment 5 Tim Shen 2014-05-20 04:32:27 UTC
Author: timshen
Date: Tue May 20 04:31:54 2014
New Revision: 210630

URL: http://gcc.gnu.org/viewcvs?rev=210630&root=gcc&view=rev
Log:
2014-05-20  Tim Shen  <timshen91@gmail.com>

	PR libstdc++/61227
	* include/bits/regex_compiler.h
	(_BracketMatcher<>::_M_add_character_class): Add negative character
	class support.
	* include/bits/regex_compiler.tcc (_BracketMatcher<>::_M_apply):
	Likewise.
	* testsuite/28_regex/algorithms/regex_match/ecma/char/quoted_char.cc:
	Add more testcases.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/regex_compiler.h
    trunk/libstdc++-v3/include/bits/regex_compiler.tcc
    trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/quoted_char.cc
Comment 6 Lyberta 2014-06-03 14:56:33 UTC
Hi. I'm using Debian Testing, and today's changelog says:
* Update to SVN 20140527 (r210956) from the gcc-4_9-branch.

The bug still persists.
Comment 7 Jonathan Wakely 2014-06-03 16:34:35 UTC
That's because nothing has changed on the 4.9 branch; so far the fix has only been committed to trunk.
Comment 8 Jonathan Wakely 2014-06-03 17:26:57 UTC
Author: redi
Date: Tue Jun  3 17:26:24 2014
New Revision: 211192

URL: http://gcc.gnu.org/viewcvs?rev=211192&root=gcc&view=rev
Log:
Backport from mainline
2014-05-20  Tim Shen  <timshen91@gmail.com>

	PR libstdc++/61227
	* include/bits/regex_compiler.h
	(_BracketMatcher<>::_M_add_character_class): Add negative character
	class support.
	* include/bits/regex_compiler.tcc (_BracketMatcher<>::_M_apply):
	Likewise.
	* testsuite/28_regex/algorithms/regex_match/ecma/char/quoted_char.cc:
	Add more testcases.

Modified:
    branches/gcc-4_9-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_9-branch/libstdc++-v3/include/bits/regex_compiler.h
    branches/gcc-4_9-branch/libstdc++-v3/include/bits/regex_compiler.tcc
    branches/gcc-4_9-branch/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/quoted_char.cc
Comment 9 Jonathan Wakely 2014-06-03 17:28:14 UTC
Now it's fixed on the 4.9 branch.