This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libstdc++/81200] New: regex classatomcollatingelement
- From: "hstong at ca dot ibm.com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 25 Jun 2017 11:58:04 +0000
- Subject: [Bug libstdc++/81200] New: regex classatomcollatingelement
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81200
Bug ID: 81200
Summary: regex classatomcollatingelement
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: hstong at ca dot ibm.com
Target Milestone: ---
The source below attempts to compile "_[[.left-square-bracket.]]_" as an egrep
regex pattern.
The collating symbol, [.left-square-bracket.], is only valid if
"left-square-bracket" is a multi-character collating element in the locale.
"left-square-bracket" is most assuredly not a multi-character collating element
in the "POSIX" locale.
Note: WG 21 document N1429 advocates the behaviour exhibited by the
implementation; however, it appears from N1623 (relevant portion quoted below)
that the committee made corrections:
"that is a bunch of portable names for characters, which are not the same as
collating elements within the meaning of POSIX locales"
The <regex> implementation accepts the pattern (not expected), and the string
"_[_" matches (not expected).
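The name resolution can be examined in isolation from the regex parser. The following is a minimal sketch, assuming the relevant lookup goes through the standard member std::regex_traits<char>::lookup_collatename, which returns an empty string when the name is not a valid collating element. Whether the libstdc++ parser takes exactly this path is not shown in this report, and the helper name resolve_collating_name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original report): asks the default
// regex traits what the given collating-element name resolves to.
// An empty result means the traits class does not recognise the name.
std::string resolve_collating_name(const std::string &name) {
    std::regex_traits<char> traits;  // default-constructed, uses the global locale
    return traits.lookup_collatename(name.begin(), name.end());
}
```

Printing the result of resolve_collating_name("left-square-bracket") shows whether the library treats the portable character name as a collating element. A non-empty result would be consistent with the pattern being accepted.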
grep -E rejects the same pattern with "Invalid collation character", as expected.
Online compiler: https://wandbox.org/permlink/VdEOUrcdcBqnbjLB
### SOURCE (llregex3.cc):
#include <cstdio>
#include <locale>
#include <regex>

int main(void) {
    std::regex regex;
    regex.imbue(std::locale("POSIX"));
    try {
        // Expected to throw error_collate: "left-square-bracket" is not a
        // collating element in the "POSIX" locale.
        regex.assign("_[[.left-square-bracket.]]_", std::regex_constants::egrep);
        std::printf("No error.\n");
        bool b = std::regex_match("_[_", regex);
        std::printf("%s _[_.\n", b ? "Matched" : "Did not match");
    }
    catch (const std::regex_error &e) {
        if (e.code() == std::regex_constants::error_collate) {
            std::printf("Got error_collate.\n");
        }
        else {
            std::printf("Got other error.\n");
        }
    }
}
### COMPILER INVOCATION:
g++ -std=c++11 llregex3.cc -o llregex3
### PROGRAM INVOCATION AND OUTPUT:
> ./llregex3
No error.
Matched _[_.
Return: 0x00:0
### EXPECTED PROGRAM OUTPUT:
Got error_collate.
### REFERENCE BEHAVIOUR (POSIX locale; grep -E):
> ( export LANG=POSIX; locale && grep -E '_[[.left-square-bracket.]]_' )
LANG=POSIX
LANGUAGE=en_US:en
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
grep: Invalid collation character
Return: 0x02:2
### COMPILER VERSION INFO (g++ -v):
Using built-in specs.
COLLECT_GCC=/opt/wandbox/gcc-head/bin/g++
COLLECT_LTO_WRAPPER=/opt/wandbox/gcc-head/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../source/configure --prefix=/opt/wandbox/gcc-head
--enable-languages=c,c++ --disable-multilib --without-ppl --without-cloog-ppl
--enable-checking=release --disable-nls --enable-lto
LDFLAGS=-Wl,-rpath,/opt/wandbox/gcc-head/lib,-rpath,/opt/wandbox/gcc-head/lib64,-rpath,/opt/wandbox/gcc-head/lib32
Thread model: posix
gcc version 8.0.0 20170623 (experimental) (GCC)
### grep --version:
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see
<http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.