This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libstdc++/81200] New: regex classatomcollatingelement


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81200

            Bug ID: 81200
           Summary: regex classatomcollatingelement
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hstong at ca dot ibm.com
  Target Milestone: ---

The source below attempts to compile "_[[.left-square-bracket.]]_" as an egrep
regex pattern.
The collating symbol, [.left-square-bracket.], is only valid if
"left-square-bracket" is a multi-character collating element in the locale.
"left-square-bracket" is most assuredly not a multi-character collating element
in the "POSIX" locale.

Note: WG 21 document N1429 advocates the behaviour exhibited by the
implementation; however, it appears from N1623 (relevant portion quoted below)
that the committee made corrections:
"that is a bunch of portable names for characters, which are not the same as
collating elements within the meaning of POSIX locales"

The <regex> implementation accepts the pattern (not expected), and the string
"_[_" matches (not expected).
grep -E rejects the pattern with "Invalid collation character", as expected.

Online compiler: https://wandbox.org/permlink/VdEOUrcdcBqnbjLB

### SOURCE (llregex3.cc):
#include <cstdio>
#include <regex>

int main(void) {
  std::regex regex;
  regex.imbue(std::locale("POSIX"));

  try {
    regex.assign("_[[.left-square-bracket.]]_", std::regex_constants::egrep);
    printf("No error.\n");

    bool b;
    b = regex_match("_[_", regex);
    printf("%s _[_.\n", b ? "Matched" : "Did not match");
  }
  catch (const std::regex_error &e) {
    if (e.code() == std::regex_constants::error_collate) {
      printf("Got error_collate.\n");
    }
    else {
      printf("Got other error.\n");
    }
  }
}

### COMPILER INVOCATION:
g++ -std=c++11 llregex3.cc -o llregex3

### PROGRAM INVOCATION AND OUTPUT:
> ./llregex3
No error.
Matched _[_.
Return:  0x00:0

### EXPECTED PROGRAM OUTPUT:
Got error_collate.

### REFERENCE BEHAVIOUR (POSIX locale; grep -E):
> ( export LANG=POSIX; locale && grep -E '_[[.left-square-bracket.]]_' )
LANG=POSIX
LANGUAGE=en_US:en
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
grep: Invalid collation character
Return:  0x02:2

### COMPILER VERSION INFO (g++ -v):
Using built-in specs.
COLLECT_GCC=/opt/wandbox/gcc-head/bin/g++
COLLECT_LTO_WRAPPER=/opt/wandbox/gcc-head/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../source/configure --prefix=/opt/wandbox/gcc-head
--enable-languages=c,c++ --disable-multilib --without-ppl --without-cloog-ppl
--enable-checking=release --disable-nls --enable-lto
LDFLAGS=-Wl,-rpath,/opt/wandbox/gcc-head/lib,-rpath,/opt/wandbox/gcc-head/lib64,-rpath,/opt/wandbox/gcc-head/lib32
Thread model: posix
gcc version 8.0.0 20170623 (experimental) (GCC)

### grep --version:
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see
<http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
