Regex backend refactoring/rewriting?

Tim Shen timshen@google.com
Thu Feb 19 20:49:00 GMT 2015


On Thu, Feb 19, 2015 at 3:35 AM, Stephen M. Webb
<stephen.webb@bregmasoft.ca> wrote:
> But, a leftmost longest match would have consumed all the input with the first "a*", leaving an empty match for the
> second "a*" and thus an empty capture.  Consider what "(a*)(a*)" would print for the same input.

OK, I found that Boost interprets this differently from glibc's regexec:
http://www.boost.org/doc/libs/1_56_0/libs/regex/doc/html/boost_regex/syntax/leftmost_longest_rule.html

My example follows Boost's definition (and result).

glibc implementation:

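    #include <regex.h>
    #include <iostream>

    int main() {
      regex_t re;
      // Compile as a POSIX extended regular expression.
      if (regcomp(&re, "a*(a*)", REG_EXTENDED) != 0) {
        return 1;
      }
      regmatch_t m[2];  // m[0]: whole match, m[1]: the "(a*)" group
      const char s[] = "aaaaaaaa";
      if (regexec(&re, s, 2, m, 0) != 0) {
        regfree(&re);
        return 1;
      }
      // Print the whole match and the capture, one per line.
      for (const auto& it : m) {
        for (regoff_t i = it.rm_so; i < it.rm_eo; i++) {
          std::cout << s[i];
        }
        std::cout << "\n";
      }

      regfree(&re);
      return 0;
    }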

In this case the second subpattern "(a*)" captures nothing, the same
as you described.

The POSIX standard says: "Consistent with the whole match being the
longest of the leftmost matches, each subpattern, from left to right,
shall match the longest possible string." Here the first a* is not a
subpattern, but the second (a*) is, so the rule asks (a*) to match as
much as possible. So I think Boost interprets it correctly.


-- 
Regards,
Tim Shen


