[Bug libstdc++/85472] Regex match bug
haberg-1 at telia dot com
gcc-bugzilla@gcc.gnu.org
Wed Apr 25 12:54:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
--- Comment #8 from Hans Åberg <haberg-1 at telia dot com> ---
(In reply to Tim Shen from comment #5)
> I'm not following the meaning of "action number" and "the partial reverse
> NFA is recorded".
>
> How many actions numbers are recorded? for regex_match(s, regex(re)), is it
> on the order of O(s.size()), or O(re.size())? The goal is to allocate
> O(re.size()) of memory exactly once.
Let's check the OP example, to see if it is useful, which will also explain the
action numbers. So for regex '(z)((a+)?(b+)?(c))*', I used '(z)(a*b*(c))*' with
action numbers same as in the C++ library numbering. Here they are written out
with # in the NFA:
ex = ({0}, {0, 3},
{0: {z ↦ {1, ∅}}[#1]; 1: {a ↦ {1, 2, 3}}[#2#3];
2: {b ↦ {2, 3}}[#2#4]; 3: {c ↦ {3, 2, 1, ∅}}[#2#5]}, {})
with start set {0}, and set of last (pointing to a final state ∅) states {0,
3}. (The last {} is the not yet computed DFA.)
Then lexing:
> zaacbbbcac
Full string is a valid match!
Matches: {zaacbbbcac, [@0@3@7@9];
[#1]
[#1][#2#3][#2#3][#2#5]
[#1][#2#3][#2#3][#2#5][#2#4][#2#4][#2#4][#2#5]
[#1][#2#3][#2#3][#2#5][#2#4][#2#4][#2#4][#2#5][#2#3][#2#5]
}
Time: 0.000555 s.
Each line gives the action numbers for the string positions for the successive
strings z, zaac, zaacbbbc, zaacbbbcac with final string positions @0, @3, @7,
@9.
The last line gives for #1 the sub-string "z" , and for #2 "aacbbbcac". For #3
"a", and for #5 "c". But #4 is missing, indication there is no match. So there
might be problem here, as there are earlier matches:
Perhaps the intent is that it should be implemented as a loop, only retaining
the last #4, but the problem is that that is not what the underlying theory
says.
More information about the Gcc-bugs
mailing list