This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: w.r.t. [Bug libstdc++/85472] Regex match bug
> On 3 May 2018, at 19:34, Tim Shen <timshen91@gmail.com> wrote:
>
> On Thu, May 3, 2018 at 2:48 AM, Hans Åberg <haberg-1@telia.com> wrote:
>> The DFA algorithm departs from one in the book by Aho, Sethi & Ullman, "Compilers", sec. 3.9, subsection "From a Regular Expression to a DFA", only that I start with a NFA without empty transitions. One moves over sets of NFA states p, and there is a function q = follow(p, c) that computes the next sets of NFA states q, by simply tracing all the c transitions in all the states of p. If there are empty transitions, those must be traced, so not having them simplifies the computations.
>>
>> Now, when follow(p, c) computes q, the reverse NFA information that I use is available: It records all the c transitions found p_i -> q_j, but in reverse form q_j -> p_i, as sets q_j -> p' where p'. For use with a DFA, I record the data for a lookup on the (p, c) pair.
>
> I see. FWIW this is equivalent to the "BFS" approach I described in the talk.
It what time? - It is a long talk.
>> The DFA lexes forward until a final states is encountered; merely record the DFA states. Then starting with this final state and the last matched character, I work the matched string backwards on the reverse NFA data for each DFA state used to compute the sets of states involved.
>
> I thought about this approach as well. The reason it doesn't work is
> back-references [1]. For example, "(a+)\1" matches any even number of
> 'a's.
Do you have a better description on how it is supposed to work. Specifically, how do you get only an even number in this case?
> Implementing back-reference matching requires the matches to be
> available during the match, not after the match.
Please illuminate this.
> [1] http://en.cppreference.com/w/cpp/regex/ecmascript#Backreferences