This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: w.r.t. [Bug libstdc++/85472] Regex match bug


> On 3 May 2018, at 19:34, Tim Shen <timshen91@gmail.com> wrote:
> 
> On Thu, May 3, 2018 at 2:48 AM, Hans Åberg <haberg-1@telia.com> wrote:
>> The DFA algorithm departs from one in the book by Aho, Sethi & Ullman, "Compilers", sec. 3.9, subsection "From a Regular Expression to a DFA", only that I start with a NFA without empty transitions. One moves over sets of NFA states p, and there is a function q = follow(p, c) that computes the next sets of NFA states q, by simply tracing all the c transitions in all the states of p. If there are empty transitions, those must be traced, so not having them simplifies the computations.
>> 
>> Now, when follow(p, c) computes q, the reverse NFA information that I use is available: It records all the c transitions  found p_i -> q_j, but in reverse form q_j -> p_i, as sets q_j -> p' where p'. For use with a DFA, I record the data for a lookup on the (p, c) pair.
> 
> I see. FWIW this is equivalent to the "BFS" approach I described in the talk.

It what time? - It is a long talk.

>> The DFA lexes forward until a final states is encountered; merely record the DFA states. Then starting with this final state and the last matched character, I work the matched string backwards on the reverse NFA data for each DFA state used to compute the sets of states involved.
> 
> I thought about this approach as well. The reason it doesn't work is
> back-references [1]. For example, "(a+)\1" matches any even number of
> 'a's.

Do you have a better description on how it is supposed to work. Specifically, how do you get only an even number in this case?

> Implementing back-reference matching requires the matches to be
> available during the match, not after the match.

Please illuminate this.

> [1] http://en.cppreference.com/w/cpp/regex/ecmascript#Backreferences



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]