This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior


Hi Tim,

On 11/19/2014 08:27 AM, Tim Shen wrote:
On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
Jonathan has lately been following your work much more closely than I
have, but naively it seems weird that _M_begin is non-const and _M_end
is const, a different type anyway.
Hmm. The current regex_search algorithm is implemented as: try to match
starting from _M_begin; if that doesn't match, start over from
_M_begin+1, and so on.

As we can tell, _M_begin is never changed, and it's const.

The problem is that when the executor reaches the accept state (which
indicates a match) we use _M_current == _M_begin to check whether it's
an empty match. When we are not in the first iteration, say, in the
second iteration, _M_current is initialized with _M_begin+1. It turns
out that even if _M_current has never been advanced (no chars are
eaten, aka an empty match), _M_current != _M_begin is still true.
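
The observable effect can be checked from user code. The following is
only a small sketch: the helper name has_nonempty_match and the pattern
"b*" are illustrative choices, not taken from the patch. With
match_not_null, a pattern that can only match the empty sequence in the
given text should not be reported as found, whichever start position
the search loop is trying.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of libstdc++): returns true if `pattern`
// has a non-empty match somewhere in `text`.
bool has_nonempty_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    // match_not_null asks the algorithm to reject empty matches.
    return std::regex_search(text, re, std::regex_constants::match_not_null);
}
```

For the text "a", the pattern "b*" can only match emptily (at offset 0
or offset 1), so the call is expected to return false; for "ab" the
non-empty match "b" exists, so it is expected to return true.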

This fix makes each regex_search iteration more thorough by advancing
_M_begin, as if each iteration were a new _M_search_from_first.
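
To make the problem and the fix concrete, here is a toy model. It is
not the libstdc++ code: positions are plain indices, the "regex" is
hard-wired to b*, and the names match_b_star, search_not_null_buggy and
search_not_null_fixed are made up for this sketch. It only mirrors the
empty-match test described above.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for the executor on the pattern "b*": starting at `pos`,
// consume as many 'b' characters as possible and return the end position.
static std::size_t match_b_star(const std::string& s, std::size_t pos) {
    while (pos < s.size() && s[pos] == 'b')
        ++pos;
    return pos;
}

// Models the old behavior: `begin` stays fixed while the start position
// moves, so an empty match found at a later start position still
// satisfies current != begin and is accepted by mistake.
bool search_not_null_buggy(const std::string& s) {
    const std::size_t begin = 0;
    for (std::size_t start = begin; start <= s.size(); ++start) {
        const std::size_t current = match_b_star(s, start);
        if (current != begin)  // wrong reference point after the first try
            return true;
    }
    return false;
}

// Models the fix: `begin` advances with every iteration, so
// current == begin means "no characters eaten" for every start position.
bool search_not_null_fixed(const std::string& s) {
    for (std::size_t begin = 0; begin <= s.size(); ++begin) {
        const std::size_t current = match_b_star(s, begin);
        if (current != begin)
            return true;
    }
    return false;
}
```

On the input "a", the buggy variant accepts the empty match found when
starting at index 1, while the fixed variant rejects it. On "ab" both
find the non-empty match "b".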

I've carefully (admittedly, after sending this patch) inspected every
place where _M_begin is used. It turns out _M_begin obeys a
well-defined invariant (it is the initial position of _M_current when
the current iteration starts; see _Executor<>::_M_at_begin), as
indicated by the use of regex_constants::match_prev_avail. This flag
actually implies that the __begin iterator passed into regex_search is
not always "the physical boundary" that matches "^". Boost (and we)
conform to this behavior:

     std::regex_search("asdf", std::regex("^asdf"),
                       std::regex_constants::match_prev_avail)

returns false.
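
A variant of the same check that searches from the middle of a buffer,
so that the position before the first iterator really exists, could
look like the sketch below. The buffer contents "xasdf" and the helper
name caret_matches_mid_buffer are assumptions made for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: search for "^asdf" starting one character into
// the buffer "xasdf", using the given match flags.
bool caret_matches_mid_buffer(std::regex_constants::match_flag_type flags) {
    static const std::string buffer = "xasdf";
    static const std::regex re("^asdf");
    // buffer.cbegin() + 1 points at 'a'; the preceding 'x' is valid to
    // inspect, which is what match_prev_avail promises to the algorithm.
    return std::regex_search(buffer.cbegin() + 1, buffer.cend(), re, flags);
}
```

With match_default the first iterator is treated as the start of the
input, so "^" matches there and the search succeeds. With
match_prev_avail the first iterator is no longer treated as that
boundary, and the search is expected to fail, in line with the
libstdc++ and Boost behavior described above.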

It would be more elegant to move _Executor::_M_search out of its class
and keep _M_begin const, but _Executor costs too much to initialize.
Good. To be clear, and without having analyzed anything carefully, my point was more about making _M_end non-const too than about leaving _M_begin untouched. Would that make sense?

Thanks,
Paolo.

