This is the mail archive of the
mailing list for the libstdc++ project.
Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
- From: Paolo Carlini <paolo dot carlini at oracle dot com>
- To: Tim Shen <timshen at google dot com>
- Cc: libstdc++ <libstdc++ at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 19 Nov 2014 17:16:21 +0100
- Subject: Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
- Authentication-results: sourceware.org; auth=none
- References: <CAG4ZjNm2Dj3UbcP1L1fp7BTku2e2zDyDwU1SH4hYSktz5XhbJA at mail dot gmail dot com> <546B9BB3 dot 1010205 at oracle dot com> <CAG4ZjNmhS_55VhA-JX1k2UJM+5eEthpywq8C7qnFQvypvHP=ng at mail dot gmail dot com>
On 11/19/2014 08:27 AM, Tim Shen wrote:
Good. To be clear, not having carefully analyzed whatsoever, my point
was more about changing _M_end too, to non-const, than about not
touching _M_begin. Would that make sense?
On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini
Jonathan lately is following your work much better than me, but naively
seems weird that _M_begin is non-const and _M_end is const, a different type
Hmm. The current regex_search algorithm is implemented as try match
starting from _M_begin; if it doesn't match, start over from
As we can tell, _M_begin is never changed, and it's const.
The problem is when the executer reaches the accept state (which
indicates a match) we use _M_current == _M_begin to verify if it's an
empty match. It is possible that, when we are not in the first
iteration, say, in the second iteration actually, _M_current is
initialized with _M_begin+1. It turns out even _M_current has never
been increased (no chars are eaten, aka empty match), _M_current !=
_M_begin is still true.
This fix is making each regex_search iteration more thorough, with
increased _M_begin, as if it's a new regex _M_search_from_first.
I've carefully (admittedly, after sending this patch) inspect
everywhere when _M_begin is used. It turns out _M_begin is under
well-defined (the initial position of _M_current when current
iteration starts) invariants (see _Executor<>::_M_at_begin), indicated
by the use of regex_constants::match_prev_avail. This flag actually
implies that __begin iterator passed into regex_search is not always
"the physical boundary" who matches "^". Boost (and we) conforms this
It's more elegant to move _Executor::_M_search out of its class and
make _M_begin still const, but _Executor costs too much to initialize.