This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Changes to std::match_results and <regex>


Our match_results has some problems:

size() returns N+1, where N is the number of elements stored.  N
already includes the 0th sub-match (i.e. the entire match), so the
returned value is off by one.  It should be n+1, where n is the number
of marked sub-expressions.
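
For reference, the intended relationship can be checked against a
working <regex> implementation.  This is only a sketch using the
interface as later standardized in C++11 (std::regex, std::smatch,
mark_count()); the helper name size_is_marks_plus_one is made up for
illustration and is not part of the patch.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true when a successful match reports one
// sub-match per marked sub-expression plus one for the entire match.
inline bool size_is_marks_plus_one(const std::string& pattern,
                                   const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(text, m, re))
        return false;                        // no match, so size() is 0
    return m.size() == re.mark_count() + 1;  // n+1, n = marked sub-expressions
}

inline void demo_size()
{
    // Two marked sub-expressions, the second of which does not participate.
    assert(size_is_marks_plus_one("(a)(b)?c", "ac"));
}
```

Note that a sub-expression which did not participate in the match still
counts towards size(); it is simply reported as unmatched.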

operator[] does not check for sub >= size(), so it allows out-of-range
accesses.

position() can result in undefined behaviour if called with an
out-of-range argument because it will try to compare a potentially
singular iterator.  This is really a problem with the draft standard,
which doesn't make it clear that a match_results can be singular and
that there are implied preconditions on some member functions. I plan
to file an NB comment about this.

_M_prefix and _M_suffix could be stored in the vector, reducing the
size of an unmatched match_results.  This also makes _M_matched
unnecessary because !_M_matched is implied by the vector being empty.
This reduces the size of match_results by 4 pointers and 3 bools and
simplifies copying, moving and swapping because the only state is in
the vector.

Finally, it's missing move operations and other changes since TR1.

The patch below addresses all these issues, so I would like to check
it in.  It changes the ABI of match_results, but as that class is
completely non-functional it can't break any programs unless they
explicitly make use of sizeof(match_results) but don't actually
instantiate match_results.

My suggested implementation is documented in the code:

      The vector base is empty if this does not represent a successful match.
      Otherwise it contains n+3 elements where n is the number of marked
      sub-expressions:
      [0]   entire match
      [1]   1st marked subexpression
      ...
      [n]   nth marked subexpression
      [n+1] prefix
      [n+2] suffix

With N the number of elements in the vector, this means
size() == (N ? N-2 : 0), which is n+1 for a successful match.
Copying, moving and swapping, as well as empty(), are now simply
forwarded to the vector base.
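
To make the layout concrete, here is a minimal sketch of the idea.  It
is not the code from the attached patch: the class name match_store
and the members set_matched() and set_unmatched() are invented for
illustration, and everything unrelated to storage is left out.

```cpp
#include <cassert>
#include <regex>
#include <vector>

// Hypothetical stand-in for the proposed layout, not the libstdc++ code.
// All state lives in the private vector base.
template<typename BiIter>
class match_store : private std::vector<std::sub_match<BiIter>>
{
    typedef std::vector<std::sub_match<BiIter>> base;

public:
    typedef std::sub_match<BiIter>   value_type;
    typedef typename base::size_type size_type;

    // Successful match with n marked sub-expressions: n+1 sub-matches
    // followed by prefix and suffix, i.e. n+3 elements.
    void set_matched(size_type n) { base::assign(n + 3, value_type()); }

    // Failed match: the vector is empty.
    void set_unmatched() { base::clear(); }

    bool empty() const { return base::empty(); }

    size_type size() const
    {
        const size_type N = base::size();
        return N ? N - 2 : 0;   // drop prefix and suffix from the count
    }

    // Precondition for both: !empty().
    const value_type& prefix() const { return base::operator[](base::size() - 2); }
    const value_type& suffix() const { return base::operator[](base::size() - 1); }

    void swap(match_store& other) { base::swap(other); }
};

inline void demo_layout()
{
    match_store<const char*> m;
    assert(m.empty() && m.size() == 0);
    m.set_matched(2);           // entire match + 2 sub-expressions
    assert(m.size() == 3);
}
```

Because the vector is the only state, the implicitly generated copy
and move operations already do the right thing in this sketch, and
swap() is a one-line forward.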

I made position() return -1 for out-of-range arguments, consistent
with boost::regex (and with e.g. string::find).
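
As a rough illustration of that check, assuming the n+3 element layout
described above: the free function position_sketch below is a made-up
name, it operates directly on a vector of sub_match rather than on
match_results, and it is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <vector>

// Hypothetical sketch.  v is either empty (no match) or holds
// [0..n] sub-matches, then prefix at [n+1] and suffix at [n+2].
template<typename BiIter>
typename std::iterator_traits<BiIter>::difference_type
position_sketch(const std::vector<std::sub_match<BiIter>>& v, std::size_t sub)
{
    const std::size_t size = v.empty() ? 0 : v.size() - 2;
    if (sub >= size)
        return -1;              // out of range: no iterators are touched
    // The prefix starts at the beginning of the searched sequence.
    const std::sub_match<BiIter>& prefix = v[v.size() - 2];
    return std::distance(prefix.first, v[sub].first);
}

inline void demo_position()
{
    const char* s = "hello world";
    std::vector<std::sub_match<const char*>> v(3);          // n == 0
    v[0].first = s + 6;  v[0].second = s + 11; v[0].matched = true;  // "world"
    v[1].first = s;      v[1].second = s + 6;  v[1].matched = true;  // prefix
    v[2].first = s + 11; v[2].second = s + 11;                       // suffix
    assert(position_sketch(v, 0) == 6);
    assert(position_sketch(v, 1) == -1);
}
```

The range check comes first, so a singular (default-constructed)
object never has its iterators compared.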

operator[] checks its argument and returns a static object
representing an unmatched subexpression (as required).
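
One way such a checked accessor could look, again assuming the n+3
element layout: checked_at is a hypothetical free function standing in
for operator[], not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <vector>

// Hypothetical sketch of a bounds-checked accessor.  v is either empty
// or holds n+1 sub-matches followed by prefix and suffix.
template<typename BiIter>
const std::sub_match<BiIter>&
checked_at(const std::vector<std::sub_match<BiIter>>& v, std::size_t sub)
{
    // A default-constructed sub_match has matched == false.
    static const std::sub_match<BiIter> unmatched = std::sub_match<BiIter>();
    const std::size_t size = v.empty() ? 0 : v.size() - 2;
    return sub < size ? v[sub] : unmatched;
}

inline void demo_checked_at()
{
    std::vector<std::sub_match<const char*>> v(3);   // n == 0
    v[0].matched = true;                             // entire match
    assert(checked_at(v, 0).matched);
    assert(!checked_at(v, 1).matched);               // index of prefix: out of range
}
```

Returning a reference to a single static object keeps the return type
the same as for in-range accesses and avoids any allocation.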

Does anyone have any objections or improvements to this change?

Attachment: regex.txt
Description: Text document

