This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Regex refactoring
- From: Jonathan Wakely <jwakely dot gcc at gmail dot com>
- To: Tim Shen <timshen91 at gmail dot com>
- Cc: "libstdc++" <libstdc++ at gcc dot gnu dot org>
- Date: Fri, 8 Nov 2013 18:47:53 +0000
- Subject: Re: Regex refactoring
- References: <CAH6eHdQJwYAF5Yy6k3AGY_D4+fifjF5g57iZUAOeQ4C4wiazWA at mail dot gmail dot com> <CAPrifD=MTgD2rhjcWiibONcMLk9otpH6Q6vELh2siR4iovyxCQ at mail dot gmail dot com> <CAH6eHdSNdajgLtNK0NHuhWKTt2ND89xaOrR_knjfn_GymdrU3Q at mail dot gmail dot com>
On 7 November 2013 19:51, Jonathan Wakely wrote:
> On 7 November 2013 18:15, Tim Shen wrote:
>> On Thu, Nov 7, 2013 at 11:49 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>> The most common instantiations of _Compiler will probably use const
>>> char*, std::string::iterator and std::string::const_iterator. It would
>>> be good if they could all share code, since they all operate on the
>>> same underlying character type, but I don't know if that's possible.
>>
>> We could read all the input characters into a basic_string<_CharT>, then
>> use it to initialize _Compiler. The class could then be simplified to:
>>
>> template<typename _TraitsT>
>>   class _Compiler
>>   {
>>     typedef typename _TraitsT::char_type _CharT;
>>     ...
>>     _Compiler(basic_string<_CharT>&& __input_string);
>>   };
>>
>> It requires more memory, but I don't think storing a string that
>> describes a regex is unacceptable.
If I'd followed your suggestion, that would also have fixed the fact
that this doesn't compile:
#include <regex>
#include <iterator>

int main()
{
  unsigned char s[1] = { '.' };
  std::regex re(std::begin(s), std::end(s));
}
_Compiler<It, Tr>::_M_match_token() assumes that
_Compiler<It, Tr>::_StringT (aka Tr::string_type) is the same type as
_Scanner<It>::_StringT (aka
basic_string<iterator_traits<It>::value_type>), which isn't always
true. In the example above the first is std::string and the second is
basic_string<unsigned char>.
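To make the mismatch concrete, here is a minimal sketch that only compares the two character types involved. The aliases It, Tr, ScannerChar and CompilerChar and the constant char_types_match are illustrative names chosen for this sketch, not libstdc++ internals.

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <type_traits>

// Illustrative stand-ins for the _Compiler template arguments in the
// failing example: std::begin(s) on an unsigned char array yields
// unsigned char*, and std::regex uses std::regex_traits<char>.
using It = unsigned char*;
using Tr = std::regex_traits<char>;

// Character type the scanner's string would be built from.
using ScannerChar = std::iterator_traits<It>::value_type;
// Character type of the traits' string_type (std::string here).
using CompilerChar = Tr::string_type::value_type;

static_assert(std::is_same<ScannerChar, unsigned char>::value,
              "the iterator's value_type is unsigned char");
static_assert(std::is_same<CompilerChar, char>::value,
              "regex_traits<char>::string_type holds char");

// False: the two string types are built on different character types.
constexpr bool char_types_match =
  std::is_same<ScannerChar, CompilerChar>::value;
```

Since the character types differ, the two basic_string specializations are unrelated types, and plain assignment between them cannot compile.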
Curiously, this does compile:
unsigned char s[1] = { '.' };
std::regex re;
re.assign(std::begin(s), std::end(s));
That works because the range form of basic_regex::assign() constructs
its own string_type from the range, then scans that (exactly as
specified in the standard). The constructor fails because it tries to
scan the input range directly.
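The approach that assign() takes can be sketched in user code as follows. The helper regex_from_range and the check function are hypothetical names for this illustration, not part of libstdc++; the sketch assumes the range elements are convertible to char.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: copy the range into the regex's own string type
// first, then compile the regex from that string.
template<typename InputIt>
  std::regex
  regex_from_range(InputIt first, InputIt last)
  {
    std::string pattern(first, last);  // element-wise conversion to char
    return std::regex(pattern);
  }

// Exercises the helper with the unsigned char array from the example.
bool
dot_matches_single_char()
{
  unsigned char s[1] = { '.' };
  std::regex re = regex_from_range(std::begin(s), std::end(s));
  // "." matches exactly one character.
  return std::regex_match("a", re) && !std::regex_match("ab", re);
}

void
check_regex_from_range()
{
  assert(dot_matches_single_char());
}
```

The intermediate string costs one extra copy of the pattern, which is the memory trade-off mentioned in the quoted suggestion.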
I think the simplest fix is this:
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -411,7 +411,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       if (token == _M_scanner._M_get_token())
         {
-          _M_value = _M_scanner._M_get_value();
+          const auto& __value = _M_scanner._M_get_value();
+          _M_value.assign(__value.begin(), __value.end());
           _M_scanner._M_advance();
           return true;
         }
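
The idea behind the patch, shown outside the library as a rough sketch: operator= needs both sides to be the same basic_string specialization, while the iterator-range assign() converts element by element. ScannerString, CompilerString and copy_value are hypothetical stand-ins for this illustration, with std::string and std::wstring playing the two _StringT types.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the two string types that may differ.
using ScannerString = std::string;    // plays _Scanner<It>::_StringT
using CompilerString = std::wstring;  // plays _Compiler<It, Tr>::_StringT

CompilerString
copy_value(const ScannerString& value)
{
  CompilerString result;
  // result = value;  // would not compile: unrelated basic_string types
  result.assign(value.begin(), value.end());  // converts each character
  return result;
}

void
check_copy_value()
{
  assert(copy_value("a.b") == L"a.b");
  assert(copy_value("").empty());
}
```

When both types are the same, the range assign() still works, so the patched line covers both cases.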