This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: Regex refactoring


On 7 November 2013 19:51, Jonathan Wakely wrote:
> On 7 November 2013 18:15, Tim Shen wrote:
>> On Thu, Nov 7, 2013 at 11:49 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>> The most common instantiations of _Compiler will probably use const
>>> char*, std::string::iterator and std::string::const_iterator. It would
>>> be good if they could all share code, since they all operate on the
>>> same underlying character type, but I don't know if that's possible.
>>
>> We could read the whole input into a basic_string<_CharT> and then
>> use it to initialize _Compiler. It could then be simplified to:
>>
>> template<typename _TraitsT>
>>   class _Compiler
>>   {
>>     typedef typename _TraitsT::char_type _CharT;
>>     ...
>>     _Compiler(basic_string<_CharT>&& __input_string);
>>   };
>>
>> It requires more memory, but I don't think storing a string that
>> describes a regex is unacceptable.

If I'd followed your suggestion, that would also have fixed the fact
that this doesn't compile:

#include <regex>
#include <iterator>

int main()
{
  unsigned char s[1] = { '.' };
  std::regex re(std::begin(s), std::end(s));
}

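_Compiler<It, Tr>::_M_match_token() assumes that
_Compiler<It, Tr>::_StringT (aka Tr::string_type) is the same as
_Scanner<It>::_StringT (aka
basic_string<iterator_traits<It>::value_type>), which isn't always
true. In the example above Tr::string_type is std::string, while the
scanner's string type is basic_string<unsigned char>, so assigning one
to the other doesn't compile.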

Curiously, this does compile:

  unsigned char s[1] = { '.' };
  std::regex re;
  re.assign(std::begin(s), std::end(s));

That works because the range form of basic_regex::assign() constructs
its own string_type from the range, then scans that (exactly as
specified in the standard).  The constructor fails because it tries to
scan the input range directly.

I think the simplest fix is this:

--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -411,7 +411,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       if (token == _M_scanner._M_get_token())
        {
-         _M_value = _M_scanner._M_get_value();
+         const auto& __value = _M_scanner._M_get_value();
+         _M_value.assign(__value.begin(), __value.end());
          _M_scanner._M_advance();
          return true;
        }

