C++14 standard (page 1107, see here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#1121), 28.5.2 [Bitmask type regex_constants::match_flag_type]:

  ...
  format_sed
    When a regular expression match is to be replaced by a new string, the new string shall be constructed using the rules used by the sed utility in POSIX.
  ...

The rules that sed uses are documented in IEEE 1003.1 (p. 3221):

  An <ampersand> ('&') appearing in the replacement shall be replaced by the string matching the BRE. The special meaning of '&' in this context can be suppressed by preceding it by a <backslash>. The characters "\n", where n is a digit, shall be replaced by the text matched by the corresponding back-reference expression. ... The special meaning of "\n" where n is a digit in this context, can be suppressed by preceding it by a <backslash>.

The current implementation of std::regex_replace does not comply with the standard: the special meanings of &, \0, \2 cannot be suppressed by escaping them with backslashes.

Reproducer:

  #include <cstdio>
  #include <iterator>
  #include <regex>
  #include <string>

  // Runs the sed-style replacement and prints "actual == expected" or "actual != expected".
  int frep(const wchar_t *istr, const wchar_t *rstr, const wchar_t *ostr)
  {
    std::basic_regex<wchar_t> wrgx(L"(a*)(b+)");
    std::basic_string<wchar_t> wstr = istr, wret = ostr, test;
    std::regex_replace(std::back_inserter(test), wstr.begin(), wstr.end(), wrgx,
                       std::basic_string<wchar_t>(rstr),
                       std::regex_constants::format_sed);
    return !printf("'%ls' %c= '%ls'\n", test.c_str(), (test == wret)? '=' : '!', wret.c_str());
  }

  int main()
  {
    frep(L"xbbyabz", L"!\\\\2!", L"x!\\2!y!\\2!z");
    frep(L"xbbyabz", L"!\\\\0!", L"x!\\0!y!\\0!z");
    return frep(L"xbbyabz", L"!\\&!", L"x!&!y!&!z");
  }
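For reference, the POSIX replacement rules quoted above can be sketched as a small standalone expander. This is only an illustration of the quoted rules, not the libstdc++ implementation: sed_expand is a hypothetical helper, and its handling of a missing group (expands to nothing) and of a trailing lone backslash (kept literally) are assumptions of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of any library: expands a sed-style
// replacement string. groups[0] is the whole match, groups[n] is the
// text of capture group n.
std::string sed_expand(const std::string& fmt,
                       const std::vector<std::string>& groups)
{
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if (c == '&') {
            // Unescaped '&' is replaced by the whole match.
            out += groups.at(0);
        } else if (c == '\\' && i + 1 < fmt.size()) {
            const char next = fmt[++i];
            if (std::isdigit(static_cast<unsigned char>(next))) {
                // "\n" with n a digit is replaced by capture group n.
                // Assumption: a group that does not exist expands to nothing.
                const std::size_t n = static_cast<std::size_t>(next - '0');
                if (n < groups.size())
                    out += groups[n];
            } else {
                // A backslash suppresses the special meaning of what follows,
                // so "\&" yields '&' and "\\" yields a single backslash.
                out += next;
            }
        } else {
            // Ordinary character (assumption: a trailing lone backslash
            // is copied through unchanged).
            out += c;
        }
    }
    return out;
}
```

Under these rules, sed_expand(R"(\\1\&\\2)", {"ab", "a", "b"}) yields \1&\2: each escaped backslash is consumed before the following digit is examined, so the digit is copied literally.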
Reduced:

  #include <regex>

  int main()
  {
    auto format = std::regex_constants::format_sed;
    auto out = regex_replace("ab", std::regex("(a)(b)"), R"(\\1\&\\2)", format);
    if (out != R"(\1&\2)")
      throw 1;
  }

Tim, is there an easy fix for this that I can try, or should I leave it to you?
Author: timshen
Date: Sun Jan 14 00:48:30 2018
New Revision: 256654

URL: https://gcc.gnu.org/viewcvs?rev=256654&root=gcc&view=rev
Log:
	PR libstdc++/83601
	* include/bits/regex.tcc (regex_replace): Fix escaping in sed.
	* testsuite/28_regex/algorithms/regex_replace/char/pr83601.cc: Tests.
	* testsuite/28_regex/algorithms/regex_replace/wchar_t/pr83601.cc: Tests.

Added:
    trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_replace/char/pr83601.cc
    trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_replace/wchar_t/pr83601.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/regex.tcc
Mark as fixed.
If there is interest, another (smaller) test case would be:

  const std::string input = R"((.))";
  const std::string expected = R"(\(\.\))";
  const std::string obtained_std =
    std::regex_replace(input, std::regex(R"([.^$|()\[\]{}*+?\\])"), R"(\\&)",
                       std::regex_constants::match_default | std::regex_constants::format_sed);
  const std::string obtained_boost =
    boost::regex_replace(input, boost::regex(R"([.^$|()\[\]{}*+?\\])"), R"(\\&)",
                         boost::regex_constants::match_default | boost::regex_constants::format_sed);
  std::cout << "expected.......='" << expected << "'" << std::endl;
  std::cout << "obtained(std)..='" << obtained_std << "'" << std::endl;
  std::cout << "obtained(boost)='" << obtained_boost << "'" << std::endl;

Output with GCC < 8:

  expected.......='\(\.\)'
  obtained(std)..='\\(\\(\\.\\)\\)'
  obtained(boost)='\(\(\.\)\)'

With GCC >= 8, it works and the result is the same as with boost.