Before the code is moved from std::experimental::filesystem to std::filesystem (and into the shared library) there are a few improvements that should be made. http://wg21.link/lwg2674 relaxes requirements on path::iterator so that a path is no longer required to embed a container of paths. The components could be stored as string_view objects, and only converted to path objects as needed (generally that means when a path::iterator is dereferenced). Parsing and splitting a path into components could be done lazily, so that it isn't done for intermediate temporary copies of a path, only the final value that is manipulated. Move semantics might mean this is unnecessary, since performance-sensitive code can usually move values instead of copying them. Directory iterators currently construct a path object on every increment. This means that incrementing a directory iterator five times then dereferencing it constructs five paths which are never used (and each one is eagerly split into components, allocating a container of paths!). Construction of the path could be postponed so that it's only done on dereference, so that the five increments are cheap. This has thread-safety consequences, as the dereference is a const operation and could be performed concurrently, so the lazy construction of the path needs to be synchronized.
Created attachment 38461 [details] Make directory iterators lazy (In reply to Jonathan Wakely from comment #0) > Directory iterators currently construct a path object on every increment. > This means that incrementing a directory iterator five times then > dereferencing it constructs five paths which are never used (and each one is > eagerly split into components, allocating a container of paths!). > Construction of the path could be postponed so that it's only done on > dereference, so that the five increments are cheap. This has thread-safety > consequences, as the dereference is a const operation and could be performed > concurrently, so the lazy construction of the path needs to be synchronized. As in the attached patch.
Created attachment 38463 [details] Add assertions to directory iterator functions Also consider replacing non-inline functions such as directory_iterator::increment() with inline ones with assertions. This would allow those functions to check preconditions at runtime, and more functions could be noexcept.
Several member functions of path construct a path from their argument, but could be done without any allocations, e.g. path::compare(string_view)
Author: redi Date: Wed Dec 12 16:13:49 2018 New Revision: 267057 URL: https://gcc.gnu.org/viewcvs?rev=267057&root=gcc&view=rev Log: Overload std::distance and std::advance for path::iterator Although filesystem::path::iterator is only a bidirectional iterator, the underlying sequence has random access iterators (specifically, raw pointers). This means std::distance and std::advance can be implemented more efficiently than the generic versions which apply ++ and -- repeatedly. PR libstdc++/71044 (partial) * include/bits/fs_path.h (__path_iter_distance, __path_iter_advance): New friend functions to implement std::distance and std::advance more efficiently. (distance, advance): Add overloads for path::iterator. * testsuite/27_io/filesystem/path/itr/components.cc: Test new overload. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/fs_path.h trunk/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
Author: redi Date: Thu Dec 13 20:33:55 2018 New Revision: 267106 URL: https://gcc.gnu.org/viewcvs?rev=267106&root=gcc&view=rev Log: PR libstdc++/71044 optimize std::filesystem::path construction This new implementation has a smaller footprint than the previous implementation, due to replacing std::vector<_Cmpt> with a custom pimpl type that only needs a single pointer. The _M_type enumeration is also combined with the pimpl type, by using a tagged pointer, reducing sizeof(path) further still. Construction and modification of paths is now done more efficiently, by splitting the input into a stack-based buffer of string_view objects instead of a dynamically-allocated vector containing strings. Once the final size is known only a single allocation is needed to reserve space for it. The append and concat operations no longer require constructing temporary path objects, nor re-parsing the entire native pathname. This results in algorithmic improvements to path construction, and working with large paths is much faster. PR libstdc++/71044 * include/bits/fs_path.h (path::path(path&&)): Add noexcept when appropriate. Move _M_cmpts instead of reparsing the native pathname. (path::operator=(const path&)): Do not define as defaulted. (path::operator/=, path::append): Call _M_append. (path::concat): Call _M_concat. (path::path(string_type, _Type): Change type of first parameter to basic_string_view<value_type>. (path::_M_append(basic_string_view<value_type>)): New member function. (path::_M_concat(basic_string_view<value_type>)): New member function. (_S_convert(value_type*, __null_terminated)): Return string view. (_S_convert(const value_type*, __null_terminated)): Return string view. (_S_convert(value_type*, value_type*)) (_S_convert(const value_type*, const value_type*)): Add overloads for pairs of pointers. (_S_convert(_InputIterator, __null_terminated)): Construct string_type explicitly, for cases where _S_convert returns a string view. (path::_S_is_dir_sep): Replace with non-member is_dir_sep. (path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir) (path::_M_add_filename): Remove. (path::_M_type()): New member function to replace _M_type data member. (path::_List): Define new struct type instead of using std::vector. (path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of first parameter to basic_string_view<value_type>. (path::operator+=(const path&)): Do not define inline. (path::operator+=(const string_type&)): Call _M_concat. (path::operator+=(const value_type*)): Likewise. (path::operator+=(value_type)): Likewise. (path::operator+=(basic_string_view<value_type>)): Likewise. (path::operator/=(const path&)): Do not define inline. (path::_M_append(path)): Remove. * python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that understands the new path::_List type. * src/filesystem/std-path.cc (is_dir_sep): New function to replace path::_S_is_dir_sep. (path::_Parser): New helper class to parse strings as paths. (path::_List::_Impl): Define container type for path components. (path::_List): Define members. (path::operator=(const path&)): Define explicitly, to provide the strong exception safety guarantee. (path::operator/=(const path&)): Implement manually by processing each component of the argument, rather than using _M_split_cmpts to parse the entire string again. (path::_M_append(string_type)): Likewise. (path::operator+=(const path&)): Likewise. (path::_M_concat(string_type)): Likewise. (path::remove_filename()): Perform trim directly instead of calling _M_trim(). (path::_M_split_cmpts()): Rewrite in terms of _Parser class. (path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir) (path::_M_add_filename): Remove. * testsuite/27_io/filesystem/path/append/source.cc: Test appending a string view that aliases the path. testsuite/27_io/filesystem/path/concat/strings.cc: Test concatenating a string view that aliases the path. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/fs_path.h trunk/libstdc++-v3/python/libstdcxx/v6/printers.py trunk/libstdc++-v3/src/filesystem/std-path.cc trunk/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc trunk/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
(In reply to Jonathan Wakely from comment #0) > Before the code is moved from std::experimental::filesystem to > std::filesystem (and into the shared library) there are a few improvements > that should be made. > > http://wg21.link/lwg2674 relaxes requirements on path::iterator so that a > path is no longer required to embed a container of paths. The components > could be stored as string_view objects, and only converted to path objects > as needed (generally that means when a path::iterator is dereferenced). I've decided against this. The current implementation avoid the need for synchronization to make sure lazy init doesn't have data races, and avoids lifetime problems due to paths being stashed in the iterators e.g. filesystem::path p = "foo/bar"; basic_string_view s = std::next(p->begin())->native(); This creates a dangling string view if the path "bar" and its native string only have the lifetime of the iterator. > Parsing and splitting a path into components could be done lazily, so that > it isn't done for intermediate temporary copies of a path, only the final > value that is manipulated. Move semantics might mean this is unnecessary, > since performance-sensitive code can usually move values instead of copying > them. Parsing and copying paths is now much faster (but still performs a lot of allocations). > Directory iterators currently construct a path object on every increment. > This means that incrementing a directory iterator five times then > dereferencing it constructs five paths which are never used (and each one is > eagerly split into components, allocating a container of paths!). > Construction of the path could be postponed so that it's only done on > dereference, so that the five increments are cheap. This has thread-safety > consequences, as the dereference is a const operation and could be performed > concurrently, so the lazy construction of the path needs to be synchronized. This has not been done. (In reply to Jonathan Wakely from comment #2) > Created attachment 38463 [details] > Add assertions to directory iterator functions > > Also consider replacing non-inline functions such as > directory_iterator::increment() with inline ones with assertions. Maybe not necessary. Building with --enable-libstdcxx-debug and then linking to $libdir/debug/libstdc++s.a gives a version of the library with assertions enabled. > This would allow those functions to check preconditions at runtime, and more > functions could be noexcept. (In reply to Jonathan Wakely from comment #3) > Several member functions of path construct a path from their argument, but > could be done without any allocations, e.g. path::compare(string_view) This has not been done, but would be trivial with the new path::_Parser helper type, which splits a string into components with no allocations, so that each component could be compared to [this->begin(),this->end()).
Author: redi Date: Mon Dec 17 22:43:31 2018 New Revision: 267222 URL: https://gcc.gnu.org/viewcvs?rev=267222&root=gcc&view=rev Log: PR libstdc++/71044 fix off-by-one errors introduced recently The recent changes to append/concat directly from strings (without constructing paths) introduced regressions where one of the components could be omitted from the iteration sequence in the result. PR libstdc++/71044 * src/filesystem/std-path.cc (path::_M_append): Fix off-by-one error that caused a component to be lost from the iteration sequence. (path::_M_concat): Likewise. * testsuite/27_io/filesystem/path/append/source.cc: Test appending long strings. * testsuite/27_io/filesystem/path/concat/strings.cc: Test concatenating long strings. * testsuite/27_io/filesystem/path/construct/string_view.cc: Test construction from long string. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/src/filesystem/std-path.cc trunk/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc trunk/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc trunk/libstdc++-v3/testsuite/27_io/filesystem/path/construct/string_view.cc
(In reply to Jonathan Wakely from comment #3) > Several member functions of path construct a path from their argument, but > could be done without any allocations, e.g. path::compare(string_view) This is done in r267235.
Author: redi Date: Sat Feb 9 00:25:39 2019 New Revision: 268713 URL: https://gcc.gnu.org/viewcvs?rev=268713&root=gcc&view=rev Log: Add noexcept to filesystem::path query functions In the standard these member functions are specified in terms of the potentially-throwing path decompositions functions, but we implement them without constructing any new paths or doing anything else that can throw. PR libstdc++/71044 * include/bits/fs_path.h (path::has_root_name) (path::has_root_directory, path::has_root_path) (path::has_relative_path, path::has_parent_path) (path::has_filename, path::has_stem, path::has_extension) (path::is_absolute, path::is_relative, path::_M_find_extension): Add noexcept. * src/c++17/fs_path.cc (path::has_root_name) (path::has_root_directory, path::has_root_path) (path::has_relative_path, path::has_parent_path) (path::has_filename, path::_M_find_extension): Add noexcept. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/fs_path.h trunk/libstdc++-v3/src/c++17/fs_path.cc
GCC 9.1 has been released.
I think we're all done here.