Bug 71044 - Optimize std::filesystem implementation
Summary: Optimize std::filesystem implementation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 7.0
: P3 normal
Target Milestone: 9.0
Assignee: Jonathan Wakely
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-10 11:39 UTC by Jonathan Wakely
Modified: 2019-05-03 12:03 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-05-10 00:00:00


Attachments
Make directory iterators lazy (1.27 KB, patch)
2016-05-10 11:42 UTC, Jonathan Wakely
Details | Diff
Add assertions to directory iterator functions (917 bytes, patch)
2016-05-10 15:43 UTC, Jonathan Wakely
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Wakely 2016-05-10 11:39:00 UTC
Before the code is moved from std::experimental::filesystem to std::filesystem (and into the shared library) there are a few improvements that should be made.

http://wg21.link/lwg2674 relaxes requirements on path::iterator so that a path is no longer required to embed a container of paths. The components could be stored as string_view objects, and only converted to path objects as needed (generally that means when a path::iterator is dereferenced).

Parsing and splitting a path into components could be done lazily, so that it isn't done for intermediate temporary copies of a path, only the final value that is manipulated. Move semantics might mean this is unnecessary, since performance-sensitive code can usually move values instead of copying them.

Directory iterators currently construct a path object on every increment. This means that incrementing a directory iterator five times then dereferencing it constructs five paths which are never used (and each one is eagerly split into components, allocating a container of paths!). Construction of the path could be postponed so that it's only done on dereference, so that the five increments are cheap. This has thread-safety consequences, as the dereference is a const operation and could be performed concurrently, so the lazy construction of the path needs to be synchronized.
Comment 1 Jonathan Wakely 2016-05-10 11:42:43 UTC
Created attachment 38461 [details]
Make directory iterators lazy

(In reply to Jonathan Wakely from comment #0)
> Directory iterators currently construct a path object on every increment.
> This means that incrementing a directory iterator five times then
> dereferencing it constructs five paths which are never used (and each one is
> eagerly split into components, allocating a container of paths!).
> Construction of the path could be postponed so that it's only done on
> dereference, so that the five increments are cheap. This has thread-safety
> consequences, as the dereference is a const operation and could be performed
> concurrently, so the lazy construction of the path needs to be synchronized.

As in the attached patch.
Comment 2 Jonathan Wakely 2016-05-10 15:43:24 UTC
Created attachment 38463 [details]
Add assertions to directory iterator functions

Also consider replacing non-inline functions such as directory_iterator::increment() with inline ones with assertions.

This would allow those functions to check preconditions at runtime, and more functions could be noexcept.
Comment 3 Jonathan Wakely 2016-10-28 16:54:07 UTC
Several member functions of path construct a path from their argument, but could be done without any allocations, e.g. path::compare(string_view)
Comment 4 Jonathan Wakely 2018-12-12 16:14:25 UTC
Author: redi
Date: Wed Dec 12 16:13:49 2018
New Revision: 267057

URL: https://gcc.gnu.org/viewcvs?rev=267057&root=gcc&view=rev
Log:
Overload std::distance and std::advance for path::iterator

Although filesystem::path::iterator is only a bidirectional iterator,
the underlying sequence has random access iterators (specifically, raw
pointers). This means std::distance and std::advance can be implemented
more efficiently than the generic versions which apply ++ and --
repeatedly.

	PR libstdc++/71044 (partial)
	* include/bits/fs_path.h (__path_iter_distance, __path_iter_advance):
	New friend functions to implement std::distance and std::advance more
	efficiently.
	(distance, advance): Add overloads for path::iterator.
	* testsuite/27_io/filesystem/path/itr/components.cc: Test new
	overload.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/fs_path.h
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
Comment 5 Jonathan Wakely 2018-12-13 20:34:27 UTC
Author: redi
Date: Thu Dec 13 20:33:55 2018
New Revision: 267106

URL: https://gcc.gnu.org/viewcvs?rev=267106&root=gcc&view=rev
Log:
PR libstdc++/71044 optimize std::filesystem::path construction

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it.  The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

	PR libstdc++/71044
	* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
	appropriate. Move _M_cmpts instead of reparsing the native pathname.
	(path::operator=(const path&)): Do not define as defaulted.
	(path::operator/=, path::append): Call _M_append.
	(path::concat): Call _M_concat.
	(path::path(string_type, _Type): Change type of first parameter to
	basic_string_view<value_type>.
	(path::_M_append(basic_string_view<value_type>)): New member function.
	(path::_M_concat(basic_string_view<value_type>)): New member function.
	(_S_convert(value_type*, __null_terminated)): Return string view.
	(_S_convert(const value_type*, __null_terminated)): Return string view.
	(_S_convert(value_type*, value_type*))
	(_S_convert(const value_type*, const value_type*)): Add overloads for
	pairs of pointers.
	(_S_convert(_InputIterator, __null_terminated)): Construct string_type
	explicitly, for cases where _S_convert returns a string view.
	(path::_S_is_dir_sep): Replace with non-member is_dir_sep.
	(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
	(path::_M_add_filename): Remove.
	(path::_M_type()): New member function to replace _M_type data member.
	(path::_List): Define new struct type instead of using std::vector.
	(path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of
	first parameter to basic_string_view<value_type>.
	(path::operator+=(const path&)): Do not define inline.
	(path::operator+=(const string_type&)): Call _M_concat.
	(path::operator+=(const value_type*)): Likewise.
	(path::operator+=(value_type)): Likewise.
	(path::operator+=(basic_string_view<value_type>)): Likewise.
	(path::operator/=(const path&)): Do not define inline.
	(path::_M_append(path)): Remove.
	* python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that
	understands the new path::_List type.
	* src/filesystem/std-path.cc (is_dir_sep): New function to replace
	path::_S_is_dir_sep.
	(path::_Parser): New helper class to parse strings as paths.
	(path::_List::_Impl): Define container type for path components.
	(path::_List): Define members.
	(path::operator=(const path&)): Define explicitly, to provide the
	strong exception safety guarantee.
	(path::operator/=(const path&)): Implement manually by processing
	each component of the argument, rather than using _M_split_cmpts
	to parse the entire string again.
	(path::_M_append(string_type)): Likewise.
	(path::operator+=(const path&)): Likewise.
	(path::_M_concat(string_type)): Likewise.
	(path::remove_filename()): Perform trim directly instead of calling
	_M_trim().
	(path::_M_split_cmpts()): Rewrite in terms of _Parser class.
	(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
	(path::_M_add_filename): Remove.
	* testsuite/27_io/filesystem/path/append/source.cc: Test appending a
	string view that aliases the path.
	testsuite/27_io/filesystem/path/concat/strings.cc: Test concatenating
	a string view that aliases the path.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/fs_path.h
    trunk/libstdc++-v3/python/libstdcxx/v6/printers.py
    trunk/libstdc++-v3/src/filesystem/std-path.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
Comment 6 Jonathan Wakely 2018-12-13 20:41:52 UTC
(In reply to Jonathan Wakely from comment #0)
> Before the code is moved from std::experimental::filesystem to
> std::filesystem (and into the shared library) there are a few improvements
> that should be made.
> 
> http://wg21.link/lwg2674 relaxes requirements on path::iterator so that a
> path is no longer required to embed a container of paths. The components
> could be stored as string_view objects, and only converted to path objects
> as needed (generally that means when a path::iterator is dereferenced).

I've decided against this. The current implementation avoid the need for synchronization to make sure lazy init doesn't have data races, and avoids lifetime problems due to paths being stashed in the iterators e.g.

  filesystem::path p = "foo/bar";
  basic_string_view s = std::next(p->begin())->native();

This creates a dangling string view if the path "bar" and its native string only have the lifetime of the iterator.

> Parsing and splitting a path into components could be done lazily, so that
> it isn't done for intermediate temporary copies of a path, only the final
> value that is manipulated. Move semantics might mean this is unnecessary,
> since performance-sensitive code can usually move values instead of copying
> them.

Parsing and copying paths is now much faster (but still performs a lot of allocations).

> Directory iterators currently construct a path object on every increment.
> This means that incrementing a directory iterator five times then
> dereferencing it constructs five paths which are never used (and each one is
> eagerly split into components, allocating a container of paths!).
> Construction of the path could be postponed so that it's only done on
> dereference, so that the five increments are cheap. This has thread-safety
> consequences, as the dereference is a const operation and could be performed
> concurrently, so the lazy construction of the path needs to be synchronized.

This has not been done.

(In reply to Jonathan Wakely from comment #2)
> Created attachment 38463 [details]
> Add assertions to directory iterator functions
> 
> Also consider replacing non-inline functions such as
> directory_iterator::increment() with inline ones with assertions.

Maybe not necessary. Building with --enable-libstdcxx-debug and then linking to $libdir/debug/libstdc++s.a gives a version of the library with assertions enabled.

> This would allow those functions to check preconditions at runtime, and more
> functions could be noexcept.

(In reply to Jonathan Wakely from comment #3)
> Several member functions of path construct a path from their argument, but
> could be done without any allocations, e.g. path::compare(string_view)

This has not been done, but would be trivial with the new path::_Parser helper type, which splits a string into components with no allocations, so that each component could be compared to [this->begin(),this->end()).
Comment 7 Jonathan Wakely 2018-12-17 22:44:05 UTC
Author: redi
Date: Mon Dec 17 22:43:31 2018
New Revision: 267222

URL: https://gcc.gnu.org/viewcvs?rev=267222&root=gcc&view=rev
Log:
PR libstdc++/71044 fix off-by-one errors introduced recently

The recent changes to append/concat directly from strings (without
constructing paths) introduced regressions where one of the components
could be omitted from the iteration sequence in the result.

	PR libstdc++/71044
	* src/filesystem/std-path.cc (path::_M_append): Fix off-by-one error
	that caused a component to be lost from the iteration sequence.
	(path::_M_concat): Likewise.
	* testsuite/27_io/filesystem/path/append/source.cc: Test appending
	long strings.
	* testsuite/27_io/filesystem/path/concat/strings.cc: Test
	concatenating long strings.
	* testsuite/27_io/filesystem/path/construct/string_view.cc: Test
	construction from long string.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/src/filesystem/std-path.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
    trunk/libstdc++-v3/testsuite/27_io/filesystem/path/construct/string_view.cc
Comment 8 Jonathan Wakely 2018-12-18 15:53:33 UTC
(In reply to Jonathan Wakely from comment #3)
> Several member functions of path construct a path from their argument, but
> could be done without any allocations, e.g. path::compare(string_view)

This is done in r267235.
Comment 9 Jonathan Wakely 2019-02-09 00:26:12 UTC
Author: redi
Date: Sat Feb  9 00:25:39 2019
New Revision: 268713

URL: https://gcc.gnu.org/viewcvs?rev=268713&root=gcc&view=rev
Log:
Add noexcept to filesystem::path query functions

In the standard these member functions are specified in terms of the
potentially-throwing path decompositions functions, but we implement
them without constructing any new paths or doing anything else that can
throw.

	PR libstdc++/71044
	* include/bits/fs_path.h (path::has_root_name)
	(path::has_root_directory, path::has_root_path)
	(path::has_relative_path, path::has_parent_path)
	(path::has_filename, path::has_stem, path::has_extension)
	(path::is_absolute, path::is_relative, path::_M_find_extension): Add
	noexcept.
	* src/c++17/fs_path.cc (path::has_root_name)
	(path::has_root_directory, path::has_root_path)
	(path::has_relative_path, path::has_parent_path)
	(path::has_filename, path::_M_find_extension): Add noexcept.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/fs_path.h
    trunk/libstdc++-v3/src/c++17/fs_path.cc
Comment 10 Jakub Jelinek 2019-05-03 09:17:13 UTC
GCC 9.1 has been released.
Comment 11 Jonathan Wakely 2019-05-03 12:03:54 UTC
I think we're all done here.