Bug 91947 - std::filesystem::file_size will return wrong value on 32bit platforms with large files support
Summary: std::filesystem::file_size will return wrong value on 32bit platforms with la...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 8.3.0
: P3 normal
Target Milestone: 9.3
Assignee: Jonathan Wakely
URL:
Keywords:
Depends on: 81091
Blocks:
  Show dependency treegraph
 
Reported: 2019-10-01 09:46 UTC by Fregl
Modified: 2020-02-26 13:10 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-10-01 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Fregl 2019-10-01 09:46:58 UTC
std::filesystem::file_size will return wrong value on 32bit platforms with large files support

/trunk/libstdc++-v3/src/filesystem/ops.cc
/trunk/libstdc++-v3/src/filesystem/std-ops.cc (the same problem)

struct S field `size` of type `size_t`.

When it casted from uintmax_t to size_t on 32bit platform, the result of this function will be wrong.

fs::file_size(const path& p, error_code& ec) noexcept
{
  struct S
  {
    S(const stat_type& st) : type(make_file_type(st)), size(st.st_size) { }
  }
    S() : type(file_type::not_found) { }
    file_type type;
    size_t size;
...
};

stat.off_t type is off64_t when large file support is enabled.
Comment 1 Jonathan Wakely 2019-10-01 15:56:12 UTC
Huh, I thought I'd already fixed this a while ago.
Comment 2 Jonathan Wakely 2019-10-02 10:26:50 UTC
Since libstdc++ is not built with large file support, stat::off_t is only a 32-bit type, and the operation fails before any truncation to size_t would happen:

terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
  what():  filesystem error: cannot get file size: Value too large for defined data type [/mnt/scratch/jwakely/tmp/large_file_for_testing_std_filesystem]
Aborted (core dumped)
Comment 3 Fregl 2019-10-02 11:40:30 UTC
In out product we use 32 bit toolchain, but work with large files.
So there is only solution to use direct stat call insted fs::file_size?
It seems this is firm limitation.
Comment 4 Jonathan Wakely 2019-10-02 11:47:15 UTC
(In reply to Fregl from comment #3)
> It seems this is firm limitation.

It's a bug, you just have to wait for it to be fixed.
Comment 5 Jonathan Wakely 2019-10-02 11:48:15 UTC
(In reply to Jonathan Wakely from comment #1)
> Huh, I thought I'd already fixed this a while ago.

I was thinking of Bug 85632 which is different.
Comment 6 Jonathan Wakely 2019-10-04 15:08:54 UTC
Author: redi
Date: Fri Oct  4 15:08:23 2019
New Revision: 276585

URL: https://gcc.gnu.org/viewcvs?rev=276585&root=gcc&view=rev
Log:
Build filesystem library with large file support

Enable AC_SYS_LARGEFILE to set the macros needed for large file APIs to
be used by default. We do not want to define those macros in the
public headers that users include. The values of the macros are copied
to a separate file that is only included by the filesystem sources
during the build, and then the macros in <bits/c++config.h> are renamed
so that they don't have any effect in user code including our headers.

Also use larger type for result of filesystem::file_size to avoid
truncation of large values on 32-bit systems (PR 91947).

	PR libstdc++/81091
	PR libstdc++/91947
	* configure.ac: Use AC_SYS_LARGEFILE to enable 64-bit file APIs.
	* config.h.in: Regenerate:
	* configure: Regenerate:
	* include/Makefile.am (${host_builddir}/largefile-config.h): New
	target to generate config header for filesystem library.
	(${host_builddir}/c++config.h): Rename macros for large file support.
	* include/Makefile.in: Regenerate.
	* src/c++17/fs_dir.cc: Include new config header.
	* src/c++17/fs_ops.cc: Likewise.
	(filesystem::file_size): Use uintmax_t for size.
	* src/filesystem/dir.cc: Include new config header.
	* src/filesystem/ops.cc: Likewise.
	(experimental::filesystem::file_size): Use uintmax_t for size.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/config.h.in
    trunk/libstdc++-v3/configure
    trunk/libstdc++-v3/configure.ac
    trunk/libstdc++-v3/include/Makefile.am
    trunk/libstdc++-v3/include/Makefile.in
    trunk/libstdc++-v3/src/c++17/fs_dir.cc
    trunk/libstdc++-v3/src/c++17/fs_ops.cc
    trunk/libstdc++-v3/src/filesystem/dir.cc
    trunk/libstdc++-v3/src/filesystem/ops.cc
Comment 7 Jonathan Wakely 2019-10-04 15:10:05 UTC
Fixed on trunk.
Comment 8 GCC Commits 2020-01-13 19:53:42 UTC
The releases/gcc-9 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:6cb662745d38e680a1a46fa04b108734cbc3df58

commit r9-8130-g6cb662745d38e680a1a46fa04b108734cbc3df58
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jan 9 13:38:43 2020 +0000

    Build filesystem library with large file support
    
    Enable AC_SYS_LARGEFILE to set the macros needed for large file APIs to
    be used by default. We do not want to define those macros in the
    public headers that users include. The values of the macros are copied
    to a separate file that is only included by the filesystem sources
    during the build, and then the macros in <bits/c++config.h> are renamed
    so that they don't have any effect in user code including our headers.
    
    Also use larger type for result of filesystem::file_size to avoid
    truncation of large values on 32-bit systems (PR 91947).
    
    Backport from mainlne
    2019-10-04  Jonathan Wakely  <jwakely@redhat.com>
    
    	PR libstdc++/81091
    	PR libstdc++/91947
    	* configure.ac: Use AC_SYS_LARGEFILE to enable 64-bit file APIs.
    	* config.h.in: Regenerate:
    	* configure: Regenerate:
    	* include/Makefile.am (${host_builddir}/largefile-config.h): New
    	target to generate config header for filesystem library.
    	(${host_builddir}/c++config.h): Rename macros for large file support.
    	* include/Makefile.in: Regenerate.
    	* src/c++17/fs_dir.cc: Include new config header.
    	* src/c++17/fs_ops.cc: Likewise.
    	(filesystem::file_size): Use uintmax_t for size.
    	* src/filesystem/dir.cc: Include new config header.
    	* src/filesystem/ops.cc: Likewise.
    	(experimental::filesystem::file_size): Use uintmax_t for size.
Comment 9 Jonathan Wakely 2020-01-13 19:58:32 UTC
Fixed for GCC 9.3
Comment 10 Eric Botcazou 2020-01-23 07:23:22 UTC
Isn't the creation of these files in ${host_builddir} with >> racy though?
We recently had a build failure because largefile-config.h was truncated.

Here's an example from the GCC Makefile:

# all-tree.def includes all the tree.def files.
all-tree.def: s-alltree; @true
s-alltree: Makefile
	rm -f tmp-all-tree.def
	echo '#include "tree.def"' > tmp-all-tree.def
	echo 'END_OF_BASE_TREE_CODES' >> tmp-all-tree.def
	echo '#include "c-family/c-common.def"' >> tmp-all-tree.def
	ltf="$(lang_tree_files)"; for f in $$ltf; do \
	  echo "#include \"$$f\""; \
	done | sed 's|$(srcdir)/||' >> tmp-all-tree.def
	$(SHELL) $(srcdir)/../move-if-change tmp-all-tree.def all-tree.def
	$(STAMP) s-alltree

which looks more robust.
Comment 11 Jonathan Wakely 2020-01-23 09:32:37 UTC
It's done the same way as we've always created the ${host_builddir}/c++config.h target, so I'm not sure why one is racy and the other isn't.
Comment 12 Jonathan Wakely 2020-01-23 10:37:00 UTC
Rather than a race, I think what might happen is that an interrupted build leaves a partial file in place. If make is re-run it will think the file is present and won't try to create that target.

So writing to a temporary file and then moving it avoids that. I don't think the stamp file is strictly necessary.

I still don't see why the problem has appeared now, when we've been doing the same thing for years for the other generated header.

Reopening ...
Comment 13 Eric Botcazou 2020-01-23 11:44:22 UTC
> It's done the same way as we've always created the
> ${host_builddir}/c++config.h target, so I'm not sure why one is racy and the
> other isn't.

I didn't mean that the first one was not racy though ;-)  But it has got only 3 >> while the new one has got 13 of them, so this can make a difference in practice.

The issue happened on a Windows host for the 9 branch so you never know, but we never had such a thing before you had backported the change onto that branch.
Comment 14 Jonathan Wakely 2020-01-23 13:59:42 UTC
I think this is simpler and less likely to cause problems:

# This header is not installed, it's only used to build libstdc++ itself.
${host_builddir}/largefile-config.h: ${CONFIG_HEADER}
	@rm -f $@.tmp
	@-grep 'define _DARWIN_USE_64_BIT_INODE' ${CONFIG_HEADER} >> $@.tmp
	@-grep 'define _FILE_OFFSET_BITS' ${CONFIG_HEADER} >> $@.tmp
	@-grep 'define _LARGE_FILES' ${CONFIG_HEADER} >> $@.tmp
	@mv $@.tmp $@
Comment 15 Jonathan Wakely 2020-01-23 15:09:58 UTC
Fixed on trunk, backport to the branch needed too.
Comment 16 GCC Commits 2020-01-23 15:09:59 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:04681fca936b5bca1dd374dfb0fbca7ccb028994

commit r10-6177-g04681fca936b5bca1dd374dfb0fbca7ccb028994
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jan 23 14:02:32 2020 +0000

    libstdc++: Simplify makefile rule for largefile-config.h (PR91947)
    
    The previous rule could leave an incomplete file if the build was
    interrupted, which would then not be remade if make was run again.
    
    This makes the rule more robust by writing to a temporary file and only
    moving it into place as the final step. It also simplifies the rule so
    that only the essential macro definitions are written to the file, not
    the explanatory comments and commented out #undef lines.
    
    Also, the macro for enabling LFS on Mac OS X 10.5 is now set
    unconditionally, which is a bug fix from upstream autoconf.
    
    	PR libstdc++/91947
    	* include/Makefile.am (${host_builddir}/largefile-config.h): Simplify
    	rule.
    	* include/Makefile.in: Regenerate.
Comment 17 GCC Commits 2020-01-24 12:20:23 UTC
The releases/gcc-9 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:e2bcf65feea9590e6f260862b3c740b5e803f851

commit r9-8174-ge2bcf65feea9590e6f260862b3c740b5e803f851
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Jan 24 11:13:55 2020 +0000

    libstdc++: Simplify makefile rule for largefile-config.h (PR91947)
    
    The previous rule could leave an incomplete file if the build was
    interrupted, which would then not be remade if make was run again.
    
    This makes the rule more robust by writing to a temporary file and only
    moving it into place as the final step. It also simplifies the rule so
    that only the essential macro definitions are written to the file, not
    the explanatory comments and commented out #undef lines.
    
    Also, the macro for enabling LFS on Mac OS X 10.5 is now set
    unconditionally, which is a bug fix from upstream autoconf.
    
    Backport from mainline
    2020-01-23  Jonathan Wakely  <jwakely@redhat.com>
    
    	PR libstdc++/91947
    	* include/Makefile.am (${host_builddir}/largefile-config.h): Simplify
    	rule.
    	* include/Makefile.in: Regenerate.
Comment 18 Jonathan Wakely 2020-01-24 12:20:34 UTC
Fixed for gcc-9 as well, please let me know if you see any more build issues here.
Comment 19 Eric Botcazou 2020-01-24 12:22:34 UTC
> Fixed for gcc-9 as well, please let me know if you see any more build issues
> here.

Sure, thanks for the quick resolution!
Comment 20 GCC Commits 2020-02-26 13:10:17 UTC
The releases/gcc-8 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:56b6c2ba3a45c768a33f726b130a56fc19cca650

commit r8-10084-g56b6c2ba3a45c768a33f726b130a56fc19cca650
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jan 9 13:38:43 2020 +0000

    Build filesystem library with large file support
    
    Enable AC_SYS_LARGEFILE to set the macros needed for large file APIs to
    be used by default. We do not want to define those macros in the
    public headers that users include. The values of the macros are copied
    to a separate file that is only included by the filesystem sources
    during the build, and then the macros in <bits/c++config.h> are renamed
    so that they don't have any effect in user code including our headers.
    
    Also use larger type for result of filesystem::file_size to avoid
    truncation of large values on 32-bit systems (PR 91947).
    
    Backport from mainlne
    2019-10-04  Jonathan Wakely  <jwakely@redhat.com>
    
    	PR libstdc++/81091
    	PR libstdc++/91947
    	* configure.ac: Use AC_SYS_LARGEFILE to enable 64-bit file APIs.
    	* config.h.in: Regenerate:
    	* configure: Regenerate:
    	* include/Makefile.am (${host_builddir}/largefile-config.h): New
    	target to generate config header for filesystem library.
    	(${host_builddir}/c++config.h): Rename macros for large file support.
    	* include/Makefile.in: Regenerate.
    	* src/c++17/fs_dir.cc: Include new config header.
    	* src/c++17/fs_ops.cc: Likewise.
    	(filesystem::file_size): Use uintmax_t for size.
    	* src/filesystem/dir.cc: Include new config header.
    	* src/filesystem/ops.cc: Likewise.
    	(experimental::filesystem::file_size): Use uintmax_t for size.
Comment 21 GCC Commits 2020-02-26 13:10:23 UTC
The releases/gcc-8 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:761696abfe0a772315449e3d2b57de76756f5953

commit r8-10085-g761696abfe0a772315449e3d2b57de76756f5953
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Jan 24 11:13:55 2020 +0000

    libstdc++: Simplify makefile rule for largefile-config.h (PR91947)
    
    The previous rule could leave an incomplete file if the build was
    interrupted, which would then not be remade if make was run again.
    
    This makes the rule more robust by writing to a temporary file and only
    moving it into place as the final step. It also simplifies the rule so
    that only the essential macro definitions are written to the file, not
    the explanatory comments and commented out #undef lines.
    
    Also, the macro for enabling LFS on Mac OS X 10.5 is now set
    unconditionally, which is a bug fix from upstream autoconf.
    
    Backport from mainline
    2020-01-23  Jonathan Wakely  <jwakely@redhat.com>
    
    	PR libstdc++/91947
    	* include/Makefile.am (${host_builddir}/largefile-config.h): Simplify
    	rule.
    	* include/Makefile.in: Regenerate.