Even with the hacks, the result is the same.

https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/benchmarks/0000.10m_size_t/unit/filebuf_io_observer.cc

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s
D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.5130060000000001s
input: 0.256011s

Running the same program on Linux (WSL2):

cqwrteur@Home-Server:~/myhome/fast_io/benchmarks/0000.10m_size_t/unit$ g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s
cqwrteur@Home-Server:~/myhome/fast_io/benchmarks/0000.10m_size_t/unit$ ./filebuf_io_observer
output: 0.058395978s
input: 0.06603426700000001s

This cannot be an optimization problem, since my code is identical on Windows and Linux. I think the only explanation is a problem with FILE*. My guess is that code relying on an unknown libc does not work correctly on Windows. One explanation might be the size of the underlying FILE* buffer. I think you need to try a larger buffer size.

The same program built with the MSVC STL (its ABI is terrible).

MSVC STL with hacking:

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.119666s
input: 0.2497127s

Other comparison benchmarks:

GCC 10.0.1 with msvcrt hacking.
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/include/fast_io_legacy_impl/c/msvcrt.h

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>c_file_unlocked
output: 0.09216600000000001s
input: 0.12230700000000001s

GCC
https://bitbucket.org/ejsvifq_mabmip/mingw-gcc/src/master/

You can try it with this MinGW-GCC 10. I have known about this issue for a very long time. It looks like std::filebuf does not set the buffer size correctly, or there is some other weird issue with the underlying msvcrt.
The hacking code is here:

https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/include/fast_io_legacy_impl/cpp/libstdc%2B%2B_libc%2B%2B.h
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/include/fast_io_legacy_impl/cpp/general.h
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/include/fast_io_legacy_impl/cpp/streambuf_io_observer.h
I have found the reason: the buffer size is too small on Windows, where BUFSIZ == 512.

It needs to be set to at least 4096. I think even 65536 would be justified, since Windows syscalls are extremely expensive. In my tests with WriteFile, my library needs a buffer size of 131072 to reach maximum performance.

I think I can fix this bug easily.
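To illustrate the buffer-size point from user code, here is a minimal sketch that hands std::filebuf a larger buffer through pubsetbuf before the file is opened. The helper name write_numbers and the sizes passed to it are hypothetical, and the effect of pubsetbuf is implementation-defined; libstdc++ honors it when it is called before any I/O, which is what this sketch assumes.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of fast_io or libstdc++): writes the integers
// [0, count), one per line, through a stream buffer of caller-chosen size.
inline bool write_numbers(const std::string& path, std::size_t count,
                          std::size_t buffer_size)
{
    assert(buffer_size > 0);

    // Declared before the stream so it is destroyed after the stream,
    // i.e. it is still alive when the stream flushes on destruction.
    std::vector<char> buffer(buffer_size);

    std::ofstream out;
    // Request the custom buffer before open(); calling it later may be ignored.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::binary);
    if (!out)
        return false;

    for (std::size_t i = 0; i != count; ++i)
        out << i << '\n';

    out.flush();
    return static_cast<bool>(out);
}
```

Calling write_numbers("numbers.txt", 10000000, 65536) and then repeating with 512 would be one way to compare buffer sizes on a given toolchain, without rebuilding GCC.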
(In reply to fdlbxtqi from comment #3)
> I have found out the reason. It is because the buffer size is too small on
> the windows. BUFSIZ == 512
>
> You need to set the value to 4096 at least. I think even 65536 is something
> should be done since the windows syscall is extremely expensive. I have
> tested with WriteFile, the maximum performance needs to set to 131072
> buffer_size with my library.
>
> I think I can fix this bug easily.

The most efficient size is 1048576. However, I think it should be set to at least 65536, although 4096 already fixes most of the problem. Windows syscalls are expensive.
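The sizes discussed above can be compared at the FILE* level as well. The following is only a sketch of such a measurement, assuming a hypothetical helper time_stdio_write; it resizes the stdio buffer with setvbuf and times a simple write loop. It is not the benchmark used in this report, and the timings will depend on the C runtime and the disk.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: times writing `count` integers through a FILE* whose
// buffer was resized with setvbuf. Returns seconds, or -1.0 on failure.
inline double time_stdio_write(const char* path, unsigned long count,
                               std::size_t buffer_size)
{
    assert(buffer_size > 0);

    std::FILE* fp = std::fopen(path, "wb");
    if (!fp)
        return -1.0;

    std::vector<char> buffer(buffer_size);
    // setvbuf must come after fopen and before any other stream operation.
    if (std::setvbuf(fp, buffer.data(), _IOFBF, buffer.size()) != 0) {
        std::fclose(fp);
        return -1.0;
    }

    const auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i != count; ++i)
        std::fprintf(fp, "%lu\n", i);
    // fclose flushes the stream while `buffer` is still alive.
    const bool ok = std::fclose(fp) == 0;
    const auto stop = std::chrono::steady_clock::now();

    return ok ? std::chrono::duration<double>(stop - start).count() : -1.0;
}
```

Running it with 512, 4096, 65536, 131072 and 1048576 in turn gives a rough picture of where the returns diminish on a particular system.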
Here is the proof. https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/benchmarks/0000.10m_size_t/unit/streambuf_io_observer___gnu_cxx_stdio_filebuf.cc
Created attachment 48086 [details] simple patch
I rebuilt GCC with this patch. Integer output is about 10 times faster and integer input about 3 times faster. Definitely worth it.

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>gcc --version
gcc (master HEAD with MCF thread model, built by cqwrteur.) 10.0.1 20200323 (experimental)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s
D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.047s
input: 0.085998s

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s -flto
D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.042999s
input: 0.086012s

The performance is now nearly as good as my own buffer implementation, though it is still slower because of the hacking. It is actually usable.

g++ -o iobuf_file iobuf_file.cc -Ofast -std=c++2a -s -flto
./iobuf_file.exe
output: 0.044000000000000004s
input: 0.076s
The links above are dead. Reposting them:

https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/benchmarks/0000.10m_size_t/unit/filebuf_io_observer.cc
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/benchmarks/0000.10m_size_t/unit/streambuf_io_observer___gnu_cxx_stdio_filebuf.cc
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/benchmarks/0000.10m_size_t/unit/c_file_unlocked.cc
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/include/fast_io_legacy_impl/cpp/libstdc%2B%2B_libc%2B%2B.h
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/include/fast_io_legacy_impl/cpp/general.h
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/include/fast_io_legacy_impl/cpp/streambuf_io_observer.h
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/master/include/fast_io_legacy_impl/c/msvcrt.h
https://github.com/microsoft/WSL/issues/3898
What about adding another check for when BUFSIZ is smaller than 4 KB? If it is, raise the filebuf size to at least 4 KB.
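The check proposed here could look roughly like the sketch below. The names min_filebuf_size, clamp_buffer_size and effective_filebuf_size are hypothetical and are not libstdc++ identifiers; the 4096 floor is the value suggested in this comment.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical floor for the stream buffer size (4 KB, as suggested above).
constexpr std::size_t min_filebuf_size = 4096;

// Returns the platform default unless it is below the floor.
constexpr std::size_t clamp_buffer_size(std::size_t platform_default)
{
    return platform_default < min_filebuf_size ? min_filebuf_size
                                               : platform_default;
}

// BUFSIZ comes from <cstdio>; on a target where it is 512 this yields 4096.
constexpr std::size_t effective_filebuf_size = clamp_buffer_size(BUFSIZ);

static_assert(clamp_buffer_size(512) == 4096, "small defaults are raised");
static_assert(clamp_buffer_size(8192) == 8192, "larger defaults are kept");
```

The advantage of a floor over a fixed override is that targets whose BUFSIZ is already larger keep their own default.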
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:0bc199fc5d4eef5a20ced20df892e5e3b8821b60

commit r11-4479-g0bc199fc5d4eef5a20ced20df892e5e3b8821b60
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 28 13:19:21 2020 +0000

    libstdc++: Override BUFSIZ for Windows targets [PR 94268]

    This replaces uses of BUFSIZ with a new _GLIBCXX_BUFSIZ macro that can
    be overridden in target-specific config headers. That allows the mingw
    and mingw-w64 targets to override it, because BUFSIZ is apparently
    defined to 512, resulting in poor performance. The MSVCRT stdio
    apparently uses 4096, so we use that too.

    libstdc++-v3/ChangeLog:

            PR libstdc++/94268
            * config/os/mingw32-w64/os_defines.h (_GLIBCXX_BUFSIZ):
            Define.
            * config/os/mingw32/os_defines.h (_GLIBCXX_BUFSIZ):
            Define.
            * include/bits/fstream.tcc: Use _GLIBCXX_BUFSIZ instead
            of BUFSIZ.
            * include/ext/stdio_filebuf.h: Likewise.
            * include/std/fstream (_GLIBCXX_BUFSIZ): Define.
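The override mechanism described in the commit message follows a common pattern, sketched below. This is not the text of the actual patch: MYLIB_BUFSIZ is a hypothetical stand-in for _GLIBCXX_BUFSIZ (which is a reserved name), and default_stream_buffer_size is likewise made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>

// A target-specific configuration header, included earlier, could contain:
//   #define MYLIB_BUFSIZ 4096
// MYLIB_BUFSIZ is a hypothetical name used only for this sketch.

// Generic fallback: if no target header overrode it, use the C library value.
#ifndef MYLIB_BUFSIZ
# define MYLIB_BUFSIZ BUFSIZ
#endif

// Library code then refers to the overridable macro instead of BUFSIZ.
constexpr std::size_t default_stream_buffer_size = MYLIB_BUFSIZ;
```

With this arrangement only the targets that need a different value have to define the macro, and every other target keeps using BUFSIZ unchanged.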
Changed to 4096 for mingw and mingw-w64 in GCC 11.