Bug 94268 - std::filebuf is extremely slow (at least 10x) on Windows compared to Linux, and much slower even than the MSVC STL with its terrible ABI.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 10.0
Importance: P3 normal
Target Milestone: 11.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2020-03-23 09:08 UTC by fdlbxtqi
Modified: 2020-10-28 20:29 UTC

See Also:
Host:
Target: x86_64-w64-mingw
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
simple patch (1.19 KB, patch)
2020-03-23 11:20 UTC, fdlbxtqi

Description fdlbxtqi 2020-03-23 09:08:22 UTC
Even with the hacks, the result is the same.

https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/benchmarks/0000.10m_size_t/unit/filebuf_io_observer.cc

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.5130060000000001s
input:  0.256011s

Running the same program on Linux (WSL2):

cqwrteur@Home-Server:~/myhome/fast_io/benchmarks/0000.10m_size_t/unit$ g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s
cqwrteur@Home-Server:~/myhome/fast_io/benchmarks/0000.10m_size_t/unit$ ./filebuf_io_observer
output: 0.058395978s
input:  0.06603426700000001s

This cannot be an optimization problem, since my code is identical on Windows and Linux. I think the only possible cause is an issue with FILE*.
My guess is that the code relies on libc behavior that does not work correctly on Windows. One explanation might be the underlying FILE* buffer size. I think you need to try a larger buffer size.
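
One way to test the buffer-size hypothesis from user code, without rebuilding the library, is to hand std::filebuf a larger buffer through pubsetbuf. The following is a minimal sketch, not the benchmark linked above: the helper name write_numbers and its parameters are hypothetical. The effect of pubsetbuf is implementation-defined; the sketch assumes libstdc++, which honors the call when it is made before the file is opened.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper for experimenting with the filebuf buffer size.
// Writes the integers 0..count-1, one per line, to `path`.
bool write_numbers(const std::string& path, std::size_t count,
                   std::size_t buffer_size)
{
    // Declared before the stream so that it outlives it: the stream's
    // destructor may still flush through this buffer.
    std::vector<char> buffer(buffer_size);

    std::ofstream out;
    // Assumption: libstdc++ behavior, where the buffer must be installed
    // before open(). Other standard libraries may ignore or reject this.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::binary);
    if (!out)
        return false;

    for (std::size_t i = 0; i != count; ++i)
        out << i << '\n';

    out.flush();
    return static_cast<bool>(out);
}
```

Timing calls such as write_numbers("out.txt", 10000000, 512) against write_numbers("out.txt", 10000000, 65536) on the affected target should show whether the buffer size alone explains the gap.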

The same program built with the MSVC STL (whose ABI is terrible).
MSVC STL with hacking:
D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.119666s
input:  0.2497127s

Other comparison benchmarks

GCC 10.0.1 with msvcrt hacking.
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/reserver_test/include/fast_io_legacy_impl/c/msvcrt.h

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>c_file_unlocked
output: 0.09216600000000001s
input:  0.12230700000000001s

GCC
Comment 1 fdlbxtqi 2020-03-23 09:15:00 UTC
https://bitbucket.org/ejsvifq_mabmip/mingw-gcc/src/master/
You can try this MinGW-GCC 10 build. I have known about this issue for a very long time. It looks like std::filebuf does not set the buffer size correctly, or there is some other odd issue with the underlying msvcrt.
Comment 3 fdlbxtqi 2020-03-23 09:36:44 UTC
I have found the reason: the buffer size is too small on Windows, where BUFSIZ == 512.

The value needs to be at least 4096. I think even 65536 would be worthwhile, since Windows syscalls are extremely expensive. In my tests with WriteFile, my library needed a buffer size of 131072 to reach maximum performance.

I think I can fix this bug easily.
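
To check what BUFSIZ is on a given toolchain, a small sketch like the one below can help. The function names are hypothetical; the only library facility used is the BUFSIZ macro from <cstdio>.

```cpp
#include <cstddef>
#include <cstdio>

// Returns the C library's default stdio buffer size as seen at compile time.
constexpr std::size_t default_stdio_buffer_size()
{
    return static_cast<std::size_t>(BUFSIZ);
}

// Prints the value so it can be compared across targets.
void print_default_stdio_buffer_size()
{
    std::printf("BUFSIZ == %zu\n", default_stdio_buffer_size());
}
```

Calling print_default_stdio_buffer_size() from a program built for each target shows the value that toolchain's headers provide.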
Comment 4 fdlbxtqi 2020-03-23 09:38:27 UTC
(In reply to fdlbxtqi from comment #3)
> I have found the reason: the buffer size is too small on Windows, where
> BUFSIZ == 512.
> 
> The value needs to be at least 4096. I think even 65536 would be
> worthwhile, since Windows syscalls are extremely expensive. In my tests
> with WriteFile, my library needed a buffer size of 131072 to reach
> maximum performance.
> 
> I think I can fix this bug easily.

The most efficient size is 1048576. However, I think it should be set to at least 65536, although 4096 already fixes most of the problem. Windows syscalls are expensive.
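
The link between buffer size and syscall cost can be approximated with simple arithmetic. The sketch below uses a simplified model that is an assumption, not a description of libstdc++ internals: every full buffer results in exactly one write call. The function name flush_count is hypothetical.

```cpp
#include <cstddef>

// Simplified model: the number of write calls needed to emit total_bytes
// when each call drains one buffer of buffer_size bytes (rounded up).
// A buffer_size of 0 is treated as unbuffered, one call per byte.
constexpr std::size_t flush_count(std::size_t total_bytes,
                                  std::size_t buffer_size)
{
    return buffer_size == 0
               ? total_bytes
               : (total_bytes + buffer_size - 1) / buffer_size;
}

// Going from 512 to 4096 bytes cuts the modeled call count by a factor of 8.
static_assert(flush_count(1048576, 512) == 2048, "512-byte buffer");
static_assert(flush_count(1048576, 4096) == 256, "4096-byte buffer");
static_assert(flush_count(1048576, 65536) == 16, "65536-byte buffer");
```

Under this model the per-call overhead dominates at small sizes, and the gain from each further increase shrinks as the buffer grows, which is consistent with 4096 fixing most of the problem.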
Comment 6 fdlbxtqi 2020-03-23 11:20:33 UTC
Created attachment 48086 [details]
simple patch
Comment 7 fdlbxtqi 2020-03-23 12:29:07 UTC
I rebuilt GCC with this patch. Performance improves roughly 10x for integer output and 3x for integer input. Definitely worth it.

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>gcc --version
gcc (master HEAD with MCF thread model, built by cqwrteur.) 10.0.1 20200323 (experimental)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.047s
input:  0.085998s

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>g++ -o filebuf_io_observer filebuf_io_observer.cc -Ofast -std=c++2a -s -flto

D:\hg\w4\f8\fast_io\benchmarks\0000.10m_size_t\unit>filebuf_io_observer
output: 0.042999s
input:  0.086012s


The performance is now nearly as good as my own buffer implementation, though it is still slower due to the hacking. It is actually usable now.
g++ -o iobuf_file iobuf_file.cc -Ofast -std=c++2a -s -flto


./iobuf_file.exe
output: 0.044000000000000004s
input:  0.076s
Comment 9 fdlbxtqi 2020-03-30 09:17:22 UTC
https://github.com/microsoft/WSL/issues/3898
Comment 10 fdlbxtqi 2020-05-25 00:23:10 UTC
What about adding another check for when BUFSIZ is smaller than 4 KB? If it is, raise the filebuf buffer size to at least 4 KB.
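
The check proposed here could look roughly like the sketch below. It only illustrates the idea and is not the change that was committed; the names minimum_buffer_size and clamped_buffer_size are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// The 4 KB floor proposed in the comment above.
constexpr std::size_t minimum_buffer_size = 4096;

// Returns the libc default, raised to the floor when it is too small.
constexpr std::size_t clamped_buffer_size(
    std::size_t libc_default = static_cast<std::size_t>(BUFSIZ))
{
    return libc_default < minimum_buffer_size ? minimum_buffer_size
                                              : libc_default;
}

static_assert(clamped_buffer_size(512) == 4096, "small defaults are raised");
static_assert(clamped_buffer_size(8192) == 8192, "large defaults are kept");
```

A clamp like this would apply on every target, whereas a per-target override only affects the platforms known to have a small BUFSIZ.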
Comment 11 GCC Commits 2020-10-28 13:19:27 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:0bc199fc5d4eef5a20ced20df892e5e3b8821b60

commit r11-4479-g0bc199fc5d4eef5a20ced20df892e5e3b8821b60
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 28 13:19:21 2020 +0000

    libstdc++: Override BUFSIZ for Windows targets [PR 94268]
    
    This replaces uses of BUFSIZ with a new _GLIBCXX_BUFSIZ macro that can
    be overridden in target-specific config headers.
    
    That allows the mingw and mingw-w64 targets to override it, because
    BUFSIZ is apparently defined to 512, resulting in poor performance. The
    MSVCRT stdio apparently uses 4096, so we use that too.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/94268
            * config/os/mingw32-w64/os_defines.h (_GLIBCXX_BUFSIZ):
            Define.
            * config/os/mingw32/os_defines.h (_GLIBCXX_BUFSIZ):
            Define.
            * include/bits/fstream.tcc: Use _GLIBCXX_BUFSIZ instead
            of BUFSIZ.
            * include/ext/stdio_filebuf.h: Likewise.
            * include/std/fstream (_GLIBCXX_BUFSIZ): Define.
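
The overridable-macro pattern described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the committed code: EXAMPLE_BUFSIZ and example_default_buffer_size are hypothetical names standing in for the library's own macro and its users.

```cpp
#include <cstddef>
#include <cstdio>

// A target-specific config header would be included first and could
// define the macro, for example:
//   #define EXAMPLE_BUFSIZ 4096

// Generic fallback: use the C library's BUFSIZ when no target-specific
// override has been provided.
#ifndef EXAMPLE_BUFSIZ
# define EXAMPLE_BUFSIZ BUFSIZ
#endif

// Code that previously used BUFSIZ directly uses the macro instead.
constexpr std::size_t example_default_buffer_size()
{
    return static_cast<std::size_t>(EXAMPLE_BUFSIZ);
}
```

With this arrangement, only the targets that define the override see a different default; all others keep using BUFSIZ unchanged.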
Comment 12 Jonathan Wakely 2020-10-28 13:25:54 UTC
Changed to 4096 for mingw and mingw-w64 in GCC 11.