Created attachment 48686 [details] Example 2

GCC's Object Size Checking doc says:

"There are built-in functions added for many common string operation functions, e.g., for memcpy __builtin___memcpy_chk built-in is provided. This built-in has an additional last argument, which is the number of bytes remaining in the object the dest argument points to or (size_t) -1 if the size is not known. The built-in functions are optimized into the normal string functions like memcpy if the last argument is (size_t) -1 or if it is known at compile time that the destination object will not be overflowed..."

https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html

In the attached example1.c, __builtin___memcpy_chk() is optimized into the normal memcpy(), as expected. But in a slightly different example2.c, it is not, despite an object size of -1.

When the checked version is left in place (like example2.c), it causes a significant regression in my case.

This is important because Ubuntu 18.04 uses _FORTIFY_SOURCE, which ends up using __builtin___memcpy_chk() for memcpy. If gcc is arbitrarily leaving it in place when it should be (according to the docs) optimized away, that affects a lot of code.

I'm seeing this on Ubuntu 18.04 with both:

  gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  gcc-9 (Ubuntu 9.2.1-19ubuntu1~18.04.york0) 9.2.1 20191109

It happens with or without -fno-builtin-memcpy (which is not a surprise, since I am directly calling the builtin version anyway).

Compiled using:

  gcc-9 -O2 -c -S -o example1.S example1.c
  gcc-9 -O2 -c -S -o example2.S example2.c

  example1.S:50: call memcpy@PLT
  example2.S:75: rep movsq
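The attachments themselves are not reproduced in this thread. As a rough sketch of the documented behaviour only, assuming GCC (or a compatible compiler such as Clang), the two helpers below call the checked builtin directly. The names copy_unknown_size and copy_fortified are hypothetical and are not taken from example1.c or example2.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the attachments: the destination size is
   passed as (size_t) -1, meaning "unknown", so no overflow check is
   possible and the call is treated like an ordinary memcpy. */
static void *copy_unknown_size(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return __builtin___memcpy_chk(dst, src, n, (size_t) -1);
}

/* Hypothetical helper: roughly the shape of a _FORTIFY_SOURCE-style
   wrapper, which asks the compiler for the destination size. When the
   compiler cannot determine it, __builtin_object_size(dst, 0) evaluates
   to (size_t) -1. */
static void *copy_fortified(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return __builtin___memcpy_chk(dst, src, n, __builtin_object_size(dst, 0));
}
```

Compiling a file like this with -O2 -S and searching the assembly for __memcpy_chk, memcpy, and rep movs shows which expansion the compiler picked for a given target and tuning.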
Created attachment 48687 [details] Example 1
Created attachment 48688 [details] Example 3

Another example that works (i.e. builtin is properly replaced by memcpy as described in the document). The only difference between this working example and the failing example2.c is that I replaced the sizeof() with a constant.
Original larger case was discovered in PostgreSQL: https://www.postgresql.org/message-id/99b2eab335c1592c925d8143979c8e9e81e1575f.camel@j-davis.com
It is unclear what you are complaining about.

for i in gcc-7 gcc-8 gcc-9 gcc-10 gcc; do echo $i; for j in 1 2 3; do /usr/src/$i/obj/gcc/cc1 -quiet -O2 pr95556-$j.c; done; grep 'memcpy\|rep.movs' pr95556-*.s; done
gcc-7
pr95556-1.s:  rep movsq
pr95556-2.s:  call memcpy
pr95556-3.s:  call memcpy
gcc-8
pr95556-1.s:  rep movsq
pr95556-2.s:  call memcpy
pr95556-3.s:  call memcpy
gcc-9
pr95556-1.s:  rep movsq
pr95556-2.s:  call memcpy
pr95556-3.s:  call memcpy
gcc-10
pr95556-1.s:  rep movsq
pr95556-2.s:  rep movsq
pr95556-3.s:  call memcpy
gcc
pr95556-1.s:  rep movsq
pr95556-2.s:  rep movsq
pr95556-3.s:  call memcpy

There are no __memcpy_chk calls, which means GCC did in all cases what is documented: it replaced the __builtin___memcpy_chk calls with the corresponding __builtin_memcpy calls and handled those as usual. That isn't always a library call; there are many different ways a builtin memcpy can be expanded, and one can fine-tune that through various command line options. It depends on what CPU the code is tuned for, whether it is considered hot or cold code, whether the size is constant (and what constant) or variable, and what alignment guarantees the destination and source have.
And note that

-    if (lt->pos >= (8192-sizeof(S)))
+    if (lt->pos >= (8192-16))

is not an insignificant change: the first one is an unsigned comparison, the second one signed.
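The difference can be sketched in isolation. This is a minimal illustration, assuming lt->pos is a signed int and that S is 16 bytes; the struct S and both function names below are hypothetical stand-ins, not the code from the attachments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for S in the attachments; only its size
   (16 bytes) matters here. */
struct S { char bytes[16]; };

/* sizeof yields size_t, so 8192 - sizeof(struct S) is unsigned, and pos
   is converted to an unsigned type before the comparison. A negative pos
   therefore becomes a very large value and compares as "at the limit". */
static int at_limit_sizeof(int pos)
{
    return pos >= (8192 - sizeof(struct S));
}

/* 8192 - 16 has type int, so this is an ordinary signed comparison and a
   negative pos compares as below the limit. */
static int at_limit_constant(int pos)
{
    return pos >= (8192 - 16);
}
```

The two functions agree for every non-negative pos and disagree for negative ones, which also changes what the compiler may assume about value ranges in the surrounding code.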
See the -mno-align-stringops, -minline-all-stringops, -minline-stringops-dynamically, -mstringop-strategy=, and -mmemcpy-strategy= options and their documentation in the GCC manual.
"...built-in functions are optimized into the normal string functions like memcpy if the last argument is (size_t) -1..."

My reading of the document led me to believe that a last argument of -1 *would* be a normal library call. And certainly should be with -fno-builtin-memcpy, right?

If that's not what's happening, should the document be clarified?
(In reply to Jeff Davis from comment #7)
> "...built-in functions are optimized into the normal string functions like
> memcpy if the last argument is (size_t) -1..."
>
> My reading of the document led me to believe that a last argument of -1
> *would* be a normal library call. And certainly should be with
> -fno-builtin-memcpy, right?

No, because -fno-builtin-memcpy only disables the special behavior if one uses memcpy; when one uses __builtin_memcpy, it always behaves as a builtin. And you are using __builtin___memcpy_chk, which is also a builtin and thus not affected by -fno-builtin*. You can use -fno-builtin-__memcpy_chk, but then you'll get __memcpy_chk calls if you call it that way.

As I wrote, if for whatever reason you always want the library call, you can just use -mmemcpy-strategy=libcall:-1:1 or so, but then even very small copies will not be done inline, which is not really beneficial.
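To see the distinction described above, a small test file along these lines can be compiled with and without -fno-builtin-memcpy and the assembly compared. This is only a sketch; the function names copy_plain and copy_builtin are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain call: with -fno-builtin-memcpy, GCC treats this as an ordinary
   function call rather than as the builtin. */
void *copy_plain(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}

/* Explicit builtin: still handled as a builtin under -fno-builtin-memcpy,
   so the compiler may expand it inline or emit a library call, depending
   on its own heuristics and the tuning options in effect. */
void *copy_builtin(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return __builtin_memcpy(dst, src, n);
}
```

For example, gcc -O2 -S -fno-builtin-memcpy on such a file, followed by a grep for memcpy and rep movs in the output, shows how each function was expanded on the target at hand.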
I still feel like the documentation is misleading on this point. Regardless, it doesn't seem like you think there is any bug here, so go ahead and close.