The following code prints uninitialized garbage when compiled with -O2 or -O3 in all versions of GCC 12 (including trunk). It works correctly in GCC 11. Replacing strncpy() with strncat() gives the same results, however when using the commented out strncpy() call, the output is as expected. See also how the generated assembly changes when the printf() inside the function is commented out. Expected output: Src: edcba Dst: edcba Actual output: Src: edcba Dst: <random uninitialized data> https://godbolt.org/z/5E51Gdh9s Compile args: g++ -O2 file.cpp 8<----------- file.cpp -------------- #include <cstring> #include <stdio.h> #include <algorithm> static constexpr unsigned N = 23; char dst[N + 1]; void invert(const char *id) { constexpr int MAX_LEN = 13; char buf[MAX_LEN]; char *ptr = buf + sizeof(buf); // start from the end of buf *(--ptr) = '\0'; // terminate string while (*id && ptr > buf) { *(--ptr) = *(id++); // copy id backwards } printf("Src: %s\n", ptr); strncpy(dst, ptr, N); // copy ptr/buf to dst //strncpy(dst, ptr, std::min<size_t>(N, strlen(ptr))); } int main() { invert("abcde"); printf("Dst: %s\n", dst); } 8<----------- file.cpp -------------- g++ -v -save-temps -O2 file.cpp Using built-in specs. COLLECT_GCC=/home/gb/.fighome/runtime/gcc/12.2.0-1/bin/g++ COLLECT_LTO_WRAPPER=/home/gb/.fighome/runtime/gcc/12.2.0-1/bin/../libexec/gcc/x86_64-linux-gnu/12.2.0/lto-wrapper Target: x86_64-linux-gnu Configured with: ../gcc-12.2.0/configure --prefix /data/work/chhq-sudbld10-001-CENTOS7/15df1254b8bb2e5c/scratch/gcc/12.2.0/staging --build=x86_64-linux-gnu --disable-multilib --disable-multiarch --enable-clocale=gnu --enable-languages=c,c++,fortran --enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto --enable-plugins --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-pkgversion=''\''internal_build'\''' --with-system-zlib --disable-werror --with-libelf=/data/work/chhq-sudbld10-001-CENTOS7/15df1254b8bb2e5c/scratch/gcc/12.2.0/build/libelf-0.8.13 Thread model: posix Supported LTO compression algorithms: zlib gcc version 12.2.0 ('internal_build') COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-shared-libgcc' '-mtune=generic' '-march=x86-64' /home/gb/.fighome/runtime/gcc/12.2.0-1/bin/../libexec/gcc/x86_64-linux-gnu/12.2.0/cc1plus -E -quiet -v -iprefix /home/gb/.fighome/runtime/gcc/12.2.0-1/bin/../lib/gcc/x86_64-linux-gnu/12.2.0/ -D_GNU_SOURCE file.cpp -mtune=generic -march=x86-64 -O2 -fpch-preprocess
Note strcpy arguments cannot be overlapping if the dst overlaps with the src, then the behavior is undefined. I think you are hitting that undefined behavior here.
(In reply to Andrew Pinski from comment #1) > Note strcpy arguments cannot be overlapping if the dst overlaps with the > src, then the behavior is undefined. I think you are hitting that undefined > behavior here. I take that back, I read the source incorrectly at the time.
Started with r12-145-gd1d01a66012a93cc8cb7d
Oh the problem is related to tail calls: strncpy (&dst, ptr_30, 23); [tail call] That should not be marked as a tail call as buf is still alive during the call of strncpy. Simple workaround, add asm("":::"memory"); Right before the end of invert. Or use -fno-optimize-sibling-calls Self contained testcase: ``` #define N 23 #define MAX_LEN 13 char dst[N + 1]; void invert(const char *id) { char buf[MAX_LEN]; char *ptr = buf + sizeof(buf); // start from the end of buf *(--ptr) = '\0'; // terminate string while (*id && ptr > buf) { *(--ptr) = *(id++); // copy id backwards } __builtin_strncpy(dst, ptr, N); // copy ptr/buf to dst // asm("":::"memory"); // This "fixes" the issue. } int main() { invert("abcde"); if (strcmp(dst, "edcba")) __builtin_abort(); } ```
Yeah, exactly, the difference between the two revisions is first in tailc pass: --- pr109609.C.202t.tailc_ 2023-04-24 15:48:33.000000000 -0400 +++ pr109609.C.202t.tailc 2023-04-24 15:49:08.000000000 -0400 @@ -44,7 +44,7 @@ void invert (const char * id) <bb 4> [local count: 137085153]: # ptr_27 = PHI <ptr_10(3), &MEM <char[13]> [(void *)&buf + 12B](2)> __builtin_printf ("Src: %s\n", ptr_27); - __builtin_strncpy (&dst, ptr_27, 23); + __builtin_strncpy (&dst, ptr_27, 23); [tail call] buf ={v} {CLOBBER}; return; Obviously, we shouldn't tail call strncpy when the second argument points into an automatic variable in the current function.
Note, changing `ptr < buf` to `ptr != buf` still invokes the wrong code being generated.
Here's the code that still fails with -O3 -fno-optimize-sibling-calls: ``` #include <string.h> #include <stdint.h> #define N 23 #define MAX_LEN 13 char dst[N + 1]; void stringify(uint64_t id) { char buf[MAX_LEN]; char *ptr = buf + sizeof(buf); // start from the end of buf *(--ptr) = '\0'; // terminate string while (id && ptr > buf) { *(--ptr) = static_cast<char>(id % 10) + '0'; id /= 10; } __builtin_strncpy(dst, ptr, N); // copy ptr/buf to dst } int main() { stringify(12345); if (strcmp(dst, "12345")) __builtin_abort(); } ``` Notably this only fails with -O3, not with -O2
In that case it started with r12-382-ged3c43224cc4e378d But maybe it would be better to track it separately.
In any case it looks like I have to investigate.
On the first testcase reverting the offending rev. shows that it causes <bb 2> [local count: 137085152]: - MEM[(char *)&buf + 12B] = 0; - _19 = *id_8(D); - if (_19 != 0) + _18 = *id_7(D); + if (_18 != 0) thus we DSE the store to the end. The issue is that the fnspec we have for strncpy says the access size is specified by argument 3 but what it specified there is the _maximum_ size read, not the actual size. So instead of "1cO313" it should be "1cO31 " ('1' is somewhat odd then, it says we copy 'src' to 'dst' but we only say the 'dst' write covers arg 3 size - I guess that's OK for points-to analysis, the additional zeros written do not have pointers, but if we use it differently it might be a wrong spec?) I'm scanning other builtins for similar issues.
(In reply to Richard Biener from comment #10) > On the first testcase reverting the offending rev. shows that it causes > > <bb 2> [local count: 137085152]: > - MEM[(char *)&buf + 12B] = 0; > - _19 = *id_8(D); > - if (_19 != 0) > + _18 = *id_7(D); > + if (_18 != 0) > > thus we DSE the store to the end. > > The issue is that the fnspec we have for strncpy says the access size > is specified by argument 3 but what it specified there is the _maximum_ > size read, not the actual size. So instead of "1cO313" it should be > "1cO31 " ('1' is somewhat odd then, it says we copy 'src' to 'dst' > but we only say the 'dst' write covers arg 3 size - I guess that's OK > for points-to analysis, the additional zeros written do not have pointers, > but if we use it differently it might be a wrong spec?) > > I'm scanning other builtins for similar issues. Note a different fix would be to re-interpret the case of reads and say the size is _up to_ the specified length, thus use ao_ref_init_from_ptr_and_range instead of _and_size. strncat, stpncpy, strncmp, strnlen, strndup are affected similarly. for functions like memchr I'm not sure if we can assume all 'n' bytes are read (thus if that would cause known overread -> undefined behavior). Honza? Any opinion? Fix for the testcase, but incomplete as noted above: diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 0e06fa5b2e0..133707c1617 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -11562,11 +11562,12 @@ builtin_fnspec (tree callee) case BUILT_IN_STPCPY_CHK: return ".cO 1 "; case BUILT_IN_STRNCPY: + case BUILT_IN_STRNCPY_CHK: + return "1cO31 "; case BUILT_IN_MEMCPY: case BUILT_IN_MEMMOVE: case BUILT_IN_TM_MEMCPY: case BUILT_IN_TM_MEMMOVE: - case BUILT_IN_STRNCPY_CHK: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMMOVE_CHK: return "1cO313";
I had a look at memcmp recently in the context of PR109306 and my understanding is that the function may but doesn't have to access all bytes from both arrays up to the given size. While for memchr, C17 contains: "The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found." and my reading of that is that it has to stop reading upon reaching the match, so in that case the read is always just up to the given size, not the whole size. While e.g. strchr doesn't say something similar and so it needs to be passed valid zero terminated string and can but doesn't have to read the whole string.
strncpy second argument is an array rather than necessarily a string and characters after '\0' are not copied, so if n is non-zero, it reads between 1 and n characters from the source array (not sure if a dumb implementation could try to read all characters and only copy those until '\0' inclusive though).
Btw, looking at the user modref_access_analysis::get_access_for_fnspec it interprets the size as upper bound (also for 't'). Likewise for get_access_for_fnspec. Just the check_fnspec use in tree-ssa-alias.cc interprets both as exact sizes. So a "conservative" fix that's backportable would simply adjust the tree-ssa-alias.cc use to behave similarly.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:e8d00353017f895d03a9eabae3506fd126ce1a2d commit r14-225-ge8d00353017f895d03a9eabae3506fd126ce1a2d Author: Richard Biener <rguenther@suse.de> Date: Tue Apr 25 14:56:44 2023 +0200 tree-optimization/109609 - correctly interpret arg size in fnspec By majority vote and a hint from the API name which is arg_max_access_size_given_by_arg_p this interprets a memory access size specified as given as other argument such as for strncpy in the testcase which has "1cO313" as specifying the _maximum_ size read/written rather than the exact size. There are two uses interpreting it that way already and one differing. The following adjusts the differing and clarifies the documentation. PR tree-optimization/109609 * attr-fnspec.h (arg_max_access_size_given_by_arg_p): Clarify semantics. * tree-ssa-alias.cc (check_fnspec): Correctly interpret the size given by arg_max_access_size_given_by_arg_p as maximum, not exact, size. * gcc.dg/torture/pr109609.c: New testcase.
Fixed on trunk sofar.
Speaking of the size parameter, my workaround for the original issue was to pre-compute the size argument a different way. This however resulted in a warning that's both right and wrong. Here's the sample code that must be compiled with -O3 and -Wall: https://godbolt.org/z/jbf74994z ``` static constexpr unsigned N = 20; char dst[N + 1]; int main() { char buf[10]; char *src = buf + sizeof(buf); *(--src) = 0; *(--src) = 'a'; strncpy(dst, src, std::min<size_t>(N, strlen(src) + 1)); // constexpr volatile size_t n = std::min<size_t>(N, strlen(src) + 1); // strncpy(dst, src, n); } ``` <source>: In function 'int main()': <source>:13:10: warning: 'char* strncpy(char*, const char*, size_t)' specified bound depends on the length of the source argument [-Wstringop-truncation] 13 | strncpy(dst, src, std::min<size_t>(N, strlen(src) + 1)); | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <source>:13:47: note: length computed here 13 | strncpy(dst, src, std::min<size_t>(N, strlen(src) + 1)); | ~~~~~~^~~~~ Sure, it depends on the src, but in a "good" way. The commented out code doesn't produce the warning. An alternative way to make the warning go away is to change to: char *src = buf + sizeof(buf) - 1;
The releases/gcc-13 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:df49e4602882eabe0642699fb71a70f6e120e263 commit r13-7252-gdf49e4602882eabe0642699fb71a70f6e120e263 Author: Richard Biener <rguenther@suse.de> Date: Tue Apr 25 14:56:44 2023 +0200 tree-optimization/109609 - correctly interpret arg size in fnspec By majority vote and a hint from the API name which is arg_max_access_size_given_by_arg_p this interprets a memory access size specified as given as other argument such as for strncpy in the testcase which has "1cO313" as specifying the _maximum_ size read/written rather than the exact size. There are two uses interpreting it that way already and one differing. The following adjusts the differing and clarifies the documentation. PR tree-optimization/109609 * attr-fnspec.h (arg_max_access_size_given_by_arg_p): Clarify semantics. * tree-ssa-alias.cc (check_fnspec): Correctly interpret the size given by arg_max_access_size_given_by_arg_p as maximum, not exact, size. * gcc.dg/torture/pr109609.c: New testcase. (cherry picked from commit e8d00353017f895d03a9eabae3506fd126ce1a2d)
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:2c7e89510fe41265b285e886d19f9895adf545e8 commit r12-9474-g2c7e89510fe41265b285e886d19f9895adf545e8 Author: Richard Biener <rguenther@suse.de> Date: Tue Apr 25 14:56:44 2023 +0200 tree-optimization/109609 - correctly interpret arg size in fnspec By majority vote and a hint from the API name which is arg_max_access_size_given_by_arg_p this interprets a memory access size specified as given as other argument such as for strncpy in the testcase which has "1cO313" as specifying the _maximum_ size read/written rather than the exact size. There are two uses interpreting it that way already and one differing. The following adjusts the differing and clarifies the documentation. PR tree-optimization/109609 * attr-fnspec.h (arg_max_access_size_given_by_arg_p): Clarify semantics. * tree-ssa-alias.cc (check_fnspec): Correctly interpret the size given by arg_max_access_size_given_by_arg_p as maximum, not exact, size. * gcc.dg/torture/pr109609.c: New testcase. (cherry picked from commit e8d00353017f895d03a9eabae3506fd126ce1a2d)
Fixed.