Given the following C code: % cat a.c #include <string.h> static const float z[1] = {0}; int f(float x) { return memcmp(&x, z, sizeof(x)); } GCC 10 generates this on x86-64: % gcc -Wall -O2 -c a.c && objdump -d -Mintel a.o a.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <f>: 0: f3 0f 11 44 24 fc movss DWORD PTR [rsp-0x4],xmm0 6: 0f b6 44 24 fc movzx eax,BYTE PTR [rsp-0x4] b: c3 ret This doesn't happen if "= {0}" is removed from the z initialization (wtf?). It also doesn't happen with -O1.
Confirmed. It looks like string handling creeps in here somehow during RTL expansion.
Regressed in r10-2769-g14b7950f126f84fa585e3a057940ff10d4c5b3f8
For memcmp, it looks completely wrong to use c_getstr, I think in that case it should use string_constant instead, or if it uses it, it should punt if the returned length by it is smaller than the length of the memcmp call and not use strnlen.
Isn't the problem that c_getstr(arg2, &len2) sets len2 to 1 instead of 4 (i.e, sizeof (x)) as the function comment suggests should happen? /* Return a pointer P to a NUL-terminated string representing the sequence of constant characters referred to by SRC (or a subsequence of such characters within it if SRC is a reference to a string plus some constant offset). If STRLEN is non-null, store the number of bytes in the string constant including the terminating NUL char. *STRLEN is typically strlen(P) + 1 in the absence of embedded NUL characters. */ const char * c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */) If that's what the function is supposed to do (and even if it isn't), using the name STRLEN for anything that's not actually the result of strlen() is terribly misleading, practically begging to be misused. In fact, going back in time, it looks like starting with r262522 the function did take two pointers: STRLEN for the string length (i.e., the result of strlen) and STRSIZE for the size, until r264301 when it was changed to the current form.
I'd like to point out that this regression impacts badly a production app. We're using this pattern to compare an input vector of floats to a vector of zeros, but the comparison always returns 0 now, which basically breaks everything. We do have a workaround (removing the "= {0}"), but I don't think this is an isolate case. Maybe the priority could be raised?
The priority is up to the GCC release managers but raising it won't make the fix available before 10.2 (except on trunk). Regardless, I should have a patch soon.
The following is a more straightforward test case that's also miscompiled to return zero: int main () { char a[] = "\0abc"; return __builtin_memcmp (a, "\0\0\0\0", 4); } main: .LFB0: .cfi_startproc xorl %eax, %eax ret There seems to a series of mistakes that led to this bug. First, in the original test case, the initializer for the const float array of all zeros is just a single null byte, regardless of how big the array is. c_getstr() returns the empty string for such an array and sets *STRLEN to 1. As I mentioned in comment #4, this is very confusing given that the function can be called for non-strings and that the comment suggests embedded nuls are reflected in the STRLEN result. The c_getstr() caller that ultimately interprets this result wrong is inline_expand_builtin_string_cmp() in builtins.c. The comment above the function reads: /* Inline expansion a call to str(n)cmp, with result going to TARGET if that's convenient. If the call is not been inlined, return NULL_RTX. */ but it's (also) called from expand_builtin_memcmp(), so yet again, the comment is incorrect or at best misleading. The function then implements the correct behavior for strncmp() but not for memcmp(). The miscompilation was apparently introduced by r272993, most likely because of the misleading function name. The call to inline_expand_builtin_string_cmp() from expand_builtin_memcmp() was introduced in r262636.
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549225.html
*** Bug 96007 has been marked as a duplicate of this bug. ***
Please note that there seems to be two issues here that affect different versions of gcc: (From the comment of gcc@pkh.me) https://godbolt.org/z/xc59TM This fails in 9.3.0 but works in 10.1.0. (From the comment of Martin Sebor) https://godbolt.org/z/9Kh1Tn This fails in both 9.3.0 and 10.1.0. Maybe the difference is caused by some other feature/bugfix that is present in 10.1.0 but not in 9.3.0. Anyway, I just want to highlight this and hope it can help with getting a proper fix for this bug. Thanks.
The master branch has been updated by Martin Sebor <msebor@gcc.gnu.org>: https://gcc.gnu.org/g:d5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6 commit r11-2231-gd5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6 Author: Martin Sebor <msebor@redhat.com> Date: Mon Jul 20 12:06:18 2020 -0600 Correct handling of constant representations containing embedded nuls. Resolves: PR middle-end/95189 - memcmp being wrongly stripped like strcm PR middle-end/95886 - suboptimal memcpy with embedded zero bytes gcc/ChangeLog: PR middle-end/95189 PR middle-end/95886 * builtins.c (inline_expand_builtin_string_cmp): Rename... (inline_expand_builtin_bytecmp): ...to this. (builtin_memcpy_read_str): Don't expect data to be nul-terminated. (expand_builtin_memory_copy_args): Handle object representations with embedded nul bytes. (expand_builtin_memcmp): Same. (expand_builtin_strcmp): Adjust call to naming change. (expand_builtin_strncmp): Same. * expr.c (string_constant): Create empty strings with nonzero size. * fold-const.c (c_getstr): Rename locals and update comments. * tree.c (build_string): Accept null pointer argument. (build_string_literal): Same. * tree.h (build_string): Provide a default. (build_string_literal): Same. gcc/testsuite/ChangeLog: PR middle-end/95189 PR middle-end/95886 * gcc.dg/memcmp-pr95189.c: New test. * gcc.dg/strncmp-3.c: New test. * gcc.target/i386/memcpy-pr95886.c: New test.
Fixed in r11-2231 on trunk.
GCC 10.2 is released, adjusting target milestone.
*** Bug 96631 has been marked as a duplicate of this bug. ***
Is the patch eligible for backporting? Users are hitting this as shown by dups and questions elsewhere like https://stackoverflow.com/questions/63724679/wrong-gcc-9-and-higher-optimization-of-memcmp-with-fno-inline
I think it should be. Let me do it.
Created attachment 49276 [details] Hacky patch that makes GCC print a diagnostic message if hitting the bug We hit this bug in a test case in secp256k1, the cryptography library used in Bitcoin Core. The bug appeared pretty scary to us at first glance because memcmp is widely used. After looking at the GCC patch that fixes this, we believe that this bug only occurs when * at least one of the compared byte arrays is constant and has a zero byte in position 0, 1, 2, or 3, *and* * the result of the memcmp isn't immediately used in a "== 0" or "!= 0" test (or equivalently "if(memcmp(...))" or "if(!memcmp(...))"). In particular the second condition makes this bug pretty rare and explains why it's mostly hit in non-inlined memcmp wrappers. (But in our case we hit it with a "<" comparison. ) For anyone else who's concerned about this bug, we've created a hacky patch (co-authored by Russell O'Connor) for the GCC versions with this bug that makes GCC print a diagnostic message if it emits wrong code due to this bug. We think the message is sound but it would be great if someone could confirm this.
Someone pointed out to me that the bug metadata says "Known to work: 9.3.0"
That was set early on and not adjusted when new testcases were found. Changed now. The original testcase in comment #0 happened to work with 9.3, but breaks if 'float z[1]' is changed to 'char z[4]'. Testcases from the dup and from comment #7 also fail with 9.3.
For what it's worth, -fno-builtin is a workaround for this entire class of bug.
Not entirely, I think? At least according to the docs: if you wish to enable built-in functions selectively when using -fno-builtin or -ffreestanding, you may define macros such as: #define abs(n) __builtin_abs ((n)) #define strcpy(d, s) __builtin_strcpy ((d), (s)) Which suggests code that calls __builtin_memcmp explicitly could still be broken.
The releases/gcc-10 branch has been updated by Martin Sebor <msebor@gcc.gnu.org>: https://gcc.gnu.org/g:e4c9aac98611f63847ef6c57916808d9a2d7abcb commit r10-8871-ge4c9aac98611f63847ef6c57916808d9a2d7abcb Author: Martin Sebor <msebor@redhat.com> Date: Thu Oct 8 12:35:01 2020 -0600 Correct handling of constant representations containing embedded nuls (backport from trunk). Resolves: PR middle-end/95189 - memcmp being wrongly stripped like strcm PR middle-end/95886 - suboptimal memcpy with embedded zero bytes gcc/ChangeLog: PR middle-end/95189 PR middle-end/95886 * builtins.c (inline_expand_builtin_string_cmp): Rename... (inline_expand_builtin_bytecmp): ...to this. (builtin_memcpy_read_str): Don't expect data to be nul-terminated. (expand_builtin_memory_copy_args): Handle object representations with embedded nul bytes. (expand_builtin_memcmp): Same. (expand_builtin_strcmp): Adjust call to naming change. (expand_builtin_strncmp): Same. * expr.c (string_constant): Create empty strings with nonzero size. * fold-const.c (c_getstr): Rename locals and update comments. * tree.c (build_string): Accept null pointer argument. (build_string_literal): Same. * tree.h (build_string): Provide a default. (build_string_literal): Same. gcc/testsuite/ChangeLog: PR middle-end/95189 PR middle-end/95886 * gcc.dg/memcmp-pr95189.c: New test. * gcc.dg/strncmp-3.c: New test. * gcc.target/i386/memcpy-pr95886.c: New test.
Backported to 10-branch via r10-8871.
The fixes do not seem trivial to backport; lots of conflicts. It would be really helpful to have versions of the patch that are minified and applicable to all affected versions that might be shipping in distros (looks like 9.2, 9.3, 10.1, and 10.2) since this is a critical codegen regression.
Created attachment 49334 [details] Hacky patch to workaround the bug by disabling builtin memcmp always I wrote and am rebuilding my own systems with a very minimal patch that essentially disables the builtin memcmp entirely. It's probably overkill and bad for performance, but it should at least produce sane code and apply to pretty much any GCC (warning: including unaffected versions).
Is that complete, or is it unclear whether there are code paths other than builtin memcmp by which this is hit? Am I correct in assuming that with builtin memcmp expansion returning NULL_RTX, GCC always expands it to a function call?
AFAIK it's complete, but I am not a GCC developer.
Created attachment 49866 [details] My backport of fix to 9.3.0 For the benefit of anyone that needs it, I've attached my own attempt to backport the fix directly to the 9.3.0 release (maybe it will apply against the current 9.4 branch, not sure, haven't tried). It seems to fix the issue but I'm not a GCC developer.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
This is a critical codegen issue. Is it really still not fixed in 9.4.0?
The original reported case doesn't affect GCC 9 it seems but the following does: int main() { char a[] = "\0abc"; volatile int res = __builtin_memcmp (a, "\0\0\0\0", 4); if (res == 0) __builtin_abort (); return 0; }
I'm not planning to backport the patch.
Fixed in GCC 10.3