$ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC) Compiling the file uses this command line: gcc -DPACKAGE_NAME=\"clevis\" -DPACKAGE_TARNAME=\"clevis\" -DPACKAGE_VERSION=\"1\" -DPACKAGE_STRING=\"clevis\ 1\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"clevis\" -DVERSION=\"1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -I. -Wall -Wextra -Werror -Wstrict-aliasing -Wchar-subscripts -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wshadow -Wsign-compare -Wstrict-prototypes -Wtype-limits -Wno-missing-field-initializers -Wno-unused-parameter -pthread -I/usr/include/udisks2 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -DCLEVIS_CMD_DIR=/usr/libexec/clevis -g -O2 -MT clevis-luks-udisks2.o -MD -MP -MF $depbase.Tpo -c -o clevis-luks-udisks2.o clevis-luks-udisks2.c The relevant code is here[1]. GCC sees that "pkt_t key" might go out of scope and therefore optimizes out the zeroing of the struct that occurs at the end of the for loop iteration. However, key doesn't go out of scope during loop iteration. Thus, on subsequent loops the struct is used presuming that it contains a zero value but it doesn't. This means that (what I take to be) perfectly valid ISO C behaves in strange ways. In particular, the bug that arises is as follows. When load_jwe() fails[2], the (incorrectly) presumed zeroed key struct is sent[3] to the client. The end result is that an invalid key (the previous successful one) is sent when an empty UDP packet should be sent. GCC should detect that "pkt_t key" is not going out of scope and should zero it as requested rather than invalidly optimizing out this statement. Clang does not have this bug, so there is at least one example of a compiler doing the right thing. [1] - https://github.com/latchset/clevis/blob/6834b02b390771a97775dcfa26f85243f3a0195d/udisks2/clevis-luks-udisks2.c#L446 [2] - https://github.com/latchset/clevis/blob/6834b02b390771a97775dcfa26f85243f3a0195d/udisks2/clevis-luks-udisks2.c#L455 [3] - https://github.com/latchset/clevis/blob/6834b02b390771a97775dcfa26f85243f3a0195d/udisks2/clevis-luks-udisks2.c#L469
Do you think you could produce a smaller, stand-alone testcase that reproduces the issue? Or at least attach the preprocessed sources to the report?
Created attachment 40076 [details] simple test case Compile with: gcc -o test test.c This is a simple echo server. It *should* echo whatever the client types. However, if you type "foo" the first time and then type ctrl-d the second time, the second reply is the same as the first reply. This is because the buffer was not properly zeroed.
Created attachment 40077 [details] output assembly from the test case This assembly was produced with: gcc -S test.c.
Confirmed. Trunk is fine. --- good 2016-11-17 21:45:22.671247332 +0100 +++ bad 2016-11-17 21:45:18.851324283 +0100 @@ -58,11 +58,6 @@ call write testq %rax, %rax jle .L10 - leaq -65536(%rbp), %rax - movl $65520, %edx - movl $0, %esi - movq %rax, %rdi - call memset leaq -131056(%rbp), %rax leaq -65536(%rbp), %rcx movl $65520, %edx @@ -86,5 +81,5 @@ .cfi_endproc .LFE0: .size main, .-main - .ident "GCC: (GNU) 7.0.0 20161117 (experimental)" + .ident "GCC: (GNU) 6.2.1 20161017" .section .note.GNU-stack,"",@progbits
This has nothing to do with "aggressive optimization" it is a bug in C front-end, I guess in the "loop initial declarations" implementation.
I don't think this is related to the loop, a = b = (struct buf) {} loses the assignment to b even if you put it alone on its line.
I can't reproduce this with recent GCC 6 branch. Possibly a duplicate of bug 72747?
(In reply to joseph@codesourcery.com from comment #7) > I can't reproduce this with recent GCC 6 branch. Possibly a duplicate of > bug 72747? Yes. *** This bug has been marked as a duplicate of bug 72747 ***
Note in the testcase a = (struct buf) {}, b = (struct buf) {} generates significantly more efficient code than a = b = (struct buf) {} - the former is 2x memset, the latter 1x memset + 1x memcpy. So, shall we for large aggregates gimplify those differently as an optimization?
(In reply to Jakub Jelinek from comment #9) > Note in the testcase a = (struct buf) {}, b = (struct buf) {} generates > significantly more efficient code than a = b = (struct buf) {} - the former > is 2x memset, the latter 1x memset + 1x memcpy. > So, shall we for large aggregates gimplify those differently as an > optimization? I believe the gimplifier is not a good place to do optimization. You can pattern-match memset (&a, ..., N); memcpy (&b, &a, N); and transform it to two times memset.
(In reply to Richard Biener from comment #10) > (In reply to Jakub Jelinek from comment #9) > > Note in the testcase a = (struct buf) {}, b = (struct buf) {} generates > > significantly more efficient code than a = b = (struct buf) {} - the former > > is 2x memset, the latter 1x memset + 1x memcpy. > > So, shall we for large aggregates gimplify those differently as an > > optimization? > > I believe the gimplifier is not a good place to do optimization. You can > pattern-match > > memset (&a, ..., N); > memcpy (&b, &a, N); > > and transform it to two times memset. Even the *.optimized dump contains: b = {}; a = b; d = {}; c = d; Can match.pd handle these, or would it need to be special folding code (where? gimple-fold.c, tree dse, what other passes have good framework for that)?
Author: jakub Date: Fri Dec 16 15:57:43 2016 New Revision: 243753 URL: https://gcc.gnu.org/viewcvs?rev=243753&root=gcc&view=rev Log: PR c/78408 * tree-ssa-ccp.c: Include tree-dfa.h. (optimize_memcpy): New function. (pass_fold_builtins::execute): Use it. Remove useless conditional break after BUILT_IN_VA_*. * gcc.dg/pr78408-1.c: New test. * gcc.dg/pr78408-2.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr78408-1.c trunk/gcc/testsuite/gcc.dg/pr78408-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/sanopt.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-ccp.c