Created attachment 30819 [details] .ii file generated from nantest1.cpp The program below produces incorrect code on x86-32 with gcc 4.6.3. (It was also found to fail in 4.7 but succeed in 4.8.) The problem happens when: 1. The union is initialized via its 64-bit double field 2. The union is then updated via its 64-bit integer field 3. The union is copied to another variable When this happens, the 64-bit copy in step 3 is done via the x87 fldl/fstpl instructions. If the 64-bit integer value in step 2 happens to be an sNaN pattern, the fldl instruction transforms it to a qNaN pattern and the fstpl instruction writes the wrong value. If step 1 is omitted, the x87 instructions are not used and the code is correct. This suggests that the optimization of using x87 for 64-bit transfers is enabled when a use of the double field is found, whereas it should probably only happen if a *live* use of the double field is found. // Compile with g++ -O3 -m32 // -O[123] all exhibit the problem. #include <assert.h> #include <stdlib.h> #include <stdio.h> union MyClass { MyClass() : DoubleVal(0.0) {} // this ctor causes failure //MyClass() : IntVal(0) {} // this ctor works correctly int64_t IntVal; double DoubleVal; }; __attribute__((noinline)) void create(MyClass &dest, int64_t Val) { MyClass MCOp; MCOp.IntVal = Val; dest = MCOp; } int main(int argc, char **argv) { MyClass copy; int64_t magic = 0xfff0000000000001; // happens to be sNaN create(copy, magic); // copy is changed to qNaN printf("magic=%llx\ncopy= %llx\n", magic, copy.IntVal); assert(magic == copy.IntVal); return 0; } $ g++ -v -save-temps -Wall -Wextra -fno-strict-aliasing -fwrapv -O3 -m32 nantest1.cpp Using built-in specs. COLLECT_GCC=/usr/bin/g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' /usr/lib/gcc/x86_64-linux-gnu/4.6/cc1plus -E -quiet -v -imultilib 32 -imultiarch i386-linux-gnu -D_GNU_SOURCE nantest1.cpp -m32 -mtune=generic -march=i686 -Wall -Wextra -fno-strict-aliasing -fwrapv -O3 -fpch-preprocess -fstack-protector -o nantest1.ii ignoring nonexistent directory "/usr/local/include/i386-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../x86_64-linux-gnu/include" ignoring nonexistent directory "/usr/include/i386-linux-gnu" #include "..." search starts here: #include <...> search starts here: /usr/include/c++/4.6 /usr/include/c++/4.6/x86_64-linux-gnu/32 /usr/include/c++/4.6/backward /usr/lib/gcc/x86_64-linux-gnu/4.6/include /usr/local/include /usr/lib/gcc/x86_64-linux-gnu/4.6/include-fixed /usr/include End of search list. COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' /usr/lib/gcc/x86_64-linux-gnu/4.6/cc1plus -fpreprocessed nantest1.ii -quiet -dumpbase nantest1.cpp -m32 -mtune=generic -march=i686 -auxbase nantest1 -O3 -Wall -Wextra -version -fno-strict-aliasing -fwrapv -fstack-protector -o nantest1.s GNU C++ (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (x86_64-linux-gnu) compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU C++ (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (x86_64-linux-gnu) compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: 65b5171ac1bd7b3f07dbea6bdb24be3d nantest1.cpp:21:5: warning: unused parameter ‘argc’ [-Wunused-parameter] nantest1.cpp:21:5: warning: unused parameter ‘argv’ [-Wunused-parameter] COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' as --32 -o nantest1.o nantest1.s COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/ LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/4.6/32/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/:/lib/i386-linux-gnu/:/lib/../lib32/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib32/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../:/lib/i386-linux-gnu/:/lib/:/usr/lib/i386-linux-gnu/:/usr/lib/ COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' /usr/lib/gcc/x86_64-linux-gnu/4.6/collect2 --sysroot=/ --build-id --no-add-needed --as-needed --eh-frame-hdr -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -z relro /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.6/32/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.6/32 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32 -L/lib/i386-linux-gnu -L/lib/../lib32 -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib32 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../.. -L/lib/i386-linux-gnu -L/usr/lib/i386-linux-gnu nantest1.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/4.6/32/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/crtn.o
This may be related to PR57484.
It is SRA that transforms MCOp.DoubleVal = 0.0; MCOp.IntVal = Val_3(D); *dest_5(D) = MCOp; into MCOp$DoubleVal_7 = 0.0; _8 = VIEW_CONVERT_EXPR<double>(Val_3(D)); MCOp$DoubleVal_4 = _8; MEM[(union MyClass *)dest_5(D)] = MCOp$DoubleVal_4; simplifying into _6 = VIEW_CONVERT_EXPR<double>(Val_3(D)); MEM[(union MyClass *)dest_4(D)] = _6; and (insn 7 6 0 (set (mem:DF (reg/v/f:SI 60 [ dest ]) [0 MEM[(union MyClass *)dest_4(D)]+0 S8 A32]) (subreg:DF (reg/v:DI 61 [ Val ]) 0)) t.C:15 -1 (nil)) causes a reload: (insn 7 11 12 2 (set (reg:DF 8 st [62]) (mem/c:DF (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [0 Val+0 S8 A32])) t.C:15 134 {*movdf_internal} (nil)) (insn 12 7 0 2 (set (mem:DF (reg/v/f:SI 0 ax [orig:60 dest ] [60]) [0 MEM[(union MyClass *)dest_4(D)]+0 S8 A32]) (reg:DF 8 st [62])) t.C:15 134 {*movdf_internal} (expr_list:REG_DEAD (reg:DF 8 st [62]) (expr_list:REG_DEAD (reg/v/f:SI 0 ax [orig:60 dest ] [60]) (nil)))) where *movdf_internal simply doesn't properly preserve sNaN. It looks wrong for DFmode move instructions to not preserve a IEEE defined flag. I suppose it would be also wrong to not preserve the exact bit pattern (think of canonicalizing NaNs).
(In reply to Richard Biener from comment #2) > where *movdf_internal simply doesn't properly preserve sNaN. > > It looks wrong for DFmode move instructions to not preserve a IEEE defined > flag. I suppose it would be also wrong to not preserve the exact bit pattern > (think of canonicalizing NaNs). As said in PR57484, comment 11: On an x86 target using the legacy x87 instructions and the 80-bit registers, a load of a 64-bit or 32-bit value in memory into the 80-bit registers counts as a format conversion and an signaling NaN input will turn into a quiet NaN in the register format.
On Mon, 16 Sep 2013, rguenth at gcc dot gnu.org wrote: > It looks wrong for DFmode move instructions to not preserve a IEEE defined > flag. I suppose it would be also wrong to not preserve the exact bit pattern > (think of canonicalizing NaNs). I think of the move problem for signaling NaNs as being bug 56831 - except that as expressed there it's only a bug for -fsignaling-nans, whereas the present bug is clearly a bug with or without -fsignaling-nans (the program makes no explicit use of signaling NaNs at all). In the absence of -fexcess-precision=standard, I suppose the move patterns do need to represent how to move DFmove values into x87 registers from memory (despite the lack of proper bit-pattern-preserving loads on the processor), but any move instruction that extends SFmode or DFmode implicitly to XFmode should be avoided except where actually needed for floating-point arithmetic to be carried out or because the ABI requires the value in an x87 register.
Not sure if it's still the same bug but the pattern from this PR leads to problems with long double on x86-64. The optimization looses data in bytes corresponding to padding in long double. Transformation from a string to a long double is wrong too, filed separately -- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71475. But here this example is only to illustrate that the padding is lost. Source code: ---------------------------------------------------------------------- #include <stdio.h> struct s { char s[sizeof(long double)]; }; union u { long double d; struct s s; }; int main() { union u x = {0}; x.s = (struct s){"xxxxxxxxxxxxxxxx"}; union u y = x; // start from the big end for (unsigned char *p = (unsigned char *)&y + sizeof y; p-- > (unsigned char *)&y;) printf("%02x ", *p); puts(""); } ---------------------------------------------------------------------- Results: ---------------------------------------------------------------------- $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 00 00 00 00 00 40 00 00 78 78 78 78 78 78 78 78 ---------------------------------------------------------------------- gcc version: gcc (GCC) 7.0.0 20160608 (experimental)
Reconfirmed with the comment#5 testcase. ESRA does int main () { + long double x$d; unsigned char * p; union u y; union u x; @@ -25,9 +20,9 @@ int _2; <bb 2> : - x.d = 0.0; - x.s.s = "xxxxxxxxxxxxxxxx"; - y = x; + x$d_3 = 0.0; + x$d_16 = MEM[(char[16] *)"xxxxxxxxxxxxxxxx"]; + MEM[(union u *)&y] = x$d_16; and the value re-interpretation goes off. Martin - we may have dups of this but SRA shouldn't do this (it possibly gets confused by the x.d store)?
Access trees for x (UID: 2479): access { base = (2479)'x', offset = 0, size = 128, expr = x.d, type = long double, reverse = 0, grp_read = 1, grp_write = 1, grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0, grp_scalar_write = 1, grp_total_scalarization = 1, grp_hint = 0, grp_covered = 1, grp_unscalarizable_region = 0, grp_unscalarized_data = 0, grp_same_access_path = 0, grp_partial_lhs = 0, grp_to_be_replaced = 1, grp_to_be_debug_replaced = 0} so it's odd to not see x.s.s access here. I guess we merge them and somehow the FP type "prevails" over the array type? IMO when merging "bad" types we have to pun to a bitwise type.
(In reply to Richard Biener from comment #6); > > and the value re-interpretation goes off. Martin - we may have dups of this > but SRA shouldn't do this (it possibly gets confused by the x.d store)? Yes I think pr 93271 is a dup of this.
Hello, this is apparently affecting Emacs build as well [1]. I there a suggested way we could work around this bug? Maybe a flag to prevent the optimization which we could use on affected systems? Thanks Andrea [1] <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72145>
(In reply to akrl from comment #9) > Hello, > > this is apparently affecting Emacs build as well [1]. > I there a suggested way we could work around this bug? Maybe a flag to > prevent the optimization which we could use on affected systems? > > Thanks > > Andrea > > [1] <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72145> The remaining issue is analyzed to be caused by SRA so you can check whether -fno-tree-sra fixes it for you.
(In reply to Richard Biener from comment #10) > The remaining issue is analyzed to be caused by SRA so you can check whether > -fno-tree-sra fixes it for you. Thanks, it does, and I modified the Emacs 'configure' script here: https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9f4fc6608212191e1a9e07bf89f38ba9e4ea786c so that, when using GCC 4 or later on x86, Emacs 'configure' appends to CFLAGS either the option -mfpmath=sse (if the target is known to support SSE2, e.g., because -msse2 is also given) or the option -fno-tree-sra (if not). It strikes me that GCC could do this itself, as a crude workaround for all programs. That is, when compiling for x86 and -mfpmath=sse is not in effect, GCC could automatically disable tree-sra for the compilation unit. Or it could be more selective and disable tree-sra only if the compilation unit defines a union that could cause the trouble. Although such a patch would hurt the performance of the generated code, that's better than generating wrong code that works only 99.9% of the time.
I have two variants of a fix, one that for the testcase in comment#5 avoids scalarization on i?86-*-* and one that scalarizes into unsigned:96 which exposes that we fail to constant fold MEM<unsigned:96>["xxxxxxxxxxxxxxxx"]; On x86-64 with sizeof(long double) == 16 (instead of == 12 with -m32) we scalarize with uint128_t. I'm going to test the variant that avoids creating an unsigned:96 integer type since I think that's safer.
Created attachment 58718 [details] patch I am testing Paul - can you test if this patch resolves the emacs issue?
(In reply to Richard Biener from comment #13) > Created attachment 58718 [details] > patch I am testing > > Paul - can you test if this patch resolves the emacs issue? Using root->grp_same_access_path doesn't seem to be a perfect fit. For gcc.dg/tree-ssa/sra-6.c we're now doing Changing the type of a replacement for b offset: 0, size: 64 to an integer. But maybe that's OK or we need a different flag to tell whether the access paths are the same or just the tail of the access path, so 's' and 's.a' would be OK for a root 's.a.f', allowing aggregate (sub-)copies but that would assume we copy field-wise which we don't do - we RTL expand to (inlined) memcpy which is also not semantically the same for x87 float fields (but would it have to be?)
Created attachment 58724 [details] simple (wip) fix I'm wondering whether just simply something like this would not be enough. I have looked at total scalarization and we will not replace a type found in the IL with another one there. Similarly, only propagation through assignments fiddles with existing types (when they are not aggregate) only when propagating from RHS to LHS and not the other way round. If we want to be more aggressive, we can add a flag when the new predicate fails but there is a good bitwise_type_for_mode and then when the flag is set, use that type instead in analyze_access_subtree. Note that so far I have only tested the attached patch with make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c" make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c" make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c" I'll have a look at full test results tomorrow morning.
(In reply to Richard Biener from comment #13) > Paul - can you test if this patch resolves the emacs issue? Unfortunately not. Although the generated code differs, it's still the same bad pattern. GDB's command 'disas decode_lisp_time' reports this: ... 0x082a924c <+876>: call 0x82a8140 <decode_ticks_hz> => 0x082a9251 <+881>: fldl 0x8c(%esp) 0x082a9258 <+888>: add $0x10,%esp 0x082a925b <+891>: fstpl 0x7c(%esp) ... where the 8-byte quantity being copied comes from the leading bytes of this union: union c_time { struct ticks_hz { long long ticks; long long hz; } th; struct timespec ts; double d; }; and this union happens to contain th (of type struct ticks_hz) not d (of type double).
(In reply to Paul Eggert from comment #16) > (In reply to Richard Biener from comment #13) > > Paul - can you test if this patch resolves the emacs issue? > > Unfortunately not. Although the generated code differs, it's still the same > bad pattern. GDB's command 'disas decode_lisp_time' reports this: > > ... > 0x082a924c <+876>: call 0x82a8140 <decode_ticks_hz> > > => 0x082a9251 <+881>: fldl 0x8c(%esp) > > 0x082a9258 <+888>: add $0x10,%esp > > 0x082a925b <+891>: fstpl 0x7c(%esp) > ... > > where the 8-byte quantity being copied comes from the leading bytes of this > union: > > union c_time > { > struct ticks_hz { long long ticks; long long hz; } th; > struct timespec ts; > double d; > }; > > and this union happens to contain th (of type struct ticks_hz) not d (of > type double). I suppose this happens even when building without LTO, so can you please attach preprocessed source of the TU containing decode_lisp_time (preprocessed with all the flags used when the issue appears)? Maybe it's really a duplicate of one of the related bugs (my patch didn't fix all of them).
(In reply to Martin Jambor from comment #15) > Created attachment 58724 [details] > simple (wip) fix > > I'm wondering whether just simply something like this would not be enough. > I have looked at total scalarization and we will not replace a type found in > the IL with another one there. Similarly, only propagation through > assignments fiddles with existing types (when they are not aggregate) only > when propagating from RHS to LHS and not the other way round. > > If we want to be more aggressive, we can add a flag when the new predicate > fails but there is a good bitwise_type_for_mode and then when the flag is > set, use that type instead in analyze_access_subtree. > > Note that so far I have only tested the attached patch with > make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c" > make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c" > make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c" > > I'll have a look at full test results tomorrow morning. I think it should be OK to fix the wrong-code issue in this bug but it prevents scalarization completely, likely at least failing gcc.dg/tree-ssa/sra-6.c Note the name of the predicate should probably reflect the problem, like can_use_type_as_storage_for (tree storage_type, tree value_type) or so. With my patch I trieds to instead use a bit-pattern preserving load similar as what we do with non-mode precision integer prevailing types. Note I plan to add a new target hook to identify problematic FP modes.
(In reply to Richard Biener from comment #18) > (In reply to Martin Jambor from comment #15) > > Created attachment 58724 [details] > > simple (wip) fix > > > > I'm wondering whether just simply something like this would not be enough. > > I have looked at total scalarization and we will not replace a type found in > > the IL with another one there. Similarly, only propagation through > > assignments fiddles with existing types (when they are not aggregate) only > > when propagating from RHS to LHS and not the other way round. > > > > If we want to be more aggressive, we can add a flag when the new predicate > > fails but there is a good bitwise_type_for_mode and then when the flag is > > set, use that type instead in analyze_access_subtree. > > > > Note that so far I have only tested the attached patch with > > make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c" > > make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c" > > make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c" > > > > I'll have a look at full test results tomorrow morning. > > I think it should be OK to fix the wrong-code issue in this bug but it > prevents scalarization completely, likely at least failing > gcc.dg/tree-ssa/sra-6.c I did check that even yesterday night and it passes but g++.dg/vect/pr64410.cc and gcc.dg/tree-ssa/pr32964.c fail. I'm investigating. > > Note the name of the predicate should probably reflect the problem, > like can_use_type_as_storage_for (tree storage_type, tree value_type) > or so. Yes, I'm aware it is less than ideal so far. I was not sure what the final version would be like. THe version from yesterday unnecessarily pessimizes cases where the "other" type is a structure containing just the float or one element array, or a combination, that at least should be addresses. > > With my patch I trieds to instead use a bit-pattern preserving load > similar as what we do with non-mode precision integer prevailing types. > > Note I plan to add a new target hook to identify problematic FP modes. That would be super-helpful.
Created attachment 58739 [details] preprocessed Emacs source code illustrating the bug Here is the gzip-compressed text of the Emacs source code illustrating the bug. I generated it via the following commands on an AMD Phenom II X4 910e. gcc -m32 -march=native -E -Demacs -I. -I. -I../lib -I../lib -isystem /usr/include/libpng16 -DWITH_GZFILEOP -isystem /usr/include/libxml2 -DWITH_GZFILEOP -isystem /usr/include/webp -MMD -MF deps/timefns.d -MP -Werror -fanalyzer -fstrict-flex-arrays -Wall -Warith-conversion -Wdate-time -Wdisabled-optimization -Wdouble-promotion -Wduplicated-cond -Wextra -Wformat-signedness -Wflex-array-member-not-at-end -Winit-self -Winvalid-pch -Wlogical-op -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wmissing-variable-declarations -Wnested-externs -Wnull-dereference -Wold-style-definition -Wopenmp-simd -Wpacked -Wpointer-arith -Wstrict-flex-arrays -Wstrict-prototypes -Wsuggest-attribute=format -Wsuggest-attribute=malloc -Wsuggest-attribute=noreturn -Wsuggest-final-methods -Wsuggest-final-types -Wtrampolines -Wuninitialized -Wunknown-pragmas -Wunused-macros -Wvariadic-macros -Wvector-operation-performance -Wwrite-strings -Warray-bounds=2 -Wattribute-alias=2 -Wformat=2 -Wformat-truncation=2 -Wimplicit-fallthrough=5 -Wshift-overflow=2 -Wuse-after-free=3 -Wvla-larger-than=4031 -Wno-analyzer-malloc-leak -Wredundant-decls -Wno-missing-field-initializers -Wno-override-init -Wno-sign-compare -Wno-type-limits -Wno-unused-parameter -Wno-format-nonliteral -Wno-bidi-chars -Wno-analyzer-fd-leak -g3 -O2 timefns.c >timefns.i gzip -9n timefns.i
Created attachment 58740 [details] Assembly-language illustrating wrong code Here is the assembly language output (gzip compressed) of the wrong code. The wrong code starts at line 3025, where fldl/fstpl is incorrectly used to copy a 64-bit int. I used the same gcc command as before, except with -S instead of -E, and I omitted to save space in the .s file.
(In reply to Richard Biener from comment #17) > I suppose this happens even when building without LTO, so can you please > attach preprocessed source of the TU containing decode_lisp_time > (preprocessed with all the flags used when the issue appears)? Maybe > it's really a duplicate of one of the related bugs (my patch didn't fix > all of them). Thanks, I've attached the preprocessed source and wrong code in my previous two comments. You're correct that it happens without LTO.
So it's from return (struct form_time) { .form = TIMEFORM_HI_LO, .time = decode_ticks_hz (specified_time, make_fixnum (1), cform) }; with union c_time { struct ticks_hz th; struct timespec ts; double d; }; struct form_time { enum timeform form; union c_time time; }; since there's also .form = TIMEFORM_FLOAT, .time = decode_float_time (d, cform) and that one is inlined we're likely only seeing an access to the double union component and otherwise aggregate union accesses. Still needs a reduction.
Created attachment 58752 [details] Another wip patch To give sort-of a status update, this is the current state of my WIP fix. It clearly needs more thinking about the reverse storage cases and still fails g++.dg/vect/pr64410.cc - but I'm afraid that would require a target hook to identify problematic FP modes. Otherwise it passes bootstrap and testsuite on x86_64-linux.
I have proposed a patch on the mailing list: https://inbox.sourceware.org/gcc-patches/ri6ed6kntue.fsf@virgil.suse.cz/T/#u
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>: https://gcc.gnu.org/g:f577959f420ae404f99f630dadc1c0370734d0da commit r15-3070-gf577959f420ae404f99f630dadc1c0370734d0da Author: Martin Jambor <mjambor@suse.cz> Date: Wed Aug 21 14:49:11 2024 +0200 sra: Avoid risking x87 magling binary representation of a replacement (PR 58416) PR 58416 shows that storing non-floating point data to floating point scalar registers can lead to miscompilations when the data is normalized or otherwise processed upon loading to a register. To avoid that risk, this patch detects situations where we have multiple types and a we decide to represent the data in a type with a mode that is known to not be able to transfer actual bits reliably using the new TARGET_MODE_CAN_TRANSFER_BITS hook. gcc/ChangeLog: 2024-08-19 Martin Jambor <mjambor@suse.cz> PR target/58416 * tree-sra.cc (types_risk_mangled_binary_repr_p): New function. (sort_and_splice_var_accesses): Use it. (propagate_subaccesses_from_rhs): Likewise. gcc/testsuite/ChangeLog: 2024-08-19 Martin Jambor <mjambor@suse.cz> PR target/58416 * gcc.dg/torture/pr58416.c: New test.
Fixed on trunk.
Thanks for the fix. I updated Emacs to no longer work around the bug when GCC 15+ is being used, here: https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=3d1d4f109ed4115256a08c74eeb704259d91c9f4
It wasn't clear to me if your case was fixed (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416#c23) or if it needed reduction.