58416 – Incorrect x87-based union copying code

Bug 58416 - Incorrect x87-based union copying code

Summary: Incorrect x87-based union copying code

Status:	ASSIGNED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.6.3

Importance:	P3 normal
Target Milestone:	---
Assignee:	Richard Biener

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2013-09-13 19:56 UTC by Jim Stichnoth
Modified:	2024-11-23 03:18 UTC (History)
CC List:	7 users (show)

See Also:	71460 57484 56831 71475 93271 114659 https://debbugs.gnu.org/72145 https://github.com/llvm/llvm-project/issues/44497
Host:
Target:	i?86--
Build:
Known to work:	15.0
Known to fail:
Last reconfirmed:	2024-02-20 00:00:00

Attachments
.ii file generated from nantest1.cpp (9.66 KB, text/plain) 2013-09-13 19:56 UTC, Jim Stichnoth	Details
patch I am testing (2.29 KB, patch) 2024-07-21 08:58 UTC, Richard Biener	Details \| Diff
simple (wip) fix (946 bytes, patch) 2024-07-22 21:09 UTC, Martin Jambor	Details \| Diff
preprocessed Emacs source code illustrating the bug (233.58 KB, application/gzip) 2024-07-23 17:42 UTC, Paul Eggert	Details
Assembly-language illustrating wrong code (21.89 KB, application/gzip) 2024-07-23 17:46 UTC, Paul Eggert	Details
Another wip patch (2.61 KB, application/mbox) 2024-07-24 19:33 UTC, Martin Jambor	Details
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jim Stichnoth 2013-09-13 19:56:12 UTC

Created attachment 30819 [details]
.ii file generated from nantest1.cpp

The program below produces incorrect code on x86-32 with gcc 4.6.3.  (It was also found to fail in 4.7 but succeed in 4.8.)

The problem happens when:
1. The union is initialized via its 64-bit double field
2. The union is then updated via its 64-bit integer field
3. The union is copied to another variable

When this happens, the 64-bit copy in step 3 is done via the x87 fldl/fstpl instructions.  If the 64-bit integer value in step 2 happens to be an sNaN pattern, the fldl instruction transforms it to a qNaN pattern and the fstpl instruction writes the wrong value.

If step 1 is omitted, the x87 instructions are not used and the code is correct.  This suggests that the optimization of using x87 for 64-bit transfers is enabled when a use of the double field is found, whereas it should probably only happen if a *live* use of the double field is found.


// Compile with g++ -O3 -m32
// -O[123] all exhibit the problem.

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>

union MyClass {
  MyClass() : DoubleVal(0.0) {} // this ctor causes failure
  //MyClass() : IntVal(0) {} // this ctor works correctly
  int64_t IntVal;
  double DoubleVal;
};

__attribute__((noinline)) void create(MyClass &dest, int64_t Val) {
  MyClass MCOp;
  MCOp.IntVal = Val;
  dest = MCOp;
}

int main(int argc, char **argv) {
  MyClass copy;

  int64_t magic = 0xfff0000000000001; // happens to be sNaN
  create(copy, magic); // copy is changed to qNaN

  printf("magic=%llx\ncopy= %llx\n", magic, copy.IntVal);
  assert(magic == copy.IntVal);

  return 0;
}


$ g++ -v -save-temps -Wall -Wextra -fno-strict-aliasing -fwrapv -O3 -m32 nantest1.cpp 
Using built-in specs.
COLLECT_GCC=/usr/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/x86_64-linux-gnu/4.6/cc1plus -E -quiet -v -imultilib 32 -imultiarch i386-linux-gnu -D_GNU_SOURCE nantest1.cpp -m32 -mtune=generic -march=i686 -Wall -Wextra -fno-strict-aliasing -fwrapv -O3 -fpch-preprocess -fstack-protector -o nantest1.ii
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i386-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.6
 /usr/include/c++/4.6/x86_64-linux-gnu/32
 /usr/include/c++/4.6/backward
 /usr/lib/gcc/x86_64-linux-gnu/4.6/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.6/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/x86_64-linux-gnu/4.6/cc1plus -fpreprocessed nantest1.ii -quiet -dumpbase nantest1.cpp -m32 -mtune=generic -march=i686 -auxbase nantest1 -O3 -Wall -Wextra -version -fno-strict-aliasing -fwrapv -fstack-protector -o nantest1.s
GNU C++ (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (x86_64-linux-gnu)
	compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (x86_64-linux-gnu)
	compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 65b5171ac1bd7b3f07dbea6bdb24be3d
nantest1.cpp:21:5: warning: unused parameter ‘argc’ [-Wunused-parameter]
nantest1.cpp:21:5: warning: unused parameter ‘argv’ [-Wunused-parameter]
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686'
 as --32 -o nantest1.o nantest1.s
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/4.6/32/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/:/lib/i386-linux-gnu/:/lib/../lib32/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib32/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../:/lib/i386-linux-gnu/:/lib/:/usr/lib/i386-linux-gnu/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-O3' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/x86_64-linux-gnu/4.6/collect2 --sysroot=/ --build-id --no-add-needed --as-needed --eh-frame-hdr -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -z relro /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.6/32/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.6/32 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32 -L/lib/i386-linux-gnu -L/lib/../lib32 -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib32 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../i386-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../.. -L/lib/i386-linux-gnu -L/usr/lib/i386-linux-gnu nantest1.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/4.6/32/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib32/crtn.o

Comment 1 H.J. Lu 2013-09-13 20:23:48 UTC

This may be related to PR57484.

Comment 2 Richard Biener 2013-09-16 10:28:34 UTC

It is SRA that transforms

  MCOp.DoubleVal = 0.0;
  MCOp.IntVal = Val_3(D);
  *dest_5(D) = MCOp;

into

  MCOp$DoubleVal_7 = 0.0;
  _8 = VIEW_CONVERT_EXPR<double>(Val_3(D));
  MCOp$DoubleVal_4 = _8;
  MEM[(union MyClass *)dest_5(D)] = MCOp$DoubleVal_4;

simplifying into

  _6 = VIEW_CONVERT_EXPR<double>(Val_3(D));
  MEM[(union MyClass *)dest_4(D)] = _6;

and

(insn 7 6 0 (set (mem:DF (reg/v/f:SI 60 [ dest ]) [0 MEM[(union MyClass *)dest_4(D)]+0 S8 A32])
        (subreg:DF (reg/v:DI 61 [ Val ]) 0)) t.C:15 -1
     (nil))

causes a reload:

(insn 7 11 12 2 (set (reg:DF 8 st [62])
        (mem/c:DF (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [0 Val+0 S8 A32])) t.C:15 134 {*movdf_internal}
     (nil))
(insn 12 7 0 2 (set (mem:DF (reg/v/f:SI 0 ax [orig:60 dest ] [60]) [0 MEM[(union MyClass *)dest_4(D)]+0 S8 A32])
        (reg:DF 8 st [62])) t.C:15 134 {*movdf_internal}
     (expr_list:REG_DEAD (reg:DF 8 st [62])
        (expr_list:REG_DEAD (reg/v/f:SI 0 ax [orig:60 dest ] [60])
            (nil))))

where *movdf_internal simply doesn't properly preserve sNaN.

It looks wrong for DFmode move instructions to not preserve a IEEE defined
flag.  I suppose it would be also wrong to not preserve the exact bit pattern
(think of canonicalizing NaNs).

Comment 3 Uroš Bizjak 2013-09-16 13:21:26 UTC

(In reply to Richard Biener from comment #2)

> where *movdf_internal simply doesn't properly preserve sNaN.
> 
> It looks wrong for DFmode move instructions to not preserve a IEEE defined
> flag.  I suppose it would be also wrong to not preserve the exact bit pattern
> (think of canonicalizing NaNs).

As said in PR57484, comment 11:

On an x86 target using the legacy x87 instructions and the 80-bit registers, a load of a 64-bit or 32-bit value in memory into the 80-bit registers counts as a format conversion and an signaling NaN input will turn into a quiet NaN in the register format.

Comment 4 jsm-csl@polyomino.org.uk 2013-09-16 15:49:32 UTC

On Mon, 16 Sep 2013, rguenth at gcc dot gnu.org wrote:

> It looks wrong for DFmode move instructions to not preserve a IEEE defined
> flag.  I suppose it would be also wrong to not preserve the exact bit pattern
> (think of canonicalizing NaNs).

I think of the move problem for signaling NaNs as being bug 56831 - except 
that as expressed there it's only a bug for -fsignaling-nans, whereas the 
present bug is clearly a bug with or without -fsignaling-nans (the program 
makes no explicit use of signaling NaNs at all).

In the absence of -fexcess-precision=standard, I suppose the move patterns 
do need to represent how to move DFmove values into x87 registers from 
memory (despite the lack of proper bit-pattern-preserving loads on the 
processor), but any move instruction that extends SFmode or DFmode 
implicitly to XFmode should be avoided except where actually needed for 
floating-point arithmetic to be carried out or because the ABI requires 
the value in an x87 register.

Comment 5 Alexander Cherepanov 2016-06-09 15:01:12 UTC

Not sure if it's still the same bug but the pattern from this PR leads to problems with long double on x86-64. The optimization looses data in bytes corresponding to padding in long double.

Transformation from a string to a long double is wrong too, filed separately -- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71475. But here this example is only to illustrate that the padding is lost.

Source code:

----------------------------------------------------------------------
#include <stdio.h>

struct s {
  char s[sizeof(long double)];
};

union u {
  long double d;
  struct s s;
};

int main()
{
  union u x = {0};
  x.s = (struct s){"xxxxxxxxxxxxxxxx"};

  union u y = x;

  // start from the big end
  for (unsigned char *p = (unsigned char *)&y + sizeof y; p-- > (unsigned char *)&y;)
    printf("%02x ", *p);
  puts("");
}
----------------------------------------------------------------------

Results:

----------------------------------------------------------------------
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
00 00 00 00 00 40 00 00 78 78 78 78 78 78 78 78 
----------------------------------------------------------------------

gcc version: gcc (GCC) 7.0.0 20160608 (experimental)

Comment 6 Richard Biener 2024-02-20 10:05:06 UTC

Reconfirmed with the comment#5 testcase.  ESRA does

 int main ()
 {
+  long double x$d;
   unsigned char * p;
   union u y;
   union u x;
@@ -25,9 +20,9 @@
   int _2;
 
   <bb 2> :
-  x.d = 0.0;
-  x.s.s = "xxxxxxxxxxxxxxxx";
-  y = x;
+  x$d_3 = 0.0;
+  x$d_16 = MEM[(char[16] *)"xxxxxxxxxxxxxxxx"];
+  MEM[(union u *)&y] = x$d_16;

and the value re-interpretation goes off.  Martin - we may have dups of this
but SRA shouldn't do this (it possibly gets confused by the x.d store)?

Comment 7 Richard Biener 2024-02-20 10:07:29 UTC

Access trees for x (UID: 2479):
access { base = (2479)'x', offset = 0, size = 128, expr = x.d, type = long double, reverse = 0, grp_read = 1, grp_write = 1, grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0, grp_scalar_write = 1, grp_total_scalarization = 1, grp_hint = 0, grp_covered = 1, grp_unscalarizable_region = 0, grp_unscalarized_data = 0, grp_same_access_path = 0, grp_partial_lhs = 0, grp_to_be_replaced = 1, grp_to_be_debug_replaced = 0}

so it's odd to not see x.s.s access here.  I guess we merge them and somehow
the FP type "prevails" over the array type?  IMO when merging "bad" types
we have to pun to a bitwise type.

Comment 8 Andrew Pinski 2024-02-20 10:12:39 UTC

(In reply to Richard Biener from comment #6);
> 
> and the value re-interpretation goes off.  Martin - we may have dups of this
> but SRA shouldn't do this (it possibly gets confused by the x.d store)?


Yes I think pr 93271 is a dup of this.

Comment 9 akrl 2024-07-18 14:16:39 UTC

Hello,

this is apparently affecting Emacs build as well [1].
I there a suggested way we could work around this bug? Maybe a flag to prevent the optimization which we could use on affected systems?

Thanks

  Andrea

[1] <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72145>

Comment 10 Richard Biener 2024-07-19 12:34:23 UTC

(In reply to akrl from comment #9)
> Hello,
> 
> this is apparently affecting Emacs build as well [1].
> I there a suggested way we could work around this bug? Maybe a flag to
> prevent the optimization which we could use on affected systems?
> 
> Thanks
> 
>   Andrea
> 
> [1] <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72145>

The remaining issue is analyzed to be caused by SRA so you can check whether
-fno-tree-sra fixes it for you.

Comment 11 Paul Eggert 2024-07-20 14:30:44 UTC

(In reply to Richard Biener from comment #10)
> The remaining issue is analyzed to be caused by SRA so you can check whether
> -fno-tree-sra fixes it for you.

Thanks, it does, and I modified the Emacs 'configure' script here:

https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9f4fc6608212191e1a9e07bf89f38ba9e4ea786c

so that, when using GCC 4 or later on x86, Emacs 'configure' appends to CFLAGS either the option -mfpmath=sse (if the target is known to support SSE2, e.g., because -msse2 is also given) or the option -fno-tree-sra (if not).

It strikes me that GCC could do this itself, as a crude workaround for all programs. That is, when compiling for x86 and -mfpmath=sse is not in effect, GCC could automatically disable tree-sra for the compilation unit. Or it could be more selective and disable tree-sra only if the compilation unit defines a union that could cause the trouble.

Although such a patch would hurt the performance of the generated code, that's better than generating wrong code that works only 99.9% of the time.

Comment 12 Richard Biener 2024-07-21 08:58:21 UTC

I have two variants of a fix, one that for the testcase in comment#5 avoids
scalarization on i?86-*-* and one that scalarizes into unsigned:96 which
exposes that we fail to constant fold MEM<unsigned:96>["xxxxxxxxxxxxxxxx"];

On x86-64 with sizeof(long double) == 16 (instead of == 12 with -m32) we
scalarize with uint128_t.

I'm going to test the variant that avoids creating an unsigned:96 integer type
since I think that's safer.

Comment 13 Richard Biener 2024-07-21 08:58:56 UTC

Created attachment 58718 [details]
patch I am testing

Paul - can you test if this patch resolves the emacs issue?

Comment 14 Richard Biener 2024-07-21 10:21:23 UTC

(In reply to Richard Biener from comment #13)
> Created attachment 58718 [details]
> patch I am testing
> 
> Paul - can you test if this patch resolves the emacs issue?

Using root->grp_same_access_path doesn't seem to be a perfect fit.  For
gcc.dg/tree-ssa/sra-6.c we're now doing

Changing the type of a replacement for b offset: 0, size: 64  to an integer.

But maybe that's OK or we need a different flag to tell whether the access
paths are the same or just the tail of the access path, so 's' and 's.a'
would be OK for a root 's.a.f', allowing aggregate (sub-)copies but that
would assume we copy field-wise which we don't do - we RTL expand to
(inlined) memcpy which is also not semantically the same for x87 float
fields (but would it have to be?)

Comment 15 Martin Jambor 2024-07-22 21:09:42 UTC

Created attachment 58724 [details]
simple (wip) fix

I'm wondering whether just simply something like this would not be enough.  I have looked at total scalarization and we will not replace a type found in the IL with another one there. Similarly, only propagation through assignments fiddles with existing types (when they are not aggregate) only when propagating from RHS to LHS and not the other way round.

If we want to be more aggressive, we can add a flag when the new predicate fails but there is a good bitwise_type_for_mode and then when the flag is set, use that type instead in analyze_access_subtree.

Note that so far I have only tested the attached patch with
  make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c"
  make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c"
  make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c"

I'll have a look at full test results tomorrow morning.

Comment 16 Paul Eggert 2024-07-23 01:47:14 UTC

(In reply to Richard Biener from comment #13)
> Paul - can you test if this patch resolves the emacs issue?

Unfortunately not. Although the generated code differs, it's still the same bad pattern. GDB's command 'disas decode_lisp_time' reports this:

     ...
     0x082a924c <+876>:   call   0x82a8140 <decode_ticks_hz>                      
  => 0x082a9251 <+881>:   fldl   0x8c(%esp)                                       
     0x082a9258 <+888>:   add    $0x10,%esp                                       
     0x082a925b <+891>:   fstpl  0x7c(%esp)
     ...                                       

where the 8-byte quantity being copied comes from the leading bytes of this union:

  union c_time
  {
    struct ticks_hz { long long ticks; long long hz; } th;
    struct timespec ts;
    double d;
  };

and this union happens to contain th (of type struct ticks_hz) not d (of type double).

Comment 17 Richard Biener 2024-07-23 06:23:29 UTC

(In reply to Paul Eggert from comment #16)
> (In reply to Richard Biener from comment #13)
> > Paul - can you test if this patch resolves the emacs issue?
> 
> Unfortunately not. Although the generated code differs, it's still the same
> bad pattern. GDB's command 'disas decode_lisp_time' reports this:
> 
>      ...
>      0x082a924c <+876>:   call   0x82a8140 <decode_ticks_hz>                
> 
>   => 0x082a9251 <+881>:   fldl   0x8c(%esp)                                 
> 
>      0x082a9258 <+888>:   add    $0x10,%esp                                 
> 
>      0x082a925b <+891>:   fstpl  0x7c(%esp)
>      ...                                       
> 
> where the 8-byte quantity being copied comes from the leading bytes of this
> union:
> 
>   union c_time
>   {
>     struct ticks_hz { long long ticks; long long hz; } th;
>     struct timespec ts;
>     double d;
>   };
> 
> and this union happens to contain th (of type struct ticks_hz) not d (of
> type double).

I suppose this happens even when building without LTO, so can you please
attach preprocessed source of the TU containing decode_lisp_time
(preprocessed with all the flags used when the issue appears)?  Maybe
it's really a duplicate of one of the related bugs (my patch didn't fix
all of them).

Comment 18 Richard Biener 2024-07-23 06:28:25 UTC

(In reply to Martin Jambor from comment #15)
> Created attachment 58724 [details]
> simple (wip) fix
> 
> I'm wondering whether just simply something like this would not be enough. 
> I have looked at total scalarization and we will not replace a type found in
> the IL with another one there. Similarly, only propagation through
> assignments fiddles with existing types (when they are not aggregate) only
> when propagating from RHS to LHS and not the other way round.
> 
> If we want to be more aggressive, we can add a flag when the new predicate
> fails but there is a good bitwise_type_for_mode and then when the flag is
> set, use that type instead in analyze_access_subtree.
> 
> Note that so far I have only tested the attached patch with
>   make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c"
>   make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c"
>   make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c"
> 
> I'll have a look at full test results tomorrow morning.

I think it should be OK to fix the wrong-code issue in this bug but it
prevents scalarization completely, likely at least failing
gcc.dg/tree-ssa/sra-6.c

Note the name of the predicate should probably reflect the problem,
like can_use_type_as_storage_for (tree storage_type, tree value_type)
or so.

With my patch I trieds to instead use a bit-pattern preserving load
similar as what we do with non-mode precision integer prevailing types.

Note I plan to add a new target hook to identify problematic FP modes.

Comment 19 Martin Jambor 2024-07-23 12:55:45 UTC

(In reply to Richard Biener from comment #18)
> (In reply to Martin Jambor from comment #15)
> > Created attachment 58724 [details]
> > simple (wip) fix
> > 
> > I'm wondering whether just simply something like this would not be enough. 
> > I have looked at total scalarization and we will not replace a type found in
> > the IL with another one there. Similarly, only propagation through
> > assignments fiddles with existing types (when they are not aggregate) only
> > when propagating from RHS to LHS and not the other way round.
> > 
> > If we want to be more aggressive, we can add a flag when the new predicate
> > fails but there is a good bitwise_type_for_mode and then when the flag is
> > set, use that type instead in analyze_access_subtree.
> > 
> > Note that so far I have only tested the attached patch with
> >   make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c"
> >   make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c"
> >   make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c"
> > 
> > I'll have a look at full test results tomorrow morning.
> 
> I think it should be OK to fix the wrong-code issue in this bug but it
> prevents scalarization completely, likely at least failing
> gcc.dg/tree-ssa/sra-6.c

I did check that even yesterday night and it passes but g++.dg/vect/pr64410.cc and gcc.dg/tree-ssa/pr32964.c fail.  I'm investigating.

> 
> Note the name of the predicate should probably reflect the problem,
> like can_use_type_as_storage_for (tree storage_type, tree value_type)
> or so.

Yes, I'm aware it is less than ideal so far.  I was not sure what the final version would be like.  THe version from yesterday unnecessarily pessimizes cases where the "other" type is a structure containing just the float or one element array, or a combination, that at least should be addresses.

> 
> With my patch I trieds to instead use a bit-pattern preserving load
> similar as what we do with non-mode precision integer prevailing types.
> 
> Note I plan to add a new target hook to identify problematic FP modes.

That would be super-helpful.

Comment 20 Paul Eggert 2024-07-23 17:42:25 UTC

Created attachment 58739 [details]
preprocessed Emacs source code illustrating the bug

Here is the gzip-compressed text of the Emacs source code illustrating the bug. I generated it via the following commands on an AMD Phenom II X4 910e.

gcc -m32 -march=native -E -Demacs -I. -I. -I../lib -I../lib -isystem /usr/include/libpng16 -DWITH_GZFILEOP -isystem /usr/include/libxml2 -DWITH_GZFILEOP -isystem /usr/include/webp -MMD -MF deps/timefns.d -MP -Werror -fanalyzer -fstrict-flex-arrays -Wall -Warith-conversion -Wdate-time -Wdisabled-optimization -Wdouble-promotion -Wduplicated-cond -Wextra -Wformat-signedness -Wflex-array-member-not-at-end -Winit-self -Winvalid-pch -Wlogical-op -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wmissing-variable-declarations -Wnested-externs -Wnull-dereference -Wold-style-definition -Wopenmp-simd -Wpacked -Wpointer-arith -Wstrict-flex-arrays -Wstrict-prototypes -Wsuggest-attribute=format -Wsuggest-attribute=malloc -Wsuggest-attribute=noreturn -Wsuggest-final-methods -Wsuggest-final-types -Wtrampolines -Wuninitialized -Wunknown-pragmas -Wunused-macros -Wvariadic-macros -Wvector-operation-performance -Wwrite-strings -Warray-bounds=2 -Wattribute-alias=2 -Wformat=2 -Wformat-truncation=2 -Wimplicit-fallthrough=5 -Wshift-overflow=2 -Wuse-after-free=3 -Wvla-larger-than=4031 -Wno-analyzer-malloc-leak -Wredundant-decls -Wno-missing-field-initializers -Wno-override-init -Wno-sign-compare -Wno-type-limits -Wno-unused-parameter -Wno-format-nonliteral -Wno-bidi-chars -Wno-analyzer-fd-leak -g3 -O2 timefns.c >timefns.i

gzip -9n timefns.i

Comment 21 Paul Eggert 2024-07-23 17:46:17 UTC

Created attachment 58740 [details]
Assembly-language illustrating wrong code

Here is the assembly language output (gzip compressed) of the wrong code. The wrong code starts at line 3025, where fldl/fstpl is incorrectly used to copy a 64-bit int.

I used the same gcc command as before, except with -S instead of -E, and I omitted  to save space in the .s file.

Comment 22 Paul Eggert 2024-07-23 17:48:04 UTC

(In reply to Richard Biener from comment #17)

> I suppose this happens even when building without LTO, so can you please
> attach preprocessed source of the TU containing decode_lisp_time
> (preprocessed with all the flags used when the issue appears)?  Maybe
> it's really a duplicate of one of the related bugs (my patch didn't fix
> all of them).

Thanks, I've attached the preprocessed source and wrong code in my previous two comments. You're correct that it happens without LTO.

Comment 23 Richard Biener 2024-07-24 12:29:08 UTC

So it's from

    return (struct form_time)
      {
 .form = TIMEFORM_HI_LO,
 .time = decode_ticks_hz (specified_time, make_fixnum (1), cform)
      };

with

union c_time
{
  struct ticks_hz th;
  struct timespec ts;
  double d;
};

struct form_time
{
  enum timeform form;
  union c_time time;
};

since there's also

   .form = TIMEFORM_FLOAT,
   .time = decode_float_time (d, cform)

and that one is inlined we're likely only seeing an access to the double
union component and otherwise aggregate union accesses.

Still needs a reduction.

Comment 24 Martin Jambor 2024-07-24 19:33:40 UTC

Created attachment 58752 [details]
Another wip patch

To give sort-of a status update, this is the current state of my WIP fix.  It clearly needs more thinking about the reverse storage cases and still fails g++.dg/vect/pr64410.cc - but I'm afraid that would require a target hook to identify problematic FP modes.  Otherwise it passes bootstrap and testsuite on x86_64-linux.

Comment 25 Martin Jambor 2024-08-20 12:11:12 UTC

I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6ed6kntue.fsf@virgil.suse.cz/T/#u

Comment 26 GCC Commits 2024-08-21 12:49:51 UTC

The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:f577959f420ae404f99f630dadc1c0370734d0da

commit r15-3070-gf577959f420ae404f99f630dadc1c0370734d0da
Author: Martin Jambor <mjambor@suse.cz>
Date:   Wed Aug 21 14:49:11 2024 +0200

    sra: Avoid risking x87 magling binary representation of a replacement (PR 58416)
    
    PR 58416 shows that storing non-floating point data to floating point
    scalar registers can lead to miscompilations when the data is
    normalized or otherwise processed upon loading to a register.  To
    avoid that risk, this patch detects situations where we have multiple
    types and a we decide to represent the data in a type with a mode that
    is known to not be able to transfer actual bits reliably using the new
    TARGET_MODE_CAN_TRANSFER_BITS hook.
    
    gcc/ChangeLog:
    
    2024-08-19  Martin Jambor  <mjambor@suse.cz>
    
            PR target/58416
            * tree-sra.cc (types_risk_mangled_binary_repr_p): New function.
            (sort_and_splice_var_accesses): Use it.
            (propagate_subaccesses_from_rhs): Likewise.
    
    gcc/testsuite/ChangeLog:
    
    2024-08-19  Martin Jambor  <mjambor@suse.cz>
    
            PR target/58416
            * gcc.dg/torture/pr58416.c: New test.

Comment 27 Richard Biener 2024-08-21 13:26:12 UTC

Fixed on trunk.

Comment 28 Paul Eggert 2024-08-22 06:39:58 UTC

Thanks for the fix. I updated Emacs to no longer work around the bug when GCC 15+ is being used, here:

https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=3d1d4f109ed4115256a08c74eeb704259d91c9f4

Comment 29 Sam James 2024-08-22 06:41:20 UTC

It wasn't clear to me if your case was fixed (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416#c23) or if it needed reduction.