Bug 109780 - [13/14/15 Regression] csmith: runtime crash with -O2 -march=znver1
Summary: [13/14/15 Regression] csmith: runtime crash with -O2 -march=znver1
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 14.0
: P2 normal
Target Milestone: 13.5
Assignee: H.J. Lu
URL:
Keywords: wrong-code
Depends on: 118936
Blocks: 111843
  Show dependency treegraph
 
Reported: 2023-05-08 20:35 UTC by David Binderman
Modified: 2026-04-13 13:36 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 16.0
Known to fail:
Last reconfirmed: 2023-05-09 00:00:00


Attachments
C source code (36.18 KB, text/plain)
2023-05-08 20:35 UTC, David Binderman
Details
A patch (1.67 KB, patch)
2023-06-27 23:35 UTC, H.J. Lu
Details | Diff
test case broken by draft patch (at -O2 -mavx2 -mtune=haswell) (665 bytes, text/plain)
2023-06-28 21:39 UTC, Xi Ruoyao
Details
An updated patch (3.15 KB, patch)
2023-06-29 02:43 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Binderman 2023-05-08 20:35:21 UTC
Created attachment 55023 [details]
C source code

The attached C code seems to work fine with -O2:

$ ../results.20230507.asan.ubsan/bin/gcc -w -O2  bug916.c
$ ./a.out
checksum = 44DCF65B
$ 

But adding -march=znver1 seems to cause trouble:

$ ../results.20230507.asan.ubsan/bin/gcc -w -O2 -march=znver1  bug916.c
$ ./a.out
Segmentation fault (core dumped)
$ 

Adding the -fno-strict-aliasing flag doesn't seem to help:

$ ../results.20230507.asan.ubsan/bin/gcc -w -O2 -march=znver1 -fno-strict-aliasing  bug916.c
$ ./a.out
Segmentation fault (core dumped)
$ 

The bug seems to have existed for a while:

$ ../results.20230417/bin/gcc -w -O2 -march=znver1 -fno-strict-aliasing  bug916.c
$ ./a.out
Segmentation fault (core dumped)
$
Comment 1 Drea Pinski 2023-05-08 20:44:10 UTC
=> 0x0000000000402378 <func_36+136>:    vmovdqa %xmm3,(%r8)
   0x000000000040237d <func_36+141>:    vmovdqa %xmm2,0x10(%r8)
   0x0000000000402383 <func_36+147>:    neg    %r9
   0x0000000000402386 <func_36+150>:    vmovdqa %xmm2,0x20(%r8)
End of assembler dump.
(gdb) p $r8
$1 = 140737488346408
(gdb) p/x $r8
$2 = 0x7fffffffdd28
Comment 2 Drea Pinski 2023-05-08 20:49:28 UTC
I should mention there is no difference in the gimple dump between with and without -march=znver1 .
Comment 3 Alexander Monakov 2023-05-08 22:09:48 UTC
Works with -fstack-reuse=none. Somehow GCC is confused about stack layout in func_36. It leaves rsp misaligned by 8 mod 16, but computes the base of l_814 for
__builtin_memset (&l_814, 0, 80) as if rsp is aligned mod 16.
Comment 4 David Binderman 2023-05-09 08:10:29 UTC
Seems good with g:18547874ee205d83 dated 20220515 and bad with g:73f7109ffb159302,
dated yesterday.
Comment 5 David Binderman 2023-05-09 10:00:41 UTC
Seems good at date 20221106, so the date range is [20221106..20230417].

Trying 20230205.
Comment 6 David Binderman 2023-05-09 10:11:38 UTC
Broken at 20230205, so range is now [20221106.. 20230205].

Trying snapshot 20221218
Comment 7 David Binderman 2023-05-09 10:44:07 UTC
(In reply to David Binderman from comment #6)
> Broken at 20230205, so range is now [20221106.. 20230205].
> 
> Trying snapshot 20221218

That was good, so range is 20221218..20230108.

Trying snapshot 20230101.
Comment 8 David Binderman 2023-05-09 11:10:39 UTC
As far as the snapshots go, 20221218..20221225 seems to be the range.

In git, this is g:fd69977febf399d1992bbf8d66ae9170e0a4dc9f .. g:febb58d28bfa4b544ec7ffec2d61f46d25205ff0, which is 123 commits.

Trying g:89ba8366fe12fd2d04535c99ba67f33d7e305132.
Comment 9 Martin Liška 2023-05-09 11:32:49 UTC
Started with zen tuning revision r13-4839-geef81eefcdc2a5.
Comment 10 Alexander Monakov 2023-05-09 11:56:19 UTC
(In reply to Martin Liška from comment #9)
> Started with zen tuning revision r13-4839-geef81eefcdc2a5.

The issue is also reproducible with -march=haswell or -march=skylake, so you can use those for further bisection.
Comment 11 Martin Liška 2023-05-09 12:03:43 UTC
(In reply to Alexander Monakov from comment #10)
> (In reply to Martin Liška from comment #9)
> > Started with zen tuning revision r13-4839-geef81eefcdc2a5.
> 
> The issue is also reproducible with -march=haswell or -march=skylake, so you
> can use those for further bisection.

With -march=skylake it began with r13-4124-g156f523f9582f1 and with -march=haswell it started with the very same revision.
Comment 12 Alexander Monakov 2023-05-09 14:49:05 UTC
Eh, that commit sneakily changed avx2 tuning without explaining that in the Changelog. Anyway, it should possible to "workaround" that by compiling with

  -O2 -mavx2 -mtune=skylake-avx512

instead, in which case the bisect will likely point to commit  r12-2666-g29f0e955c97 ("x86: Update piecewise move and store") (before that, expansion of memcpy via store-by-pieces wouldn't use avx2).

Also, PR 109093 looks very related.
Comment 13 Alexander Monakov 2023-05-09 14:57:27 UTC
PR 109087 might also be solved by this.

Marking as 12/13/14 regression since this is reproducible with -O2 plus basic -m flags, while earlier PRs also needed -ftrivial-auto-var-init and lost the regression markers in the meanwhile.
Comment 14 Xi Ruoyao 2023-06-26 12:57:59 UTC
Any clue about how to fix this?
Comment 15 Xi Ruoyao 2023-06-27 18:45:56 UTC
attachment 54666 [details] from PR109093 seems able to fix this.  Could we make it into trunk and the release branches?
Comment 16 H.J. Lu 2023-06-27 23:35:04 UTC
Created attachment 55409 [details]
A patch

I am stilling trying to find a small testcase.
Comment 17 Xi Ruoyao 2023-06-28 09:48:02 UTC
(In reply to H.J. Lu from comment #16)
> Created attachment 55409 [details]
> A patch
> 
> I am stilling trying to find a small testcase.

The patch triggers an ICE building Spidermonkey 115b9 (it segfaults with GCC trunk because of some unaligned vmovdqa):

0x93297b ix86_finalize_stack_frame_flags
	../../gcc/gcc/config/i386/i386.cc:8224
0x162064c ix86_expand_epilogue(int)
	../../gcc/gcc/config/i386/i386.cc:9405
0x1b2e27f gen_epilogue()
	../../gcc/gcc/config/i386/i386.md:17517
0x160a815 target_gen_epilogue
	../../gcc/gcc/config/i386/i386.md:17013
0xf15e86 make_epilogue_seq
	../../gcc/gcc/function.cc:5964
0xf15f8b thread_prologue_and_epilogue_insns()
	../../gcc/gcc/function.cc:6046
0xf166c2 rest_of_handle_thread_prologue_and_epilogue
	../../gcc/gcc/function.cc:6544
0xf166c2 execute
	../../gcc/gcc/function.cc:6625

The code at i386.cc:8224 reads:

  if (crtl->stack_realign_finalized)
    {
      /* After stack_realign_needed is finalized, we can't no longer
         change it.  */
      gcc_assert (crtl->stack_realign_needed == stack_realign);
      return;
    }

I'm not sure if the assert should be dropped or it's more difficult.

Or can we just force to use unaligned vector moves for block operations until we can find a better solution?  It's at least better than leaving the vectorized block moving broken and forcing people trying to disable the feature.
Comment 18 Xi Ruoyao 2023-06-28 09:59:41 UTC
(In reply to Xi Ruoyao from comment #17)
> (In reply to H.J. Lu from comment #16)
> > Created attachment 55409 [details]
> > A patch
> > 
> > I am stilling trying to find a small testcase.
> 
> The patch triggers an ICE building Spidermonkey 115b9 (it segfaults with GCC
> trunk because of some unaligned vmovdqa):

I mean, "GCC trunk and -O3 -march=tigerlake -mtune=tigerlake".
Comment 19 H.J. Lu 2023-06-28 15:47:38 UTC
(In reply to Xi Ruoyao from comment #17)
> (In reply to H.J. Lu from comment #16)
> > Created attachment 55409 [details]
> > A patch
> > 
> > I am stilling trying to find a small testcase.
> 
> The patch triggers an ICE building Spidermonkey 115b9 (it segfaults with GCC
> trunk because of some unaligned vmovdqa):
> 
> 0x93297b ix86_finalize_stack_frame_flags
> 	../../gcc/gcc/config/i386/i386.cc:8224
> 0x162064c ix86_expand_epilogue(int)
> 	../../gcc/gcc/config/i386/i386.cc:9405
> 0x1b2e27f gen_epilogue()
> 	../../gcc/gcc/config/i386/i386.md:17517
> 0x160a815 target_gen_epilogue
> 	../../gcc/gcc/config/i386/i386.md:17013
> 0xf15e86 make_epilogue_seq
> 	../../gcc/gcc/function.cc:5964
> 0xf15f8b thread_prologue_and_epilogue_insns()
> 	../../gcc/gcc/function.cc:6046
> 0xf166c2 rest_of_handle_thread_prologue_and_epilogue
> 	../../gcc/gcc/function.cc:6544
> 0xf166c2 execute
> 	../../gcc/gcc/function.cc:6625
> 
> The code at i386.cc:8224 reads:
> 
>   if (crtl->stack_realign_finalized)
>     {
>       /* After stack_realign_needed is finalized, we can't no longer
>          change it.  */
>       gcc_assert (crtl->stack_realign_needed == stack_realign);
>       return;
>     }
> 
> I'm not sure if the assert should be dropped or it's more difficult.
> 
> Or can we just force to use unaligned vector moves for block operations
> until we can find a better solution?  It's at least better than leaving the
> vectorized block moving broken and forcing people trying to disable the
> feature.

Do you have a testcase?
Comment 20 Xi Ruoyao 2023-06-28 15:49:39 UTC
(In reply to H.J. Lu from comment #19)
> (In reply to Xi Ruoyao from comment #17)
> > (In reply to H.J. Lu from comment #16)
> > > Created attachment 55409 [details]
> > > A patch
> > > 
> > > I am stilling trying to find a small testcase.
> > 
> > The patch triggers an ICE building Spidermonkey 115b9 (it segfaults with GCC
> > trunk because of some unaligned vmovdqa):
> > 
> > 0x93297b ix86_finalize_stack_frame_flags
> > 	../../gcc/gcc/config/i386/i386.cc:8224
> > 0x162064c ix86_expand_epilogue(int)
> > 	../../gcc/gcc/config/i386/i386.cc:9405
> > 0x1b2e27f gen_epilogue()
> > 	../../gcc/gcc/config/i386/i386.md:17517
> > 0x160a815 target_gen_epilogue
> > 	../../gcc/gcc/config/i386/i386.md:17013
> > 0xf15e86 make_epilogue_seq
> > 	../../gcc/gcc/function.cc:5964
> > 0xf15f8b thread_prologue_and_epilogue_insns()
> > 	../../gcc/gcc/function.cc:6046
> > 0xf166c2 rest_of_handle_thread_prologue_and_epilogue
> > 	../../gcc/gcc/function.cc:6544
> > 0xf166c2 execute
> > 	../../gcc/gcc/function.cc:6625
> > 
> > The code at i386.cc:8224 reads:
> > 
> >   if (crtl->stack_realign_finalized)
> >     {
> >       /* After stack_realign_needed is finalized, we can't no longer
> >          change it.  */
> >       gcc_assert (crtl->stack_realign_needed == stack_realign);
> >       return;
> >     }
> > 
> > I'm not sure if the assert should be dropped or it's more difficult.
> > 
> > Or can we just force to use unaligned vector moves for block operations
> > until we can find a better solution?  It's at least better than leaving the
> > vectorized block moving broken and forcing people trying to disable the
> > feature.
> 
> Do you have a testcase?

It's too large and I'm running cvise on it.
Comment 21 Xi Ruoyao 2023-06-28 21:39:57 UTC
Created attachment 55421 [details]
test case broken by draft patch (at -O2 -mavx2 -mtune=haswell)
Comment 22 Xi Ruoyao 2023-06-28 21:40:23 UTC
(In reply to H.J. Lu from comment #19)

> Do you have a testcase?

Attached.
Comment 23 H.J. Lu 2023-06-29 02:43:06 UTC
Created attachment 55424 [details]
An updated patch
Comment 24 Xi Ruoyao 2023-06-29 08:48:09 UTC
(In reply to H.J. Lu from comment #23)
> Created attachment 55424 [details]
> An updated patch

Unfortunately Spidermonkey 115 still crashes even with the patch (and -O3 -march=tigerlike -mtune=tigerlake -fno-exceptions).  The problem seems an unaligned stack slot is assigned for an object of a 512-bit aligned class.  Then the pointer to the object is passed to another function which stores into the object with 256-bit vmovdqu but the stack slot is only aligned to 128-bit in fact.  I'll try to reduce.
Comment 25 Xi Ruoyao 2023-06-29 09:17:45 UTC
(In reply to Xi Ruoyao from comment #24)
> (In reply to H.J. Lu from comment #23)
> > Created attachment 55424 [details]
> > An updated patch
> 
> Unfortunately Spidermonkey 115 still crashes even with the patch (and -O3
> -march=tigerlike -mtune=tigerlake -fno-exceptions).  The problem seems an
> unaligned stack slot is assigned for an object of a 512-bit aligned class. 
> Then the pointer to the object is passed to another function which stores
> into the object with 256-bit vmovdqu but the stack slot is only aligned to
> 128-bit in fact.  I'll try to reduce.

Nope, it seems a bug in mozilla code.
Comment 26 Martin Jambor 2024-01-10 17:33:37 UTC
Seems like there is nothing to bisect any more, please re-add the keyword is I am wrong.
Comment 27 Richard Biener 2024-06-20 09:12:29 UTC
GCC 12.4 is being released, retargeting bugs to GCC 12.5.
Comment 28 Sam James 2024-11-03 06:12:36 UTC
trunk works for me, but 14 doesn't.

(Needed -fno-stack-protector -fno-stack-clash-protection to override some defaults.)

I don't think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780#c23 was committed though.
Comment 29 Sam James 2025-02-06 18:24:48 UTC
```
char a;
static int b, c, f;
char *d = &a;
static char *e = &a;
void g(int h, int i) {
  int j = 1;
  for (; c != -3; c = c - 1) {
    int k[10];
    f = 0;
    for (; f < 10; f++)
      k[f] = 0;
    *d = k[1];
    if (i < *d) {
      *e = h;
      for (; j < 9; j++) {
        b = 1;
        for (; b < 7; b++)
          ;
      }
    }
  }
}
int main() { g(1, 1); }
```

```
$ gcc-14 p.c -o p -O2 -march=znver1 -fno-stack-protector -fno-stack-clash-protection && ./p
Segmentation fault (core dumped)
```
Comment 30 Sam James 2025-02-06 18:26:47 UTC
Needs bisection to see why trunk works.

Note that the original at least worked on trunk back in November so it may just be latent rather than fixed by some of the recent (kind of similar looking) changes for other bugs.
Comment 31 Jakub Jelinek 2025-02-06 19:12:15 UTC
#c29 works since r15-571-g1e0ae1f52741f7e0133661659ed2d210f939a398
Comment 32 Drea Pinski 2025-02-06 19:28:39 UTC
(In reply to Jakub Jelinek from comment #31)
> #c29 works since r15-571-g1e0ae1f52741f7e0133661659ed2d210f939a398

That means it most likely went latent.
Comment 33 Uroš Bizjak 2025-02-06 19:58:50 UTC
FTR, ix86_find_max_used_stack_alignment increases alignment only when stack pointer or frame pointer are explicitly mentioned in :

		/* Find the maximum stack alignment.  */
		subrtx_iterator::array_type array;
		FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
		  if (MEM_P (*iter)
		      && (reg_mentioned_p (stack_pointer_rtx,
					   *iter)
			  || reg_mentioned_p (frame_pointer_rtx,
					      *iter)))
		    {
		      unsigned int alignment = MEM_ALIGN (*iter);
		      if (alignment > stack_alignment)
			stack_alignment = alignment;


This RTX does *not* trigger stack alignment increase in the above code:

(insn 94 20 95 5 (set (mem/c:V16QI (reg/f:DI 1 dx [110]) [0 MEM <char[1:40]> [(void *)&k]+0 S16 A128])
        (reg:V16QI 21 xmm1 [orig:122 MEM <char[1:40]> [(void *)&k] ] [122])) "pr109780.c":11:12 2015 {movv16qi_internal}
     (nil))
Comment 34 Uroš Bizjak 2025-02-06 20:14:06 UTC
The problematic code is expanded from:

;; Generating RTL for gimple basic block 5

;; __builtin_memset (&k, 0, 40);

(insn 21 20 22 (parallel [
            (set (reg:DI 107)
                (plus:DI (reg/f:DI 93 virtual-stack-vars)
                    (const_int -48 [0xffffffffffffffd0])))
            (clobber (reg:CC 17 flags))
        ]) "pr109780.c":11:12 -1
     (nil))

(insn 22 21 23 (set (reg:V32QI 108)
        (const_vector:V32QI [
                (const_int 0 [0]) repeated x32
            ])) "pr109780.c":11:12 -1
     (nil))

(insn 23 22 24 (set (mem/c:V16QI (reg:DI 107) [0 MEM <char[1:40]> [(void *)&k]+0 S16 A128])
        (vec_select:V16QI (reg:V32QI 108)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 4 [0x4])
                    (const_int 5 [0x5])
                    (const_int 6 [0x6])
                    (const_int 7 [0x7])
                    (const_int 8 [0x8])
                    (const_int 9 [0x9])
                    (const_int 10 [0xa])
                    (const_int 11 [0xb])
                    (const_int 12 [0xc])
                    (const_int 13 [0xd])
                    (const_int 14 [0xe])
                    (const_int 15 [0xf])
                ]))) "pr109780.c":11:12 -1
     (nil))

(insn 24 23 25 (set (mem/c:V16QI (plus:DI (reg:DI 107)
                (const_int 16 [0x10])) [0 MEM <char[1:40]> [(void *)&k]+16 S16 A128])
        (vec_select:V16QI (reg:V32QI 108)
            (parallel [
                    (const_int 16 [0x10])
                    (const_int 17 [0x11])
                    (const_int 18 [0x12])
                    (const_int 19 [0x13])
                    (const_int 20 [0x14])
                    (const_int 21 [0x15])
                    (const_int 22 [0x16])
                    (const_int 23 [0x17])
                    (const_int 24 [0x18])
                    (const_int 25 [0x19])
                    (const_int 26 [0x1a])
                    (const_int 27 [0x1b])
                    (const_int 28 [0x1c])
                    (const_int 29 [0x1d])
                    (const_int 30 [0x1e])
                    (const_int 31 [0x1f])
                ]))) "pr109780.c":11:12 -1
     (nil))

(insn 25 24 0 (set (mem/c:DI (plus:DI (reg:DI 107)
                (const_int 32 [0x20])) [0 MEM <char[1:40]> [(void *)&k]+32 S8 A128])
        (subreg:DI (reg:V32QI 108) 0)) "pr109780.c":11:12 -1
     (nil))
Comment 35 Uroš Bizjak 2025-02-06 20:22:30 UTC
(In reply to Uroš Bizjak from comment #34)
> The problematic code is expanded from:
> 
> ;; Generating RTL for gimple basic block 5
> 
> ;; __builtin_memset (&k, 0, 40);
> 
> (insn 21 20 22 (parallel [
>             (set (reg:DI 107)
>                 (plus:DI (reg/f:DI 93 virtual-stack-vars)
>                     (const_int -48 [0xffffffffffffffd0])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr109780.c":11:12 -1
>      (nil))

It is not a good idea to CSE address that refers to virtual stack vars to a temporary. This defeats stack/frame pointer detection, mentioned in Comment #33, and consequently fails to increase stack alignment.
Comment 36 Uroš Bizjak 2025-02-06 21:29:44 UTC
(In reply to Uroš Bizjak from comment #35)

> It is not a good idea to CSE address that refers to virtual stack vars to a
> temporary. This defeats stack/frame pointer detection, mentioned in Comment
> #33, and consequently fails to increase stack alignment.

This is expanded via builtins.cc/expand_builtin_memset_args.
Comment 37 Uroš Bizjak 2025-02-11 11:01:13 UTC
(In reply to H.J. Lu from comment #23)
> Created attachment 55424 [details]
> An updated patch

Is this patch similar to the one in PR109093#c17 ? As argued in PR109093#c35, it looks that the current detection of instructions that need increase in stack alignment is insufficient.
Comment 38 Uroš Bizjak 2025-02-11 11:11:08 UTC
(In reply to Sam James from comment #29)
> $ gcc-14 p.c -o p -O2 -march=znver1 -fno-stack-protector
> -fno-stack-clash-protection && ./p
> Segmentation fault (core dumped)

Adding -mpreferred-stack-boundary=3 (as expected) "fixes" the testcase. The difference of "g" is in the realignment AND insn:

        vmovdqa %xmm1, %xmm0
        .cfi_offset 3, -24
        movl    $1, %ebx
-       leaq    -48(%rsp), %rdx
+       andq    $-16, %rsp
        movl    $9, %r11d
+       leaq    -48(%rsp), %rdx
        movq    d(%rip), %rcx
        jmp     .L7
Comment 39 H.J. Lu 2025-02-11 11:27:45 UTC
(In reply to Uroš Bizjak from comment #37)
> (In reply to H.J. Lu from comment #23)
> > Created attachment 55424 [details]
> > An updated patch
> 
> Is this patch similar to the one in PR109093#c17 ? As argued in
> PR109093#c35, it looks that the current detection of instructions that need
> increase in stack alignment is insufficient.

This one is a newer patch.  I will take a look.
Comment 40 GCC Commits 2025-02-16 00:19:47 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:11902be7a57c0ccf03786aa0255fffaf0f54dbf9

commit r15-7573-g11902be7a57c0ccf03786aa0255fffaf0f54dbf9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 14 11:41:51 2023 -0700

    x86: Properly find the maximum stack slot alignment
    
    Don't assume that stack slots can only be accessed by stack or frame
    registers.  We first find all registers defined by stack or frame
    registers.  Then check memory accesses by such registers, including
    stack and frame registers.
    
    gcc/
    
            PR target/109780
            PR target/109093
            * config/i386/i386.cc (ix86_update_stack_alignment): New.
            (ix86_find_all_reg_use_1): Likewise.
            (ix86_find_all_reg_use): Likewise.
            (ix86_find_max_used_stack_alignment): Also check memory accesses
            from registers defined by stack or frame registers.
    
    gcc/testsuite/
    
            PR target/109780
            PR target/109093
            * g++.target/i386/pr109780-1.C: New test.
            * gcc.target/i386/pr109093-1.c: Likewise.
            * gcc.target/i386/pr109780-1.c: Likewise.
            * gcc.target/i386/pr109780-2.c: Likewise.
            * gcc.target/i386/pr109780-3.c: Likewise.
    
    Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Comment 41 Jeffrey A. Law 2025-02-16 04:46:05 UTC
So is this fixed on the trunk now?  If so, let's remove the "15" from the regression marker set.
Comment 42 Sam James 2025-02-16 06:25:28 UTC
Updated.
Comment 43 Sam James 2025-02-20 09:27:03 UTC
Patch was reverted in r15-7634-g0312d11be3f666 and r15-7635-g6921c93d205203 to try again in GCC 15.
Comment 44 GCC Commits 2025-04-28 13:29:37 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:b9ea3b2ef98048f93b02fcd6ff51777bce1676c2

commit r16-194-gb9ea3b2ef98048f93b02fcd6ff51777bce1676c2
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 14 11:41:51 2023 -0700

    x86: Properly find the maximum stack slot alignment
    
    Don't assume that stack slots can only be accessed by stack or frame
    registers.  We first find all registers defined by stack or frame
    registers.  Then check memory accesses by such registers, including
    stack and frame registers.
    
    gcc/
    
            PR target/109780
            PR target/109093
            * config/i386/i386.cc (stack_access_data): New.
            (ix86_update_stack_alignment): Likewise.
            (ix86_find_all_reg_use_1): Likewise.
            (ix86_find_all_reg_use): Likewise.
            (ix86_find_max_used_stack_alignment): Also check memory accesses
            from registers defined by stack or frame registers.
    
    gcc/testsuite/
    
            PR target/109780
            PR target/109093
            * g++.target/i386/pr109780-1.C: New test.
            * gcc.target/i386/pr109093-1.c: Likewise.
            * gcc.target/i386/pr109780-1.c: Likewise.
            * gcc.target/i386/pr109780-2.c: Likewise.
            * gcc.target/i386/pr109780-3.c: Likewise.
    
    Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
    Co-Authored-By: Uros Bizjak <ubizjak@gmail.com>
Comment 45 Richard Biener 2025-07-11 09:15:31 UTC
GCC 12 branch is being closed.
Comment 46 Richard Biener 2026-04-13 13:36:44 UTC
On the 15 branch gcc.target/i386/pr109093-1.c still fails, at ix86_compute_frame_layout time preferred_alignment is 8 and
stack_alignment_needed is 16, the frame during LRA is computed to

{nsseregs = 0, nregs = 1, va_arg_size = 0, red_zone_size = 80, 
  outgoing_arguments_size = 0, frame_pointer_offset = 32, hard_frame_pointer_offset = 16, 
  stack_pointer_offset = 32, hfp_save_offset = 16, reg_save_offset = 24, 
  stack_realign_allocate = 0, stack_realign_offset = 32, sse_reg_save_offset = 32, 
  save_regs_using_mov = false, expensive_p = false, expensive_count = 0}

but at ix86_compute_frame_layout time, under same constraints,

{nsseregs = 0, nregs = 1, va_arg_size = 0, red_zone_size = 88, 
  outgoing_arguments_size = 0, frame_pointer_offset = 32, hard_frame_pointer_offset = 16, 
  stack_pointer_offset = 24, hfp_save_offset = 16, reg_save_offset = 24, 
  stack_realign_allocate = 0, stack_realign_offset = 24, sse_reg_save_offset = 24, 
  save_regs_using_mov = false, expensive_p = false, expensive_count = 0}

because somehow cfun->machine->max_used_stack_alignment is now 8.

It seems the 15 branch has ix86_find_max_used_stack_alignment as well.
As I thought this was introduced with r8-2912-gf174328efedff7 with
the attempt to avoid unnecessary stack-realignment given at RTL expansion
time we conservatively estimate that requirement.  The rev in question
just extended some code trying to do the same (revise earlier computed
stack alignment req