113359 – [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

Bug 113359 - [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

Summary: [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	ipa (show other bugs)
Version:	13.2.1

Importance:	P2 normal
Target Milestone:	13.3
Assignee:	Martin Jambor

URL:
Keywords:	lto, wrong-code

Duplicates (1):	114263 (view as bug list)
Depends on:
Blocks:

Reported:	2024-01-12 18:31 UTC by Jakub Jelinek
Modified:	2024-11-05 05:43 UTC (History)
CC List:	11 users (show)

See Also:	https://bugzilla.redhat.com/show_bug.cgi?id=2241339 114263 107467 114662
Host:
Target:	aarch64 x86_64
Build:
Known to work:	14.0
Known to fail:
Last reconfirmed:	2024-03-09 00:00:00

Attachments
Timer.ii.xz (871.54 KB, application/x-xz) 2024-01-12 18:32 UTC, Jakub Jelinek	Details
SloppyCRCMap.ii.xz (664.60 KB, application/x-xz) 2024-01-12 18:32 UTC, Jakub Jelinek	Details
CrtStuff.i.xz (151 bytes, application/x-xz) 2024-01-12 18:33 UTC, Jakub Jelinek	Details
Timer.ii.xz from GCC 14 (878.88 KB, application/x-xz) 2024-01-12 19:54 UTC, Jakub Jelinek	Details
SloppyCRCMap.ii.xz (671.61 KB, application/x-xz) 2024-01-12 19:56 UTC, Jakub Jelinek	Details
Simple testcase (needs disabling early - and only early - SRA) (436 bytes, text/plain) 2024-02-19 15:31 UTC, Martin Jambor	Details
Potential fix (7.20 KB, patch) 2024-03-28 08:59 UTC, Martin Jambor	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jakub Jelinek 2024-01-12 18:31:06 UTC

As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2241339
when ceph is compiled with LTO on aarch64
_M_emplace_equal<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>&>
is changed by SRA to only store 64 + 32 bits into the std::pair rather than 64 + 64 bits.

Reproducer:
./xg++ -B ./ -O2 -flto=auto -ffat-lto-objects -fexceptions -g -Wall -Wno-complain-wrong-lang -Werror=format-security -fstack-protector-strong -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -std=c++20 -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fstack-protector-strong -fdiagnostics-color=auto -c Timer.ii
./xg++ -B ./ -O2 -flto=auto -ffat-lto-objects -fexceptions -g -Wall -Wno-complain-wrong-lang -Werror=format-security -fstack-protector-strong -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -std=c++20 -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fstack-protector-strong -fdiagnostics-color=auto -c SloppyCRCMap.ii
./xg++ -B ./ -O2 -c CrtStuff.i
./xg++ -B ./ -flto=auto -shared -o out.so SloppyCRCMap.o Timer.o CrtStuff.o -nostdlib
grep_cleanup() {
    c++filt | grep -A16 '_M_emplace_equal<std::pair<std::chrono.*>:' | sed 's/[^:]*: *//'
}
diff -u <(aarch64-linux-gnu-objdump -d Timer.o | grep_cleanup) <(aarch64-linux-gnu-objdump -d out.so | grep_cleanup)

which prints
--- /dev/fd/63	2024-01-12 19:24:37.317433462 +0100
+++ /dev/fd/62	2024-01-12 19:24:37.318433448 +0100
@@ -7,11 +7,11 @@
 	d2800600 	mov	x0, #0x30                  	// #48
 	f90013f5 	str	x21, [sp, #32]
 	aa0103f5 	mov	x21, x1
-	94000000 	bl	0 <operator new(unsigned long)>
+	97fff0bc 	bl	6ee0 <operator new(unsigned long)@plt>
 	aa0003f4 	mov	x20, x0
 	f9400a62 	ldr	x2, [x19, #16]
 	91002263 	add	x3, x19, #0x8
 	f94002a7 	ldr	x7, [x21]
 	f9001007 	str	x7, [x0, #32]
-	f94006a0 	ldr	x0, [x21, #8]
-	f9001680 	str	x0, [x20, #40]
+	b9400aa1 	ldr	w1, [x21, #8]
+	b9002801 	str	w1, [x0, #40]

i.e. the -ffat-lto-objects non-LTO compilation stores 64-bit pointers at the new returned pointer + 32 and + 40, while
LTO stores 64-bit pointer only to the former and 32-bit into the latter.

In *.cplxlower1 I still see
  MEM[(struct pair *)_37 + 32B] = ISRA.253;
where
  struct pair ISRA.253;
but sra has:
Created a replacement for ISRA.253 offset: 0, size: 64: SR.256D.22298
Created a replacement for ISRA.253 offset: 64, size: 32: SR.257D.22299
From what I can see, that should be
std::multimap<clock_t::time_point, Context*>::value_type, i.e.
std::pair<const clock_t::time_point, Context*>

Comment 1 Jakub Jelinek 2024-01-12 18:32:37 UTC

Created attachment 57056 [details]
Timer.ii.xz

Comment 2 Jakub Jelinek 2024-01-12 18:32:59 UTC

Created attachment 57057 [details]
SloppyCRCMap.ii.xz

Comment 3 Jakub Jelinek 2024-01-12 18:33:21 UTC

Created attachment 57058 [details]
CrtStuff.i.xz

Comment 4 Jakub Jelinek 2024-01-12 18:34:22 UTC

Will try to reproduce with trunk next (unfortunately due to the builtin trait changes the GCC 13 preprocessed sources can't be compiled with GCC 14).

Comment 5 Jakub Jelinek 2024-01-12 19:54:59 UTC

Created attachment 57060 [details]
Timer.ii.xz from GCC 14

Comment 6 Jakub Jelinek 2024-01-12 19:56:52 UTC

Created attachment 57061 [details]
SloppyCRCMap.ii.xz

When using current GCC trunk instead of current GCC 13 branch (CrtStuff.i.xz is the same between both), the problematic optimization doesn't happen, both before SRA and after it the code has
MEM[(struct pair *)_5 + 32B] = MEM[(const struct pair &)__args#0_4(D)];

Comment 7 Andrew Pinski 2024-01-12 20:29:01 UTC

I am going to double check something but it might be related to PR 112616 .

Comment 8 Richard Biener 2024-01-15 08:24:07 UTC

All these might also point to IPA mod-ref issues.

Comment 9 Martin Jambor 2024-02-05 21:48:48 UTC

SRA creates the replacements (in GCC 13) during total scalarization,
i.e. the bit that is not driven by pre-existing accesses to
aggregates, but because it sees an aggregate that is small and regular
and so it is split according to its type in the hope it will go away.

Unfortunately in the LTO and non-LTO case, they see a different type.
I have added a dumping of types and fields of totally scalarized
records and got the following.

In the non-LTO case, the type of the aggregate is:
  <record_type 0xffff553cabd0 pair readonly sizes-gimplified cxx-odr-p type_1 type_5 type_6 TI
    size <integer_cst 0xffff6fe96090 type <integer_type 0xffff6fe9c0a8 bitsizetype> constant 128>
    unit-size <integer_cst 0xffff6fe960a8 type <integer_type 0xffff6fe9c000 sizetype> constant 16>
    align:64 warn_if_not_align:0 symtab:1430035184 alias-set -1 canonical-type 0xffff553cabd0
    ...

and specifically its third field is a pointer:
  <field_decl 0xffff55161ab0 second
    type <pointer_type 0xffff4cf303f0
        type <record_type 0xffff4cf30348 Context cxx-odr-p type_1 type_4 type_5 type_6 VOID
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xffff562720a8 context <translation_unit_decl 0xffff6fe9a000 /home/jakub/rpmbuild/BUILD/ceph-18.2.1/src/common/Timer.cc>
            pointer_to_this <pointer_type 0xffff4cf303f0>>
        unsigned DI
        size <integer_cst 0xffff6fe96048 constant 64>
        unit-size <integer_cst 0xffff6fe96060 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xffff562729d8 reference_to_this <reference_type 0xffff4cf6c738>>
    used unsigned nonlocal decl_3 DI /usr/include/c++/13/bits/stl_pair.h:194:11
    size <integer_cst 0xffff6fe96048 type <integer_type 0xffff6fe9c0a8 bitsizetype> constant 64>
    unit-size <integer_cst 0xffff6fe96060 type <integer_type 0xffff6fe9c000 sizetype> constant 8>
    align:64 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
    offset <integer_cst 0xffff6fe96078 type <integer_type 0xffff6fe9c000 sizetype> constant 0>
    bit-offset <integer_cst 0xffff6fe96048 type <integer_type 0xffff6fe9c0a8 bitsizetype> constant 64> context <record_type 0xffff553ba540 pair>>


However, in the LTO case the type of the aggregate is:
  <record_type 0xffff61cc1498 pair cxx-odr-p TI
    size <integer_cst 0xffff623fdd08 type <integer_type 0xffff624100a8 bitsizetype> constant 128>
    unit-size <integer_cst 0xffff623fdd20 type <integer_type 0xffff62410000 sizetype> constant 16>
    align:64 warn_if_not_align:0 symtab:0 alias-set 98 canonical-type 0xffff61cc1498
    ...

which however has an unsigned int as its third field:
 <field_decl 0xffff61cc05f0 second
    type <integer_type 0xffff62410690 unsigned int public unsigned SI
        size <integer_cst 0xffff623fdf00 constant 32>
        unit-size <integer_cst 0xffff623fdf18 constant 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xffff62410690 precision:32 min <integer_cst 0xffff623fdf30 0> max <integer_cst 0xffff623fdee8 4294967295>
        pointer_to_this <pointer_type 0xffff61cf50a8> reference_to_this <reference_type 0xffff618fd5e8>>
    unsigned nonlocal SI /usr/include/c++/13/bits/stl_pair.h:194:11
    size <integer_cst 0xffff623fdf00 type <integer_type 0xffff624100a8 bitsizetype> constant 32>
    unit-size <integer_cst 0xffff623fdf18 type <integer_type 0xffff62410000 sizetype> constant 4>
    align:32 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
    offset <integer_cst 0xffff623fdcf0 type <integer_type 0xffff62410000 sizetype> constant 0>
    bit-offset <integer_cst 0xffff623fdcc0 type <integer_type 0xffff624100a8 bitsizetype> constant 64> context <record_type 0xffff61cc1498 pair>>

An so only an unsigned int replacement is created.

The name of the aggregate indicates it has been created by IPA-SRA and
so that is where I am looking right now, but IPA-SRA simply takes (and
streams) the type of the access in the original function body for
these.  Can't this perhaps be some type-merging issue?

Comment 10 Richard Biener 2024-02-06 08:29:53 UTC

I see the 'pair' type is marked TYPE_CXX_ODR_P, I'd say you should see a
ODR type violation diagnostic, and if you don't, this means we force different
alias sets for both?  Not sure - Honza added this stuff.

It only affects TYPE_CANONICAL though, regular type merging shouldn't merge
them but it's likely that you get to see another type because of COMDATs
and symbol merging chosing a different prevailing function which has that
other type?

Btw, can you dump the mangled name of the type?  It should be type_with_linkage_p () I think, of course 'pair' itself is a template
so only a specific instantiation should be subject to ODR.  (of course
there might be ODR functions that use different instantiated pair in
the signature ..)

Comment 11 Jan Hubicka 2024-02-06 10:00:48 UTC

If there are two ODR types with same ODR name one with integer and other with pointer types third field, then indeed we should get ODR warning and give up on handling them as ODR types for type merging.

So dumping their assembler names would be useful starting point.

Of course if you have two ODR types of different names but you mix them up in COMDAT function of same name, then the warning will not trigger, so this might be some missing type compatibility check in ipa-sra or ipa-prop summary, too.

Comment 12 Jakub Jelinek 2024-02-06 12:39:46 UTC

Just going from the demangled name of
std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>
it would surprise me if it was ODR violation in the testcase, because class Context
is only defined in Timer.ii, not in the other preprocessed source, ceph::mono_clock is defined in both but looks the same (and it is empty class anyway), and the pair uses Context* as second type anyway, so it is unclear how it could become something smaller than pointer.
But, admittedly I haven't looked up at this under the debugger and not even sure where to look at that.

Comment 13 rguenther@suse.de 2024-02-06 12:47:42 UTC

On Tue, 6 Feb 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359
> 
> --- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Just going from the demangled name of
> std::pair<std::chrono::time_point<ceph::mono_clock,
> std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const,
> Context*>
> it would surprise me if it was ODR violation in the testcase, because class
> Context
> is only defined in Timer.ii, not in the other preprocessed source,
> ceph::mono_clock is defined in both but looks the same (and it is empty class
> anyway), and the pair uses Context* as second type anyway, so it is unclear how
> it could become something smaller than pointer.
> But, admittedly I haven't looked up at this under the debugger and not even
> sure where to look at that.

Might be also an interaction with IPA ICF in case there's a pointer to
the pair involved?

Comment 14 Martin Jambor 2024-02-07 17:01:48 UTC

(In reply to rguenther@suse.de from comment #13)
> Might be also an interaction with IPA ICF in case there's a pointer to
> the pair involved?

Yes, this is exactly what seems to be happening.  The problem goes
away with -fno-icf.

(Possibly because the testcase uses -fno-strict-aliasing,) IPA-ICF
merges two functions which copy a structure and that access type it
what IPA-SRA saves, but loads only the one of the merged functions.
SRA then uses the (wrong) type to split aggregate copies into copies
by individual fields.

I have talked to Honza about this.  It seems that IPA-ICF needs to be
careful about aggreage with holes in different places.  The ideal next
step would be to create a testcase not dependent on IPA-SRA.

Comment 15 Martin Jambor 2024-02-19 15:31:41 UTC

Created attachment 57462 [details]
Simple testcase (needs disabling early - and only early - SRA)

This is a simpler testcase which exhibits the problem on x86_64-linux
and current master.  Steps to reproduce:

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp  --disable-tree-esra -flto pr113359.c -c -o 1.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp  --disable-tree-esra -flto -DFILE2 pr113359.c -c -o 2.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -flto 1.o 2.o -o test.exe                            

$ ./test.exe 
Aborted (core dumped)


If you add -fno-ipa-icf to the "compilation" commands, the test will
pass.

Late (post ICF) intra-procedural SRA is necessary to exhibit the
problem.  On the other hand, early SRA must be suppressed or it will
scalarize the aggregate assignment too early and the results will look
different to IPA-ICF.  Instead of using --disable-tree-esra we could
pass the address of tmp in both geta() and getb() to an empty function
coming from a third compilation unit.

Disabling strict aliasing is also necessary to show the problem, with
strict aliasing IPA-ICF takes the alias class of types into acount
when hashing and considers geta() and getb() different from the start.

Comment 16 Andrew Pinski 2024-03-09 21:44:37 UTC

*** Bug 114263 has been marked as a duplicate of this bug. ***

Comment 17 Sam James 2024-03-09 21:45:31 UTC

There's another small testcase in PR114263.

Comment 18 Andrew Pinski 2024-03-09 21:45:50 UTC

Confirmed.

Comment 19 avieira 2024-03-15 15:13:46 UTC

Should we update target and summary to also include x86_64?

Comment 20 Jeffrey A. Law 2024-03-15 15:25:45 UTC

Go right ahead.  I'm mostly trying to get things in the right broad buckets.  So if you've got additional information, please add it.

Comment 21 Richard Biener 2024-03-26 10:42:53 UTC

Martin, please work on this together with Honza (might be an actual duplicate).

Comment 22 Martin Jambor 2024-03-28 08:59:20 UTC

Created attachment 57828 [details]
Potential fix

I'm testing this patch

Comment 23 Jan Hubicka 2024-04-04 11:40:53 UTC

The patch looks reasonable.  We probably could hash the padding vectors at summary generation time to reduce WPA overhead, but that can be done incrementally next stage1.
I however wonder if we really guarantee to copy the paddings everywhere else then the total scalarization part?
(i.e. in all paths through the RTL expansion)

Comment 24 Martin Jambor 2024-04-04 21:22:49 UTC

(In reply to Jan Hubicka from comment #23)
> I however wonder if we really guarantee to copy the paddings everywhere else
> then the total scalarization part?
> (i.e. in all paths through the RTL expansion)

I wanted that we sometimes don't do that in PR 80689 and the idea was
refused.  And as far as I can recall the code I don't think we do.

Anyway, I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6jzlc25db.fsf@virgil.suse.cz/T/#u

Comment 25 GCC Commits 2024-04-08 16:57:02 UTC

The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:1e3312a25a7b34d6e3f549273e1674c7114e4408

commit r14-9841-g1e3312a25a7b34d6e3f549273e1674c7114e4408
Author: Martin Jambor <mjambor@suse.cz>
Date:   Mon Apr 8 18:53:23 2024 +0200

    ICF&SRA: Make ICF and SRA agree on padding
    
    PR 113359 shows that (at least with -fno-strict-aliasing) ICF can
    unify two functions which copy an aggregate type of the same size but
    then SRA, through its total scalarization, can copy the aggregate by
    pieces, skipping paddding, but the padding was not the same in the two
    original functions that ICF unified.
    
    This patch enhances SRA with the ability to collect padding
    information which then can be compared from within ICF.  Unfortunately
    SRA uses OPTION_SET_P when determining its limits, so ICF needs to
    switch cfuns at least once to figure it out too.
    
    gcc/ChangeLog:
    
    2024-03-27  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113359
            * ipa-icf-gimple.h (func_checker): New members
            safe_for_total_scalarization_p, m_total_scalarization_limit_known_p
            and m_total_scalarization_limit.
            (func_checker::func_checker): Initialize new member variables.
            * ipa-icf-gimple.cc: Include tree-sra.h.
            (func_checker::func_checker): Initialize new member variables.
            (func_checker::safe_for_total_scalarization_p): New function.
            (func_checker::compare_operand): Use the new function.
            * tree-sra.h (sra_get_max_scalarization_size): Declare.
            (sra_total_scalarization_would_copy_same_data_p): Likewise.
            * tree-sra.cc (prepare_iteration_over_array_elts): New function.
            (class sra_padding_collecting): New.
            (sra_padding_collecting::record_padding): Likewise.
            (scalarizable_type_p): Rename to totally_scalarizable_type_p.  Add
            ability to record padding when requested.
            (totally_scalarize_subtree): Split out gathering information necessary
            to iterate over array elements to prepare_iteration_over_array_elts.
            Fix errornous early exit.
            (analyze_all_variable_accesses): Adjust the call to
            totally_scalarizable_type_p.  Move determining of total scalariation
            size limit...
            (sra_get_max_scalarization_size): ...here.
            (check_ts_and_push_padding_to_vec): New function.
            (sra_total_scalarization_would_copy_same_data_p): Likewise.
    
    gcc/testsuite/ChangeLog:
    
    2024-03-27  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113359
            * gcc.dg/lto/pr113359-1_0.c: New.
            * gcc.dg/lto/pr113359-1_1.c: Likewise.
            * gcc.dg/lto/pr113359-2_0.c: Likewise.
            * gcc.dg/lto/pr113359-2_1.c: Likewise.
            * gcc.dg/lto/pr113359-3_0.c: Likewise.
            * gcc.dg/lto/pr113359-3_1.c: Likewise.
            * gcc.dg/lto/pr113359-4_0.c: Likewise.
            * gcc.dg/lto/pr113359-4_1.c: Likewise.
            * gcc.dg/lto/pr113359-5_0.c: Likewise.
            * gcc.dg/lto/pr113359-5_1.c: Likewise.

Comment 26 Martin Jambor 2024-04-08 16:59:57 UTC

This should be fixed on master, I'll backport the fix in a few weeks to at least gcc-13 where it was reported.

Comment 27 Jakub Jelinek 2024-04-08 17:01:47 UTC

Comment 28 GCC Commits 2024-05-14 11:22:46 UTC

The releases/gcc-13 branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:10bf53a80eefa46500bffb442719777e2640e7d7

commit r13-8773-g10bf53a80eefa46500bffb442719777e2640e7d7
Author: Martin Jambor <mjambor@suse.cz>
Date:   Mon Apr 8 18:53:23 2024 +0200

    ICF&SRA: Make ICF and SRA agree on padding
    
    PR 113359 shows that (at least with -fno-strict-aliasing) ICF can
    unify two functions which copy an aggregate type of the same size but
    then SRA, through its total scalarization, can copy the aggregate by
    pieces, skipping paddding, but the padding was not the same in the two
    original functions that ICF unified.
    
    This patch enhances SRA with the ability to collect padding
    information which then can be compared from within ICF.  Unfortunately
    SRA uses OPTION_SET_P when determining its limits, so ICF needs to
    switch cfuns at least once to figure it out too.
    
    gcc/ChangeLog:
    
    2024-03-27  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113359
            * ipa-icf-gimple.h (func_checker): New members
            safe_for_total_scalarization_p, m_total_scalarization_limit_known_p
            and m_total_scalarization_limit.
            (func_checker::func_checker): Initialize new member variables.
            * ipa-icf-gimple.cc: Include tree-sra.h.
            (func_checker::func_checker): Initialize new member variables.
            (func_checker::safe_for_total_scalarization_p): New function.
            (func_checker::compare_operand): Use the new function.
            * tree-sra.h (sra_get_max_scalarization_size): Declare.
            (sra_total_scalarization_would_copy_same_data_p): Likewise.
            * tree-sra.cc (prepare_iteration_over_array_elts): New function.
            (class sra_padding_collecting): New.
            (sra_padding_collecting::record_padding): Likewise.
            (scalarizable_type_p): Rename to totally_scalarizable_type_p.  Add
            ability to record padding when requested.
            (totally_scalarize_subtree): Split out gathering information necessary
            to iterate over array elements to prepare_iteration_over_array_elts.
            Fix errornous early exit.
            (analyze_all_variable_accesses): Adjust the call to
            totally_scalarizable_type_p.  Move determining of total scalariation
            size limit...
            (sra_get_max_scalarization_size): ...here.
            (check_ts_and_push_padding_to_vec): New function.
            (sra_total_scalarization_would_copy_same_data_p): Likewise.
    
    gcc/testsuite/ChangeLog:
    
    2024-03-27  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113359
            * gcc.dg/lto/pr113359-1_0.c: New.
            * gcc.dg/lto/pr113359-1_1.c: Likewise.
            * gcc.dg/lto/pr113359-2_0.c: Likewise.
            * gcc.dg/lto/pr113359-2_1.c: Likewise.
            * gcc.dg/lto/pr113359-3_0.c: Likewise.
            * gcc.dg/lto/pr113359-3_1.c: Likewise.
            * gcc.dg/lto/pr113359-4_0.c: Likewise.
            * gcc.dg/lto/pr113359-4_1.c: Likewise.
            * gcc.dg/lto/pr113359-5_0.c: Likewise.
            * gcc.dg/lto/pr113359-5_1.c: Likewise.
    
    (cherry picked from commit 1e3312a25a7b34d6e3f549273e1674c7114e4408)

Comment 29 Martin Jambor 2024-05-14 11:58:06 UTC

Fixed

Comment 30 Martin Jambor 2024-05-14 12:00:51 UTC

...so set to fixed as well.

Comment 31 Marc Poulhiès 2024-05-17 14:29:59 UTC

Hello Martin,

Any chance the fix that fixes the new test for 32bits can be also backported?

4923ed49b93352bcf9e43cafac38345e4a54c3f8
https://gcc.gnu.org/g:4923ed49b93352bcf9e43cafac38345e4a54c3f8

Not sure why it's not tagged so that it would appear here.

Comment 32 Martin Jambor 2024-05-20 15:10:12 UTC

(In reply to Marc Poulhiès from comment #31)
> Hello Martin,
> 
> Any chance the fix that fixes the new test for 32bits can be also backported?
> 
> 4923ed49b93352bcf9e43cafac38345e4a54c3f8
> https://gcc.gnu.org/g:4923ed49b93352bcf9e43cafac38345e4a54c3f8
> 
> Not sure why it's not tagged so that it would appear here.

My apologies for not including this commit, I completely forgot about it.  Unfortunately I'm afraid it will have to wait until after the 13.3 release, but I will backport it quickly afterwards.  Sorry again.

Comment 33 Marc Poulhiès 2024-05-20 15:13:49 UTC

No worries, we've applied it locally.
Thanks!