115576 – [14/15 regression] Worse code generated for simple struct conversion since r14-2386-gbdf2737cda53a8


Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	14.1.1

Importance:	P2 normal
Target Milestone:	14.2
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2024-06-21 07:44 UTC by John Zwinck
Modified:	2024-06-26 07:12 UTC
CC List:	5 users

See Also:
Host:
Target:	x86_64--
Build:
Known to work:	13.3.0
Known to fail:	14.1.0, 15.0
Last reconfirmed:	2024-06-21 00:00:00


Description John Zwinck 2024-06-21 07:44:20 UTC

Here is some simple code from a real application which converts one struct to a similar but larger one:

    #include <cstdint>

    struct S64
    {
        uint64_t a;
        int8_t b;
        int8_t c;
        uint16_t d;
    };

    struct S32
    {
        uint32_t a;
        int8_t b;
        int8_t c;
        uint16_t d;

        S64 To64() const;
    };

    S64 S32::To64() const
    {
        return S64{a, b, c, d};
    }

GCC 13 and earlier emitted good code for this (as does Clang):

    S32::To64() const:
        mov     eax, DWORD PTR [rdi]
        mov     edx, DWORD PTR [rdi+4]
        ret
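
The two loads suffice because, assuming the x86-64 System V layout, the conversion only has to zero-extend `a` and copy the 4 bytes holding `b`, `c` and `d` unchanged. A minimal sketch that checks this assumption follows. The structs are repeated without the member function, and the helper names (`convert_fields`, `convert_two_loads`, `same_fields`, `check_conversion`) are hypothetical and not part of the report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct S64 { uint64_t a; int8_t b; int8_t c; uint16_t d; };
struct S32 { uint32_t a; int8_t b; int8_t c; uint16_t d; };

// Layout assumptions (x86-64 System V); other ABIs may lay these out differently.
static_assert(sizeof(S32) == 8, "S32 is expected to occupy 8 bytes");
static_assert(sizeof(S64) == 16, "S64 is expected to occupy 16 bytes");
static_assert(offsetof(S32, b) == 4 && offsetof(S32, c) == 5 && offsetof(S32, d) == 6,
              "b, c, d are expected in bytes 4..7 of S32");
static_assert(offsetof(S64, b) == 8 && offsetof(S64, c) == 9 && offsetof(S64, d) == 10,
              "b, c, d are expected in bytes 8..11 of S64");

// Field-by-field conversion, as written in the report.
S64 convert_fields(const S32& s) { return S64{s.a, s.b, s.c, s.d}; }

// The same conversion expressed as two 32-bit copies, mirroring the GCC 13 output.
S64 convert_two_loads(const S32& s) {
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&s);
    uint32_t lo, hi;
    std::memcpy(&lo, src, 4);      // mov eax, DWORD PTR [rdi]
    std::memcpy(&hi, src + 4, 4);  // mov edx, DWORD PTR [rdi+4]
    S64 out{};
    out.a = lo;                    // zero-extended into the 64-bit field
    std::memcpy(reinterpret_cast<unsigned char*>(&out) + 8, &hi, 4);  // b, c, d as one block
    return out;
}

bool same_fields(const S64& x, const S64& y) {
    return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}

// Asserts that both conversions agree for one input.
void check_conversion(const S32& s) {
    assert(same_fields(convert_fields(s), convert_two_loads(s)));
}
```

Calling `check_conversion` on a few `S32` values should never trip the assertion on a target where the static assertions hold.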

GCC 14 is much worse:

    S32::To64() const:
        xor     edx, edx
        mov     esi, DWORD PTR [rdi+4]
        mov     eax, DWORD PTR [rdi]
        movabs  rdi, -4294967296
        mov     rcx, rdx
        and     rcx, rdi
        or      rcx, rsi
        mov     rdx, rcx
        ret

Demo: https://godbolt.org/z/YbenMeEPq
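
To see why the extra instructions are redundant, the GCC 14 computation of `rdx` can be transliterated into C++. This is only a sketch: the function names `gcc14_rdx`, `gcc13_rdx` and `check_same_rdx` are hypothetical, and each statement is a reading of the instruction named in its comment.

```cpp
#include <cassert>
#include <cstdint>

// The movabs immediate -4294967296 is the mask that keeps the upper 32 bits.
static_assert(static_cast<uint64_t>(-4294967296LL) == 0xFFFFFFFF00000000ull,
              "mask keeps bits 32..63");

// Value left in rdx by the GCC 14 sequence, given the 32 bits loaded from [rdi+4].
uint64_t gcc14_rdx(uint32_t bits) {
    uint64_t rdx = 0;                               // xor edx, edx
    uint64_t rsi = bits;                            // mov esi, DWORD PTR [rdi+4] (zero-extends)
    const uint64_t rdi = 0xFFFFFFFF00000000ull;     // movabs rdi, -4294967296
    uint64_t rcx = rdx;                             // mov rcx, rdx
    rcx &= rdi;                                     // and rcx, rdi (still 0)
    rcx |= rsi;                                     // or rcx, rsi
    rdx = rcx;                                      // mov rdx, rcx
    return rdx;
}

// Value left in rdx by the GCC 13 sequence.
uint64_t gcc13_rdx(uint32_t bits) {
    return bits;                                    // mov edx, DWORD PTR [rdi+4]
}

// Asserts that both sequences leave the same value in rdx.
void check_same_rdx(uint32_t bits) {
    assert(gcc14_rdx(bits) == gcc13_rdx(bits));
}
```

Because `rdx` is known to be zero before the `and`, the mask-and-or pair reduces to a plain zero-extending load, which is what GCC 13 emitted.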

Comment 1 Andrew Pinski 2024-06-21 07:49:05 UTC

Looks like the padding is being zeroed.

Are you sure that zeroing the padding here is what is expected of this code?

Comment 2 Richard Biener 2024-06-21 07:52:42 UTC

Confirmed.  Probably the TImode enhancements.

Comment 3 Andrew Pinski 2024-06-21 17:44:46 UTC

(In reply to Richard Biener from comment #2)
> Confirmed.  Probably the TImode enhancements.

Actually, you might be right, since the GIMPLE-level IR is the same between GCC 13 and 14, and aarch64 code generation didn't change.

Comment 4 user202729 2024-06-25 08:17:34 UTC

A binary search shows that the patch that causes the issue is https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623766.html.

Looking at the optimization passes, what happens is roughly the following:

Before the change, the expand pass generates code that uses (set (subreg ...)). The split pass then splits the register used in the subreg into two registers, and the combine pass does "constant propagation" and simplifies the code.

After the change, the expand pass generates code that uses (set reg (ior (and reg 0xffff0000) value)). The split pass therefore cannot split the register (it does not know how to deal with "and"), and the combine pass is similarly stuck.
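
The difference between the two forms can be modelled in C++. This is a toy model, not GCC's RTL: it assumes a 128-bit register updated in its low 64-bit half, uses the GCC/Clang `unsigned __int128` extension, and all names (`Halves`, `split`, `join`, `insert_low_subreg`, `insert_low_masked`, `check_equivalent`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using u128 = unsigned __int128;  // GCC/Clang extension, assumed available

// A wide register after splitting: two independent 64-bit halves.
struct Halves { uint64_t lo; uint64_t hi; };

Halves split(u128 r) {
    return Halves{static_cast<uint64_t>(r), static_cast<uint64_t>(r >> 64)};
}

u128 join(Halves h) {
    return (static_cast<u128>(h.hi) << 64) | h.lo;
}

// subreg-style update: only one half is written, so the halves stay independent
// and each can be tracked (and constant-propagated) on its own.
Halves insert_low_subreg(Halves reg, uint64_t value) {
    reg.lo = value;
    return reg;
}

// mask-style update: the whole wide register is read, masked and rewritten,
// so a pass must understand the and/ior pair to see that the halves are independent.
u128 insert_low_masked(u128 reg, uint64_t value) {
    const u128 keep_high = ~static_cast<u128>(0) << 64;
    return (reg & keep_high) | value;
}

// Asserts that both update styles produce the same wide value.
void check_equivalent(u128 reg, uint64_t value) {
    assert(join(insert_low_subreg(split(reg), value)) == insert_low_masked(reg, value));
}
```

Both functions compute the same result; the point of the sketch is that only the first form makes the independence of the two halves explicit.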

Comment 5 user202729 2024-06-26 07:10:54 UTC

I think this can be resolved by implementing some code in combine.cc to replace:

(set:TI (reg:TI 101) (zero_extend:TI (...:DI ...)))
(set:DI (reg:DI ...) (subreg:DI (reg:TI 101) 8))

with

(set:TI (reg:TI 101) (zero_extend:TI (...:DI ...)))
(set:DI (reg:DI ...) (const_int 0))

This is always an improvement, because a constant 0 is simpler than extracting the subregister.

After that, a few more rounds of combine rescanning should be able to constant-fold the 0 forward.

Unfortunately, I don't know how to modify combine.cc or other files to handle this pattern. Can anyone give a suggestion? (Maybe add a define_peephole2 in common.md?)
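
The fact the proposed replacement relies on can be checked with a small C++ sketch. It assumes the GCC/Clang `unsigned __int128` extension as a stand-in for TImode and a little-endian target, where byte offset 8 selects the upper 64 bits; the names `high_half_of_zero_extend` and `check_high_half_is_zero` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using u128 = unsigned __int128;  // GCC/Clang extension, stands in for TImode

// Models: (set:TI (reg:TI 101) (zero_extend:TI (...:DI ...)))
//         (set:DI (reg:DI ...) (subreg:DI (reg:TI 101) 8))
uint64_t high_half_of_zero_extend(uint64_t di_value) {
    u128 ti = di_value;                      // zero extension to 128 bits
    return static_cast<uint64_t>(ti >> 64);  // upper 64 bits (byte offset 8, little-endian)
}

// Asserts that the extracted half is the constant 0 for the given input.
void check_high_half_is_zero(uint64_t di_value) {
    assert(high_half_of_zero_extend(di_value) == 0);
}
```

Since the result does not depend on the input, replacing the subreg extraction with `(const_int 0)` preserves the value in this model.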