Here is some simple code from a real application which converts one struct to a similar but larger one:

    #include <cstdint>

    struct S64
    {
        uint64_t a;
        int8_t b;
        int8_t c;
        uint16_t d;
    };

    struct S32
    {
        uint32_t a;
        int8_t b;
        int8_t c;
        uint16_t d;

        S64 To64() const;
    };

    S64 S32::To64() const
    {
        return S64{a, b, c, d};
    }

GCC 13 and earlier emitted good code for this (as does Clang):

    S32::To64() const:
        mov     eax, DWORD PTR [rdi]
        mov     edx, DWORD PTR [rdi+4]
        ret

GCC 14 is much worse:

    S32::To64() const:
        xor     edx, edx
        mov     esi, DWORD PTR [rdi+4]
        mov     eax, DWORD PTR [rdi]
        movabs  rdi, -4294967296
        mov     rcx, rdx
        and     rcx, rdi
        or      rcx, rsi
        mov     rdx, rcx
        ret

Demo: https://godbolt.org/z/YbenMeEPq
Looks like the padding is being zeroed. Are you sure that zeroing the padding here is what is expected of this code?
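For reference, the padding in question can be located with a small layout check. This is only a sketch: it assumes an ABI where uint64_t is 8-byte aligned (such as x86-64 System V), and the helper names trailing_padding and check_layout are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same layout as the S64 struct from the report.
struct S64 { uint64_t a; int8_t b; int8_t c; uint16_t d; };

// Bytes after the last member that belong to the object but to no member.
constexpr std::size_t trailing_padding() {
    return sizeof(S64) - (offsetof(S64, d) + sizeof(uint16_t));
}

// Runtime checks; these hold only under the alignment assumption above.
inline void check_layout() {
    assert(sizeof(S64) == 16);
    assert(offsetof(S64, b) == 8);   // b, c, d occupy bytes 8..11
    assert(trailing_padding() == 4); // bytes 12..15 are padding
}
```

Under that assumption, b, c and d sit in the low 32 bits of the second 64-bit half of the return value, and the upper 32 bits of that half are the padding.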
Confirmed. Probably the TImode enhancements.
(In reply to Richard Biener from comment #2)
> Confirmed. Probably the TImode enhancements.

Actually you might be right since the IR from the gimple level is the same between GCC 13 and 14. And aarch64 code generation didn't change.
Doing a binary search reveals that the patch that causes the issue is https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623766.html .

Looking at the optimization process, it works roughly as follows:

Before the change, the expand pass generates code that uses (set (subreg ...)). The split pass then splits the register used in the subreg into two registers, and the combine pass does "constant propagation" and simplifies the code.

After the change, the expand pass generates code that uses (set reg (ior (and reg 0xffff0000) value)). The split pass therefore cannot split the register (it does not know how to deal with "and"), and the combine pass is similarly stuck.
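As a rough illustration of why the and/ior form is redundant here, the following C++ sketch models how the upper 64-bit half of the return value is computed in each case. It is not GCC's RTL or internals; the function names are hypothetical, and the mask is taken from the movabs constant in the GCC 14 assembly (-4294967296, i.e. 0xffffffff00000000).

```cpp
#include <cassert>
#include <cstdint>

// Mask from the GCC 14 assembly: movabs rdi, -4294967296.
constexpr uint64_t kHighMask = 0xffffffff00000000ULL;

// Hypothetical model of the GCC 14 sequence: clear a register, then merge
// the 32 bits holding b, c and d into it with and/or.
constexpr uint64_t merge_with_and_ior(uint32_t bcd) {
    return (uint64_t{0} & kHighMask) | bcd;  // xor; and; or
}

// Hypothetical model of the GCC 13 result: the 32-bit load already
// zero-extends into the 64-bit register, so it can be used directly.
constexpr uint64_t plain_load(uint32_t bcd) {
    return bcd;                              // mov edx, DWORD PTR [rdi+4]
}

static_assert(merge_with_and_ior(0) == plain_load(0), "same result");
static_assert(merge_with_and_ior(0xffffffffu) == plain_load(0xffffffffu),
              "same result");

// Runtime spot check of the same equivalence.
inline void check_equivalence() {
    assert(merge_with_and_ior(0x12345678u) == plain_load(0x12345678u));
}
```

Because the register being masked is known to be zero, the and contributes nothing and the whole expression folds to the loaded value, which is the simplification that combine no longer reaches.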
I think this can be resolved by implementing some code in combine.cc to replace:

    (set:TI (reg:TI 101) (zero_extend:TI (...:DI ...)))
    (set:DI (reg:DI ...) (subreg:DI (reg:TI 101) 8))

with

    (set:TI (reg:TI 101) (zero_extend:TI (...:DI ...)))
    (set:DI (reg:DI ...) (const_int 0))

This certainly is always an improvement, because 0 is simpler than extracting the subregister. After that, a few other passes of combine rescanning should be able to constant fold the 0 forward.

Unfortunately, I don't know how to modify combine.cc or some other files to handle this pattern. Can anyone give a suggestion? (Maybe add a define_peephole2 in common.md?)
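The reasoning behind the proposed replacement can be sketched in C++ by modelling a TImode value as two 64-bit halves. This is an illustration only, assuming a little-endian target where byte offset 8 selects the high half; the type and function names are hypothetical and do not correspond to anything in combine.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a TImode pseudo register: two 64-bit halves.
struct TI { uint64_t lo; uint64_t hi; };

// Models (zero_extend:TI (x:DI)): the high half is filled with zeros.
constexpr TI zero_extend_ti(uint64_t x) { return TI{x, 0}; }

// Models (subreg:DI (reg:TI r) 8): byte offset 8 is the high half
// on a little-endian target.
constexpr uint64_t subreg_di_8(TI r) { return r.hi; }

// Models (subreg:DI (reg:TI r) 0): byte offset 0 is the low half.
constexpr uint64_t subreg_di_0(TI r) { return r.lo; }

static_assert(subreg_di_8(zero_extend_ti(~uint64_t{0})) == 0,
              "high half of a zero-extended value is always 0");

// Runtime spot check with an arbitrary value.
inline void check_subreg_of_zero_extend() {
    const uint64_t x = 0xdeadbeefcafef00dULL;
    assert(subreg_di_8(zero_extend_ti(x)) == 0);
    assert(subreg_di_0(zero_extend_ti(x)) == x);
}
```

Whatever the DI operand is, the subreg at offset 8 of its zero extension is 0, which is why replacing it with (const_int 0) would not change the result.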