For cortex-m4 -Os, GCC10 produces bigger assembly code than GCC7 when memset is called. Here is the C code example to trigger the regression: ```C #include <stdio.h> #include <string.h> struct foo_t { int a; int b; int c; int d; }; /* Random function modifying foo with another value than 0 */ void doStuff(struct foo_t *foo) { foo->b = foo->a + foo->c; } void twoLinesFunction(struct foo_t *foo) { /* R0 is saved in GCC10 but not in GCC7 */ memset(foo, 0x00, sizeof(struct foo_t)); doStuff(foo); } int main(void) { struct foo_t foo; twoLinesFunction(&foo); return 0; } ``` compile command: `gcc -Os -mcpu=cortex-m4` GCC7.3.1 produces: ```asm <twoLinesFunction>: push {r3, lr} movs r2, #16 movs r1, #0 bl 8168 <memset> ldmia.w sp!, {r3, lr} b.w 8104 <doStuff> ``` While GCC10.3.0 produces: ```asm <twoLinesFunction>: push {r4, lr} movs r2, #16 mov r4, r0 --> backup r0 movs r1, #0 bl 8174 <memset> mov r0, r4 --> restore r0 ldmia.w sp!, {r4, lr} b.w 810c <doStuff> ``` Main function remains the same. The builtin memset function does not change R0 so there is no need to save it and restore it later. GCC7 is more efficient. GCC10 should not backup R0 for this builtin function in this case, it produces slower code. There is this PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 which is also referring to this behavior with a patch to implement the optimization but I'm not sure when this optimization has been wiped out.
The RA used to know how to rematerialize a register from the memset return value. Sth made that no longer work it seems.
On x86_64 for some reason it works in GCC 9+; tested with " -Os -mstringop-strategy=libcall" and adding noinline to doStuff just to force to make sure it does not inline. Anyways confirmed, worked in GCC 8.2.0 for both aarch64 and arm but does not work in GCC 9.
REG_RETURNED is no longer there. Looks like there is an extra move which caused the code to be different. For aarch64 we have: (insn 18 4 2 2 (set (reg:DI 94) (reg:DI 0 x0 [ foo ])) "/app/example.cpp":17:42 47 {*movdi_aarch64} (nil)) (insn 2 18 3 2 (set (reg/v/f:DI 90 [ foo ]) (reg:DI 94)) "/app/example.cpp":17:42 47 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 94) (nil))) Before the call to memset.
Note I suspect it is r9-3594 is what makes the difference here. Also this is just very fragile really.
For reference, it's find_call_crossed_cheap_reg finding (or not) such candidate.
Really started with r9-3594.
I don't see any problem with aarch64 fwiw. If RA decides it does not want to tie the new pseudo to the argument register, it may have a reason for it? Or suboptimal heuristics. What is this REG_RETURNED thing?
Also, what is fragile here? This is *removing* fragility and premature choices!
(In reply to Segher Boessenkool from comment #7) > What is this REG_RETURNED thing? Ah, something added in ira-lives.c, and you call *that* code fragile? I agree :-)
>I don't see any problem with aarch64 fwiw. I have to try it on aarch64 but it failed there at one point. (In reply to Segher Boessenkool from comment #8) > Also, what is fragile here? This is *removing* fragility and premature > choices! (In reply to Segher Boessenkool from comment #9) > (In reply to Segher Boessenkool from comment #7) > > What is this REG_RETURNED thing? > > Ah, something added in ira-lives.c, and you call *that* code fragile? > I agree :-) You got to what I thought was fragile in the end. And yes the whole REG_RETURNED mechanism seems very fragile (not the change r9-3594 ). I could not figure out how it was being known to be set on which function call even.
Yeah, that could be much more robust. OTOH this stuff is completely opportunistic in the first place: it handles only function return values, not any other hard registers (like local register vars).
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.