Bug 101995 - [11/12/13/14 Regression] regression built-in memset missed-optimization arm -Os since r9-3594
Summary: [11/12/13/14 Regression] regression built-in memset missed-optimization arm -...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 10.3.1
: P2 normal
Target Milestone: 11.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2021-08-20 09:35 UTC by Thibaut M.
Modified: 2023-07-07 10:40 UTC (History)
3 users (show)

See Also:
Host:
Target: arm, aarch64
Build:
Known to work: 7.3.1, 8.2.0
Known to fail: 10.3.1, 9.1.0
Last reconfirmed: 2021-08-21 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thibaut M. 2021-08-20 09:35:53 UTC
For cortex-m4 -Os, GCC10 produces bigger assembly code than GCC7 when memset is called.

Here is the C code example to trigger the regression:

```C
#include <stdio.h>
#include <string.h>

struct foo_t {
  int a;
  int b;
  int c;
  int d;
};

/* Random function modifying foo with another value than 0 */
void doStuff(struct foo_t *foo) {
  foo->b = foo->a + foo->c;
}

void twoLinesFunction(struct foo_t *foo) {
  /* R0 is saved in GCC10 but not in GCC7 */
  memset(foo, 0x00, sizeof(struct foo_t));
  doStuff(foo);
}

int main(void) {
  struct foo_t foo;
  twoLinesFunction(&foo);
  return 0;
}
```

compile command: `gcc -Os -mcpu=cortex-m4`

GCC7.3.1 produces:
```asm
<twoLinesFunction>:
    push    {r3, lr}
    movs    r2, #16
    movs    r1, #0
    bl      8168 <memset>
    ldmia.w sp!, {r3, lr}
    b.w     8104 <doStuff>
```

While GCC10.3.0 produces:
```asm
<twoLinesFunction>:
    push    {r4, lr}
    movs    r2, #16
    mov     r4, r0        --> backup r0
    movs    r1, #0
    bl      8174 <memset>
    mov     r0, r4        --> restore r0
    ldmia.w sp!, {r4, lr}
    b.w     810c <doStuff>
```

Main function remains the same.

The builtin memset function does not change R0 so there is no need to save it and restore it later. GCC7 is more efficient.
GCC10 should not backup R0 for this builtin function in this case, it produces slower code.

There is this PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 which is also referring to this behavior with a patch to implement the optimization but I'm not sure when this optimization has been wiped out.
Comment 1 Richard Biener 2021-08-20 12:21:45 UTC
The RA used to know how to rematerialize a register from the memset return value.
Sth made that no longer work it seems.
Comment 2 Andrew Pinski 2021-08-21 02:53:36 UTC
On x86_64 for some reason it works in GCC 9+; tested with " -Os -mstringop-strategy=libcall" and adding noinline to doStuff just to force to make sure it does not inline.

Anyways confirmed, worked in GCC 8.2.0 for both aarch64 and arm but does not work in GCC 9.
Comment 3 Andrew Pinski 2021-08-21 03:17:26 UTC
REG_RETURNED is no longer there.

Looks like there is an extra move which caused the code to be different.

For aarch64 we have:
(insn 18 4 2 2 (set (reg:DI 94)
        (reg:DI 0 x0 [ foo ])) "/app/example.cpp":17:42 47 {*movdi_aarch64}
     (nil))
(insn 2 18 3 2 (set (reg/v/f:DI 90 [ foo ])
        (reg:DI 94)) "/app/example.cpp":17:42 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 94)
        (nil)))
Before the call to memset.
Comment 4 Andrew Pinski 2021-08-21 03:26:02 UTC
Note I suspect it is r9-3594 is what makes the difference here. Also this is just very fragile really.
Comment 5 Richard Biener 2021-08-23 09:10:13 UTC
For reference, it's find_call_crossed_cheap_reg finding (or not) such candidate.
Comment 6 Martin Liška 2021-12-08 14:10:22 UTC
Really started with r9-3594.
Comment 7 Segher Boessenkool 2021-12-08 20:42:49 UTC
I don't see any problem with aarch64 fwiw.

If RA decides it does not want to tie the new pseudo to the argument
register, it may have a reason for it?  Or suboptimal heuristics.

What is this REG_RETURNED thing?
Comment 8 Segher Boessenkool 2021-12-08 20:44:07 UTC
Also, what is fragile here?  This is *removing* fragility and premature
choices!
Comment 9 Segher Boessenkool 2021-12-08 20:50:21 UTC
(In reply to Segher Boessenkool from comment #7)
> What is this REG_RETURNED thing?

Ah, something added in ira-lives.c, and you call *that* code fragile?
I agree :-)
Comment 10 Andrew Pinski 2021-12-09 08:18:35 UTC
>I don't see any problem with aarch64 fwiw.
I have to try it on aarch64 but it failed there at one point.

(In reply to Segher Boessenkool from comment #8)
> Also, what is fragile here?  This is *removing* fragility and premature
> choices!

(In reply to Segher Boessenkool from comment #9)
> (In reply to Segher Boessenkool from comment #7)
> > What is this REG_RETURNED thing?
> 
> Ah, something added in ira-lives.c, and you call *that* code fragile?
> I agree :-)

You got to what I thought was fragile in the end. And yes the whole REG_RETURNED mechanism seems very fragile (not the change r9-3594 ). I could not figure out how it was being known to be set on which function call even.
Comment 11 Segher Boessenkool 2021-12-09 21:08:39 UTC
Yeah, that could be much more robust.

OTOH this stuff is completely opportunistic in the first place: it handles
only function return values, not any other hard registers (like local
register vars).
Comment 12 Richard Biener 2022-05-27 09:46:08 UTC
GCC 9 branch is being closed
Comment 13 Jakub Jelinek 2022-06-28 10:46:09 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 14 Richard Biener 2023-07-07 10:40:48 UTC
GCC 10 branch is being closed.