Bug 79675 - Pointless reg1 <- reg2; reg2 <- reg1 moves inside loop
Summary: Pointless reg1 <- reg2; reg2 <- reg1 moves inside loop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 7.0
: P3 normal
Target Milestone: 9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2017-02-22 12:17 UTC by ktkachov
Modified: 2021-08-07 06:17 UTC (History)
0 users

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail: 4.9.4, 5.4.1, 6.1.0, 7.0
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ktkachov 2017-02-22 12:17:12 UTC
I'm looking at the code:
char *
foo (char *s1, char *s2, unsigned int n)
{
  char c1, c2;

  c2 = *(char *)s2++;

  do
    {
      do
        {
          c1 = *s1++;
          if (c1 == 0)
            return 0;
        }
      while (c1 != c2);
    }
  while (__builtin_strncmp (s1, (char *)s2, n) != 0);

  return --s1;
}

On aarch64 at -O2 this generates some bad code for the inner loop:
...
.L9:
        cmp     w21, w1
        beq     .L8
.L4:
        mov     x20, x19 // (1)
        mov     x19, x20 // (2)
        ldrb    w1, [x19], 1 (3)
        cbnz    w1, .L9
...
Note the moves between x20 and x19, (1) and (2)
Looking at the RTL dumps the two moves were in separate basic blocks but were moved together at some point.

Instructions (2) and (3) were transformed by autoincdec from:
add x19, x20, #1
ldrb w1, [x19, #-1]

but nothing eliminated (2) because x19 is later used as an argument to strncmp and in another basic block.

Doesn't the peephole2 pass try to eliminate such back-to-back copies?
Comment 1 ktkachov 2017-02-22 13:40:10 UTC
I think ivopts also contributes here.
Before ivopts the memory access and address are:
  s1_10 = s1_3 + 1;
  c1_11 = *s1_3;

but ivopts transforms them to:
  s1_10 = s1_3 + 1;
  c1_11 = MEM[base: s1_10, offset: -1B];

So the memory address is now dependent on s1_10 rather than s1_3 and introduces an offset where there was none before
Comment 2 bin cheng 2017-02-22 13:46:13 UTC
(In reply to ktkachov from comment #1)
> I think ivopts also contributes here.
> Before ivopts the memory access and address are:
>   s1_10 = s1_3 + 1;
>   c1_11 = *s1_3;
> 
> but ivopts transforms them to:
>   s1_10 = s1_3 + 1;
>   c1_11 = MEM[base: s1_10, offset: -1B];
> 
> So the memory address is now dependent on s1_10 rather than s1_3 and
> introduces an offset where there was none before

Thanks for reporting, I will have a look.  Thanks.
Comment 3 Andrew Pinski 2021-08-07 06:17:16 UTC
On the 9.3+ we get:
.L4:
        ldrb    w0, [x19], 1
        cbz     w0, .L6
.L3:
        sub     x20, x19, #1
        cmp     w21, w0
        bne     .L4
        mov     x2, x23
        mov     x1, x22
        mov     x0, x19
        bl      strncmp
        cbnz    w0, .L4

GCC 7 and 8 got us:
.L9:
        beq     .L8
.L4:
        mov     x20, x19
        mov     x19, x20
        ldrb    w1, [x19], 1
        cmp     w21, w1
        cbnz    w1, .L9
...
.L8:
        .cfi_restore_state
        mov     x2, x22
        mov     x1, x23
        mov     x0, x19
        bl      strncmp
        cbnz    w0, .L4

Which had the moves.

GCC 9+ ch does:
Analyzing loop 1
Loop 1 is not do-while loop: latch has multiple predecessors.
    Will duplicate bb 7
  Not duplicating bb 1: both sucessors are in loop.
Duplicating header of the loop 1 up to edge 7->4, 4 insns.
Loop 1 is do-while loop
Loop 1 is now do-while loop.

While 8- ch has:
Analyzing loop 1
Loop 1 is do-while loop

Which means this was fixed by r9-23.