[Bug c/86680] New: possible gcc optimization

Thu Jul 26 10:51:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86680

            Bug ID: 86680
           Summary: possible gcc optimization
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: florian.laroche at googlemail dot com
  Target Milestone: ---

Created attachment 44444
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44444&action=edit
testcase

I can see this on x86_64 and aarch64. The first function compiles to much
larger code than the second. It seems that the 8-byte alignment, and thus the
fact that the length is a multiple of 8, is forgotten in some optimization step.

best regards,

Florian La Roche

$ aarch64-linux-gnu-gcc-8 -O2 -c test.c
$ aarch64-linux-gnu-objdump -d test.o 

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <clear_bss1>:
   0:   90000001        adrp    x1, 0 <__bss_start1>
   4:   90000000        adrp    x0, 0 <__bss_end1>
   8:   f9400022        ldr     x2, [x1]
   c:   f9400000        ldr     x0, [x0]
  10:   eb00005f        cmp     x2, x0
  14:   54000142        b.cs    3c <clear_bss1+0x3c>  // b.hs, b.nlast
  18:   d1000401        sub     x1, x0, #0x1
  1c:   aa0203e0        mov     x0, x2
  20:   cb020021        sub     x1, x1, x2
  24:   927df021        and     x1, x1, #0xfffffffffffffff8
  28:   91002021        add     x1, x1, #0x8
  2c:   8b020021        add     x1, x1, x2
  30:   f800841f        str     xzr, [x0], #8
  34:   eb01001f        cmp     x0, x1
  38:   54ffffc1        b.ne    30 <clear_bss1+0x30>  // b.any
  3c:   d65f03c0        ret

0000000000000040 <clear_bss2>:
  40:   90000000        adrp    x0, 0 <__bss_start2>
  44:   90000001        adrp    x1, 0 <__bss_end2>
  48:   f9400000        ldr     x0, [x0]
  4c:   f9400021        ldr     x1, [x1]
  50:   f9400000        ldr     x0, [x0]
  54:   f9400021        ldr     x1, [x1]
  58:   eb01001f        cmp     x0, x1
  5c:   54000082        b.cs    6c <clear_bss2+0x2c>  // b.hs, b.nlast
  60:   f800841f        str     xzr, [x0], #8
  64:   eb01001f        cmp     x0, x1
  68:   54ffffc3        b.cc    60 <clear_bss2+0x20>  // b.lo, b.ul, b.last
  6c:   d65f03c0        ret

Note how much smaller the code for the second function is. In the first
function, the instructions at offsets 0x18 through 0x2c should essentially be
optimized away.
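To make the claim concrete, here is a minimal C sketch of the arithmetic that the aarch64 instructions at 0x18 through 0x2c perform. It assumes start < end on entry (the b.cs at 0x14 skips the loop otherwise). The function and parameter names are made up for illustration and are not taken from the attached test case. If end - start is a multiple of 8, the computed bound is simply end, so the whole sequence is redundant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the loop-bound computation at
 * offsets 0x18..0x2c of clear_bss1 (aarch64, -O2).
 * Precondition: start < end. */
static uintptr_t o2_loop_bound(uintptr_t start, uintptr_t end)
{
    uintptr_t x1 = end - 1;   /* sub x1, x0, #0x1 */
    x1 -= start;              /* sub x1, x1, x2 */
    x1 &= ~(uintptr_t)7;      /* and x1, x1, #0xfffffffffffffff8 */
    x1 += 8;                  /* add x1, x1, #0x8 */
    x1 += start;              /* add x1, x1, x2 */
    return x1;                /* value the loop compares x0 against */
}

/* Returns 1 if the bound equals end for every length 8*k, 1 <= k <= max_words. */
static int bound_equals_end_for_multiples_of_8(uintptr_t start, unsigned max_words)
{
    for (unsigned k = 1; k <= max_words; k++) {
        uintptr_t end = start + (uintptr_t)8 * k;
        if (o2_loop_bound(start, end) != end)
            return 0;
    }
    return 1;
}

/* Small self-check with arbitrary example addresses. */
static void self_check(void)
{
    /* Length is a multiple of 8: the bound is just end. */
    assert(o2_loop_bound(0x1000, 0x1040) == 0x1040);
    /* Length is not a multiple of 8: the bound is end rounded up. */
    assert(o2_loop_bound(0x1000, 0x1041) == 0x1048);
}
```

The rounding only matters when the length is not a multiple of 8, which is the case the report argues cannot happen for 8-byte-aligned pointers.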

Compiling with -Os also gives much better code:
$ aarch64-linux-gnu-gcc-8 -Os -c test.c
$ aarch64-linux-gnu-objdump -d test.o 

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <clear_bss1>:
   0:   90000000        adrp    x0, 0 <__bss_start1>
   4:   90000001        adrp    x1, 0 <__bss_end1>
   8:   f9400000        ldr     x0, [x0]
   c:   f9400021        ldr     x1, [x1]
  10:   eb01001f        cmp     x0, x1
  14:   54000043        b.cc    1c <clear_bss1+0x1c>  // b.lo, b.ul, b.last
  18:   d65f03c0        ret
  1c:   f800841f        str     xzr, [x0], #8
  20:   17fffffc        b       10 <clear_bss1+0x10>

0000000000000024 <clear_bss2>:
  24:   90000000        adrp    x0, 0 <__bss_start2>
  28:   90000001        adrp    x1, 0 <__bss_end2>
  2c:   f9400000        ldr     x0, [x0]
  30:   f9400021        ldr     x1, [x1]
  34:   f9400000        ldr     x0, [x0]
  38:   f9400021        ldr     x1, [x1]
  3c:   eb00003f        cmp     x1, x0
  40:   54000048        b.hi    48 <clear_bss2+0x24>  // b.pmore
  44:   d65f03c0        ret
  48:   f800841f        str     xzr, [x0], #8
  4c:   17fffffc        b       3c <clear_bss2+0x18>

The problem also shows up on x86_64, at offsets 0x13 through 0x22:
$ gcc -O2 -c test.c
$ objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <clear_bss1>:
   0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 7 <clear_bss1+0x7>
   7:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # e <clear_bss1+0xe>
   e:   48 39 d0                cmp    %rdx,%rax
  11:   73 25                   jae    38 <clear_bss1+0x38>
  13:   48 8d 48 08             lea    0x8(%rax),%rcx
  17:   48 83 c2 07             add    $0x7,%rdx
  1b:   48 29 ca                sub    %rcx,%rdx
  1e:   48 83 e2 f8             and    $0xfffffffffffffff8,%rdx
  22:   48 01 ca                add    %rcx,%rdx
  25:   0f 1f 00                nopl   (%rax)
  28:   48 c7 00 00 00 00 00    movq   $0x0,(%rax)
  2f:   48 83 c0 08             add    $0x8,%rax
  33:   48 39 d0                cmp    %rdx,%rax
  36:   75 f0                   jne    28 <clear_bss1+0x28>
  38:   f3 c3                   repz retq 
  3a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

0000000000000040 <clear_bss2>:
  40:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 47 <clear_bss2+0x7>
  47:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 4e <clear_bss2+0xe>
  4e:   48 39 d0                cmp    %rdx,%rax
  51:   73 16                   jae    69 <clear_bss2+0x29>
  53:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  58:   48 83 c0 08             add    $0x8,%rax
  5c:   48 c7 40 f8 00 00 00    movq   $0x0,-0x8(%rax)
  63:   00 
  64:   48 39 d0                cmp    %rdx,%rax
  67:   72 ef                   jb     58 <clear_bss2+0x18>
  69:   f3 c3                   repz retq