Bug 92714 - [missed-optimization] aggregate initialization of an array fills the whole array with zeros first, including leading non-zero elements
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 8.1.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2019-11-28 18:39 UTC by Lassie Darkorbit
Modified: 2021-12-22 10:06 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-11-28 00:00:00


Description Lassie Darkorbit 2019-11-28 18:39:08 UTC
void *sink;
void bar() {
    int a[100]{1,2,3,4};
    sink = a;             // a escapes the function
    asm("":::"memory");   // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

gcc 8.1 and gcc 9.2 both generate asm like this (even with -O3):

bar():
    push    edi                       # save call-preserved EDI which rep stos uses
    xor     eax, eax                  # eax=0
    mov     ecx, 100                  # repeat-count = 100
    sub     esp, 400                  # reserve 400 bytes on the stack
    mov     edi, esp                  # dst for rep stos
    mov     DWORD PTR sink, esp       # sink = a
    rep stosd                         # memset(a, 0, 400)

    mov     DWORD PTR [esp], 1        # then store the non-zero initializers
    mov     DWORD PTR [esp+4], 2      # over the zeroed part of the array
    mov     DWORD PTR [esp+8], 3
    mov     DWORD PTR [esp+12], 4

    add     esp, 400                  # cleanup the stack
    pop     edi                       # and restore caller's EDI
    ret
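
For comparison, here is a minimal sketch of the store sequence this report is asking for: write the four non-zero elements directly and clear only the 96-element tail. The function names `init_tail_only` and `same_as_aggregate_init` are hypothetical, introduced only for illustration; this is a hand-written equivalent, not what any GCC version is known to emit.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hand-written equivalent of `int a[100]{1,2,3,4};`:
// store the four non-zero elements, then zero only a[4]..a[99].
void init_tail_only(int (&a)[100]) {
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    a[3] = 4;
    std::memset(a + 4, 0, sizeof(int) * 96);  // 384 bytes instead of 400
}

// Returns true if the hand-written form produces the same bytes as the
// aggregate initializer from the test case.
bool same_as_aggregate_init() {
    int expected[100]{1, 2, 3, 4};
    int actual[100];
    init_tail_only(actual);
    return std::memcmp(expected, actual, sizeof expected) == 0;
}
```

In this form the first 16 bytes are written once instead of twice.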
Comment 1 Richard Biener 2019-11-28 19:59:21 UTC
It's actually an optimization - it's cheaper to clear the whole object if most
of it is zero.  What we miss is noticing the special case where only the tail
is zero.

Jeff added memset pruning to DSE but this case has

  <bb 2> [local count: 1073741824]:
  a = {};
  MEM <unsigned long> [(int *)&a] = 8589934593;
  MEM <unsigned long> [(int *)&a + 8B] = 17179869187;
  sink = &a;

the other obvious place to fix it is in the gimplifier of course which
creates the above code in the first place.
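
The two constants in the dump above appear to be the four initializers merged pairwise into 8-byte stores. A small check, assuming a little-endian target with 32-bit int (as on x86-64); `pack_le` is a hypothetical helper, not a GCC function:

```cpp
#include <cassert>
#include <cstdint>

// Pack two 32-bit values into one 64-bit value the way a little-endian
// target lays them out: `lo` at the lower address, `hi` 4 bytes above it.
constexpr std::uint64_t pack_le(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// MEM <unsigned long> [(int *)&a]      covers a[0] = 1, a[1] = 2
static_assert(pack_le(1, 2) == 8589934593ULL, "first merged store");
// MEM <unsigned long> [(int *)&a + 8B] covers a[2] = 3, a[3] = 4
static_assert(pack_le(3, 4) == 17179869187ULL, "second merged store");
```

So the `a = {};` clear is followed by two stores that together overwrite exactly the first 16 bytes.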

The same issue happens with

void *sink;
void bar() {
    int a[100] = { [96]=1,2,3,4};
    sink = a;             // a escapes the function
    asm("":::"memory");   // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

or

void *sink;
void bar() {
    int a[100] = { 1,2,3,4,[96]=1,2,3,4};
    sink = a;             // a escapes the function
    asm("":::"memory");   // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

though the trailing zeros are probably the most common case.
Comment 2 Jakub Jelinek 2019-11-29 09:01:15 UTC
Or in theory store-merging could do this, although it can handle only the = {} and not memset right now.  That said, just unconditionally adjusting the clearing store not to cover the boundaries which are overwritten is not a good idea: it is much faster to start the clear at an 8 or 16 byte aligned address than to skip 3 or 7 bytes there.
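
One way to picture the alignment concern is to round the start of the clear down to an aligned offset, accepting a few redundantly cleared bytes that the later initializer stores overwrite. This is only a sketch of that idea, assuming the object itself is aligned to `align` and that `align` is a power of two; `ClearRange` and `aligned_tail_clear` are hypothetical names, not anything in GCC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the byte range [begin, end) a clear would cover.
struct ClearRange {
    std::size_t begin;
    std::size_t end;
};

// Start the clear at the largest `align`-aligned offset that is not past the
// first byte that really needs zeroing. `align` must be a power of two.
constexpr ClearRange aligned_tail_clear(std::size_t first_zero_byte,
                                        std::size_t object_size,
                                        std::size_t align) {
    return {first_zero_byte & ~(align - 1), object_size};
}

// int a[100]{1,2,3,4}: the tail starts at byte 16, which is already
// 16-byte aligned, so nothing is cleared redundantly.
static_assert(aligned_tail_clear(16, 400, 16).begin == 16, "aligned tail");
// int a[100]{1,2,3}: the tail starts at byte 12; rounding down to 8 clears
// a[2] once more than needed, and the store of 3 must come after the clear.
static_assert(aligned_tail_clear(12, 400, 8).begin == 8, "rounded down");
```

Whether trimming the clear pays off for a given size and alignment is a cost question that this sketch does not try to answer.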