void *sink;
void bar() {
    int a[100]{1,2,3,4};
    sink = a;           // a escapes the function
    asm("":::"memory"); // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

gcc 8.1 and gcc 9.2 both make asm like this (even with -O3):

bar():
        push    edi                     # save call-preserved EDI which rep stos uses
        xor     eax, eax                # eax = 0
        mov     ecx, 100                # repeat-count = 100
        sub     esp, 400                # reserve 400 bytes on the stack
        mov     edi, esp                # dst for rep stos
        mov     DWORD PTR sink, esp     # sink = a
        rep stosd                       # memset(a, 0, 400)
        mov     DWORD PTR [esp], 1      # then store the non-zero initializers
        mov     DWORD PTR [esp+4], 2    # over the zeroed part of the array
        mov     DWORD PTR [esp+8], 3
        mov     DWORD PTR [esp+12], 4
        add     esp, 400                # clean up the stack
        pop     edi                     # and restore caller's EDI
        ret
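For reference, the zeroing itself is required by the language: with int a[100]{1,2,3,4} the four listed elements get their values and the remaining 96 elements are value-initialized to zero, so the compiler must clear them somehow. A minimal check of that semantics (first_zero_index is just an illustrative helper, not part of the report):

```cpp
// int a[100]{1,2,3,4}: elements 0..3 take the listed values,
// elements 4..99 are value-initialized to zero.
int first_zero_index() {
    int a[100]{1, 2, 3, 4};
    for (int i = 0; i < 100; ++i)
        if (a[i] == 0)
            return i;   // first element the clearing has to cover
    return -1;
}
```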
It's actually an optimization - it's cheaper to clear the whole object when most of it is zero. What we miss is noticing the special case where only the tail is zeros.

Jeff added memset pruning to DSE, but by the time DSE runs this case looks like

  <bb 2> [local count: 1073741824]:
  a = {};
  MEM <unsigned long> [(int *)&a] = 8589934593;
  MEM <unsigned long> [(int *)&a + 8B] = 17179869187;
  sink = &a;

(the four int initializers have been merged into two 64-bit stores: 8589934593 is 0x2'00000001, 17179869187 is 0x4'00000003). The other obvious place to fix it is of course the gimplifier, which creates the above code in the first place.

The same issue happens with

void *sink;
void bar() {
    int a[100] = { [96]=1,2,3,4 };
    sink = a;           // a escapes the function
    asm("":::"memory"); // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

or

void *sink;
void bar() {
    int a[100] = { 1,2,3,4,[96]=1,2,3,4 };
    sink = a;           // a escapes the function
    asm("":::"memory"); // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

though the trailing zeros are probably the most common case.
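In source form, the pruned clearing the DSE/gimplifier fix would aim for looks like this sketch: write the non-zero initializers and memset only the 96-element tail, instead of zeroing all 400 bytes first. fill_pruned is a hypothetical helper operating on a caller-provided buffer (so the effect is observable), not anything GCC emits:

```cpp
#include <cstring>

// Sketch of the desired lowering for int a[100]{1,2,3,4}:
// store the four initializers, then clear only the trailing zeros.
void fill_pruned(int (&a)[100]) {
    a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 4;
    // 384 bytes instead of the full 400
    std::memset(a + 4, 0, sizeof(a) - 4 * sizeof(int));
}
```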
Or in theory store-merging could do this, although right now it can only handle the = {} form, not memset. That said, unconditionally shrinking the clearing store so that it no longer covers the bytes which are overwritten later is not a good idea: it is much faster to start the clear at an 8- or 16-byte aligned address than to skip the first 3 or 7 bytes.
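The alignment concern can be made concrete: instead of starting the clear exactly at the first byte not covered by later stores, round the start down to a natural alignment boundary. Re-clearing a few leading bytes is harmless because the initializer stores overwrite them afterwards anyway. A sketch (clear_tail_aligned is a hypothetical illustration, not a GCC routine):

```cpp
#include <cstddef>
#include <cstring>

// Clear bytes [covered, total) of 'base', but round the start down to a
// 16-byte boundary. The bytes in [start, covered) are re-cleared here and
// then overwritten by the later initializer stores, which is cheaper than
// issuing a misaligned memset that skips 3 or 7 leading bytes.
void clear_tail_aligned(unsigned char *base, std::size_t covered, std::size_t total) {
    std::size_t start = covered & ~static_cast<std::size_t>(15);
    std::memset(base + start, 0, total - start);
}
```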