Bug 112683 - Optimizing memcpy range by extending to word bounds
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 14.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-11-23 14:41 UTC by Antony Polukhin
Modified: 2023-11-24 08:01 UTC

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Antony Polukhin 2023-11-23 14:41:21 UTC
Consider the following minimized source code from libstdc++:

```
struct string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

string copy(const string& __str) noexcept {
    string result;

    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();

    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     __str._M_string_length + 1);

    return result;
}
```

Right now GCC with -O2 emits a long assembly sequence of about 50 instructions: https://godbolt.org/z/a89bh17hd

However, note that
* `result._M_local_buf` is uninitialized,
* there are at most 16 bytes to copy to `result._M_local_buf`, which is 16 bytes in size.

So the compiler could optimize the code to always copy 16 bytes. The behavior change is not observable by the user, because the uninitialized bytes could contain any data, including the same bytes as `__str._M_local_buf`.

As a result of always copying 16 bytes, the assembly becomes more than 7 times shorter and the conditional jumps go away: https://godbolt.org/z/r5GPYTs4Y
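
For reference, a minimal sketch of the two variants being compared. It assumes the hand-optimized version simply passes the full buffer size to `__builtin_memcpy`; the exact source behind the second link is not reproduced here. The names `small_string`, `copy_exact` and `copy_widened` are hypothetical and were chosen to avoid clashing with `std::string`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the minimized `string` struct in this report.
struct small_string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

// Variant from the report: copies exactly length + 1 bytes (1..16).
small_string copy_exact(const small_string& __str) noexcept {
    small_string result;

    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();

    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     __str._M_string_length + 1);
    return result;
}

// Assumed hand-widened variant: always copies the whole 16-byte buffer.
// Both buffers are exactly 16 bytes, so the copy stays in bounds, but it
// may read source bytes past the terminator that were never written.
small_string copy_widened(const small_string& __str) noexcept {
    small_string result;

    if (__str._M_string_length > __str._S_local_capacity)
        __builtin_unreachable();

    result._M_string_length = __str._M_string_length;
    __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                     sizeof(result._M_local_buf));
    return result;
}
```

Both functions produce the same string contents up to and including the terminating NUL; they can differ only in the bytes after it.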
Comment 1 Andrew Pinski 2023-11-23 21:21:40 UTC
  # RANGE [irange] long unsigned int [1, 16] MASK 0x1f VALUE 0x0
  _2 = _1 + 1;
  # PT = nonlocal 
  _3 = &__str_5(D)->_M_local_bufD.4676;
  # .MEM_7 = VDEF <.MEM_6>
  memcpyD.1403 (&<retval>._M_local_bufD.4676, _3, _2);

The range information is there already for _2.


Note that the hugely expanded instruction sequence is a target issue, though.
Comment 2 Richard Biener 2023-11-24 08:01:31 UTC
Note that copying more will likely trigger valgrind complaints about accessing uninitialized memory? Technically it also makes the IL invoke
undefined behavior, if we'd expand this to byte-by-byte copies with
registers.

So I'm not sure this is a good idea.
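
A small sketch of the situation this concern is about, assuming a short string is built by writing only `length + 1` bytes into the local buffer. The names `small_string`, `make_short` and `unwritten_tail_bytes` are hypothetical and not taken from libstdc++. The code does not read the unwritten bytes; it only counts how many of them a fixed 16-byte copy would touch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the minimized `string` struct in this report.
struct small_string {
    unsigned long _M_string_length;
    enum { _S_local_capacity = 15 };
    char _M_local_buf[_S_local_capacity + 1];
};

// Builds a short string by writing only length + 1 bytes of the buffer.
// Precondition (assumed, not checked): strlen(text) <= 15.
small_string make_short(const char* text) {
    small_string s;  // tail of _M_local_buf is deliberately left unwritten
    s._M_string_length = std::strlen(text);
    std::memcpy(s._M_local_buf, text, s._M_string_length + 1);
    return s;
}

// Number of source bytes a fixed 16-byte copy would read even though they
// were never written; these are the reads a memory checker could report.
unsigned long unwritten_tail_bytes(const small_string& s) {
    return sizeof(s._M_local_buf) - (s._M_string_length + 1);
}
```

For a 5-character string, for example, a widened copy would read 10 bytes that were never written, while a full 15-character string leaves none.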