Consider the minimized source code from libstdc++

```
struct string {
  unsigned long _M_string_length;
  enum { _S_local_capacity = 15 };
  char _M_local_buf[_S_local_capacity + 1];
};

string copy(const string& __str) noexcept {
  string result;

  if (__str._M_string_length > __str._S_local_capacity)
    __builtin_unreachable();

  result._M_string_length = __str._M_string_length;
  __builtin_memcpy(result._M_local_buf, __str._M_local_buf,
                   __str._M_string_length + 1);

  return result;
}
```

Right now GCC with -O2 emits a long assembly listing of ~50 instructions: https://godbolt.org/z/a89bh17hd

However, note that
* `result._M_local_buf` is uninitialized,
* there are at most 16 bytes to copy to `result._M_local_buf`, which is 16 bytes in size.

So the compiler could optimize the code to always copy 16 bytes. The behavior change is not observable by the user, as the uninitialized bytes could contain any data, including the same bytes as `__str._M_local_buf`.

As a result of always copying 16 bytes, the assembly becomes more than 7 times shorter and the conditional jumps go away: https://godbolt.org/z/r5GPYTs4Y
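For illustration, the requested transformation can be sketched by hand as follows. This is only a rough sketch of the idea and not necessarily the exact code behind the second godbolt link; the names `sso_string`, `copy_exact`, `copy_fixed` and `same_visible_bytes` are hypothetical and do not appear in libstdc++.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the minimized libstdc++ type from the report.
struct sso_string {
    unsigned long length;
    enum { local_capacity = 15 };
    char local_buf[local_capacity + 1];
};

// Variable-length copy, mirroring the function in the report:
// copies length + 1 bytes, i.e. between 1 and 16 bytes.
sso_string copy_exact(const sso_string& src) noexcept {
    sso_string result;
    if (src.length > sso_string::local_capacity)
        __builtin_unreachable();  // GCC/Clang builtin, as in the report
    result.length = src.length;
    std::memcpy(result.local_buf, src.local_buf, src.length + 1);
    return result;
}

// Hand-written version of the proposed optimization: always copy the
// whole 16-byte buffer, regardless of length. Note that this reads every
// byte of src.local_buf, including bytes past the terminator.
sso_string copy_fixed(const sso_string& src) noexcept {
    sso_string result;
    if (src.length > sso_string::local_capacity)
        __builtin_unreachable();
    result.length = src.length;
    std::memcpy(result.local_buf, src.local_buf, sizeof result.local_buf);
    return result;
}

// Compares only the bytes a caller may legitimately inspect:
// the length and the first length + 1 bytes of the buffer.
bool same_visible_bytes(const sso_string& a, const sso_string& b) noexcept {
    assert(a.length <= sso_string::local_capacity);
    return a.length == b.length &&
           std::memcmp(a.local_buf, b.local_buf, a.length + 1) == 0;
}
```

For a source whose buffer is fully initialized, `same_visible_bytes(copy_exact(s), copy_fixed(s))` should hold; the two versions differ only in the tail bytes past `length + 1`, which `copy_exact` leaves uninitialized in the result.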
  # RANGE [irange] long unsigned int [1, 16] MASK 0x1f VALUE 0x0
  _2 = _1 + 1;
  # PT = nonlocal
  _3 = &__str_5(D)->_M_local_bufD.4676;
  # .MEM_7 = VDEF <.MEM_6>
  memcpyD.1403 (&<retval>._M_local_bufD.4676, _3, _2);

The range information is already there for _2. Note, though, that the hugely expanded instruction sequence is a target issue.
Won't copying more likely trigger valgrind complaints about accessing uninitialized memory? Technically, it also makes the IL invoke undefined behavior if we were to expand this to byte-by-byte copies through registers. So I'm not sure this is a good idea.