Test case (`-O1 -march=skylake-avx512`):

    int main() {
        double mem[16];
        using V [[gnu::vector_size(64)]] = double;
        const V a{0, 1, 2, 3, 4, 5, 6, 7};
        const V b{8, 9, 10, 11, 12, 13, 14, 15};
        __builtin_memcpy(mem, &a, 64);
        __builtin_memcpy(mem + 8, &b, 64);
        V c = {};
        __builtin_memcpy(&c, mem + 4, 64);
        if (c[5] != double(9))
            __builtin_abort();
    }

This is reduced from my extended test case, where c comes out as {4, 5, 6, 7, 8, 15, 9, 10} instead of {4, 5, 6, 7, 8, 9, 10, 11}. If GCC is made to forget the contents of mem (e.g. via inline asm), the test does not fail. GCC 9 does not constant-evaluate the code above and therefore does not fail.
Confirmed, started with r10-6809-g7f5617b00445dcc8. I can reproduce it on the Intel SDE simulator:

$ g++ pr94300.c -O1 -march=skylake-avx512 && /home/marxin/Programming/intel-sde-new/sde-external-8.16.0-2018-01-30-lin/sde -skx -- ./a.out
Aborted (core dumped)
Created attachment 48105
gcc10-pr94300.patch

Untested fix.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f1154b4d3c54e83d493cc66d1a30c410b9b3108a

commit r10-7366-gf1154b4d3c54e83d493cc66d1a30c410b9b3108a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Mar 25 09:17:01 2020 +0100

    sccvn: Fix buffer overflow in push_partial_def [PR94300]

    The following testcase is miscompiled because there is a buffer overflow
    in push_partial_def in the little-endian case when working with 64-byte
    vectors.  The code computes NEEDED_LEN, the number of bytes we need in
    BUFFER, i.e. the number of bits we need rounded up to whole bytes.  Then
    the code native_encode_expr's each (partially overlapping) pd into
    THIS_BUFFER.  If pd.offset < 0, i.e. the pd.rhs store starts some bits
    before the window we are interested in, we pass -pd.offset to
    native_encode_expr and shrink the size already earlier:
          HOST_WIDE_INT size = pd.size;
          if (pd.offset < 0)
            size -= ROUND_DOWN (-pd.offset, BITS_PER_UNIT);
    On this testcase, the problem is with a store with pd.offset > 0,
    in particular pd.offset 256, pd.size 512, i.e. a 64-byte store which
    doesn't fit entirely into BUFFER.
    In this case, for little-endian, we have just:
                size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
    which isn't sufficient, because needed_len is 64, the entire BUFFER
    (except for the last extra byte used for shifting).  native_encode_expr
    fills the whole THIS_BUFFER (again, except for the last extra byte), and
    the code then performs
          memcpy (BUFFER + 32, THIS_BUFFER, 64);
    which overflows BUFFER and, as THIS_BUFFER is usually laid out after it,
    spills into THIS_BUFFER.
    The following patch fixes it by making sure that size is reduced for
    pd.offset > 0 too.  For big-endian the code does things differently and
    already handles this right.

    2020-03-25  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/94300
            * tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): If pd.offset
            is positive, make sure that off + size isn't larger than needed_len.

            * gcc.target/i386/avx512f-pr94300.c: New test.
Fixed.