Bug 94300 - [10 Regression] memcpy vector load miscompiled during const-prop since r10-6809-g7f5617b00445dcc8
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 10.0
Importance: P1 normal
Target Milestone: 10.0
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2020-03-24 12:48 UTC by Matthias Kretz (Vir)
Modified: 2020-03-25 08:22 UTC (History)
CC: 2 users

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work: 9.3.0
Known to fail: 10.0
Last reconfirmed: 2020-03-24 00:00:00


Attachments
gcc10-pr94300.patch (815 bytes, patch)
2020-03-24 14:53 UTC, Jakub Jelinek

Description Matthias Kretz (Vir) 2020-03-24 12:48:09 UTC
Test case `-O1 -march=skylake-avx512`:

int main()
{
  double mem[16];
  using V [[gnu::vector_size(64)]] = double;
  const V a{0, 1, 2, 3, 4, 5, 6, 7};
  const V b{8, 9, 10, 11, 12, 13, 14, 15};
  __builtin_memcpy(mem, &a, 64);
  __builtin_memcpy(mem + 8, &b, 64);
  V c = {};
  __builtin_memcpy(&c, mem + 4, 64);
  if (c[5] != double(9))
    __builtin_abort();
}

Reduced from my extended test case; with the miscompile, c ends up as {4, 5, 6, 7, 8, 15, 9, 10} instead of the expected {4, 5, 6, 7, 8, 9, 10, 11}. If GCC is made to forget the contents of mem (e.g. via inline asm), the test does not fail. GCC 9 does not constant-fold the code above and therefore doesn't fail.
Comment 1 Martin Liška 2020-03-24 13:17:52 UTC
Confirmed, started with r10-6809-g7f5617b00445dcc8.
I can reproduce that on Intel SDE simulator:

$ g++ pr94300.c -O1 -march=skylake-avx512 && /home/marxin/Programming/intel-sde-new/sde-external-8.16.0-2018-01-30-lin/sde -skx -- ./a.out 
Aborted (core dumped)
Comment 2 Jakub Jelinek 2020-03-24 14:53:53 UTC
Created attachment 48105 [details]
gcc10-pr94300.patch

Untested fix.
Comment 3 GCC Commits 2020-03-25 08:17:43 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f1154b4d3c54e83d493cc66d1a30c410b9b3108a

commit r10-7366-gf1154b4d3c54e83d493cc66d1a30c410b9b3108a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Mar 25 09:17:01 2020 +0100

    sccvn: Fix buffer overflow in push_partial_def [PR94300]
    
    The following testcase is miscompiled, because there is a buffer overflow
    in push_partial_def in the little-endian case when working with 64-byte
    vectors.  The code computes the number of bytes we need in the BUFFER:
    NEEDED_LEN, which is the number of bits we need rounded up to whole bytes.
    Then the code calls native_encode_expr on each (partially overlapping) pd,
    encoding it into THIS_BUFFER.
    If pd.offset < 0, i.e. the pd.rhs store starts at some bits before the
    window we are interested in, we pass -pd.offset to native_encode_expr and
    shrink the size already earlier:
          HOST_WIDE_INT size = pd.size;
          if (pd.offset < 0)
            size -= ROUND_DOWN (-pd.offset, BITS_PER_UNIT);
    On this testcase, the problem is with a store with pd.offset > 0,
    in particular pd.offset 256, pd.size 512, i.e. a 64-byte store which
    doesn't fit entirely into BUFFER.
    We have just:
              size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
    in this case for little-endian, which isn't sufficient, because needed_len
    is 64, the entire BUFFER (except for the last extra byte used for shifting).
    native_encode_expr fills the whole THIS_BUFFER (again, except for the last
    extra byte), and the code then performs memcpy (BUFFER + 32, THIS_BUFFER, 64);
    which overflows BUFFER and, as THIS_BUFFER is usually laid out right after
    it, spills into THIS_BUFFER.
    The following patch fixes it by making sure that, for pd.offset > 0, size
    is reduced accordingly.  For big-endian the code does things differently
    and already handles this case correctly.
    
    2020-03-25  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94300
            * tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): If pd.offset
            is positive, make sure that off + size isn't larger than needed_len.
    
            * gcc.target/i386/avx512f-pr94300.c: New test.
Comment 4 Jakub Jelinek 2020-03-25 08:22:31 UTC
Fixed.