Bug 108064 - [13 Regression] apache-arrow-cpp-9.0.0 is vectored incorrectly: arithmetic shift instead of logical
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 13.0
Importance: P1 normal
Target Milestone: 13.0
Assignee: Jakub Jelinek
Keywords: wrong-code
Reported: 2022-12-11 19:38 UTC by Sergei Trofimovich
Modified: 2022-12-13 15:56 UTC

Last reconfirmed: 2022-12-11 00:00:00


Attachments
gcc13-pr108064.patch (956 bytes, patch)
2022-12-12 12:25 UTC, Jakub Jelinek

Description Sergei Trofimovich 2022-12-11 19:38:20 UTC
I initially observed this as a test failure in apache-arrow-cpp-9.0.0:

    [  FAILED  ] TestSwapEndianArrayData.RandomData

There, an array of int16_t has the endianness of each element swapped. Minimized example:

// $ cat a.cc
typedef short int i16;

static inline i16 ByteSwap16(i16 value) {
  constexpr auto m = static_cast<i16>(0xff);
  return static_cast<i16>(((value >> 8) & m) | ((value & m) << 8));
}

__attribute__((noipa))
void swab16(i16 * d, const i16* s) {
  for (unsigned long i = 0; i < 4; i++) {
    d[i] = ByteSwap16(s[i]);
  }
}

__attribute__((noipa))
int main(void) {
  /* need to align inputs to make sure the vectorized part
     of the loop gets executed. */
  alignas(16) i16 a[4] = {0xff, 0, 0, 0};
  alignas(16) i16 b[4];
  alignas(16) i16 c[4];

  swab16(b, a);
  swab16(c, b);

  /* Contents of 'a' should be equal to 'c'.
     But a gcc bug generates invalid vectorized shifts.  */
  if (a[0] != c[0])
    __builtin_trap();
}

The weekly gcc-13 snapshot (and the master branch) generates invalid code for it:

    $ ./gcc-git/bin/g++ -O3 a.cc -o a && ./a
    Illegal instruction (core dumped)
    $ ./gcc-git/bin/g++ -O0 a.cc -o a && ./a

AFAIU swab16() gets miscompiled:

  Dump of assembler code for function _Z6swab16PsPKs:
   ...
    movq   (%rsi),%xmm0
    movdqa %xmm0,%xmm1
    psllw  $0x8,%xmm0
    psraw  $0x8,%xmm1 ; <<<- should be psrlw!
    por    %xmm1,%xmm0
    movq   %xmm0,(%rdi)

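Here 'gcc' loads 64 bits (four 16-bit lanes) at a time and swaps the two bytes of each lane:
- 'psllw' moves the low byte of each lane up (zero-filling, ok)
- 'psraw' moves the high byte of each lane down (sign-extending, bug)

As a result, when a lane's sign bit is set, 'psraw' fills the upper byte with ones, and 'por' then overwrites the byte moved up by 'psllw' with 0xff.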
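To make the difference concrete, here is a minimal sketch that emulates a single 16-bit lane of the three shift instructions in portable C++. The helper names (lane_shl8, lane_shr8_logical, lane_shr8_arithmetic, swap_expected, swap_miscompiled) are made up for illustration and are not part of gcc or the original testcase; the arithmetic shift is modelled by hand so the example does not rely on implementation-defined signed shifts.

```cpp
#include <cstdint>

using u16 = std::uint16_t;

// Models 'psllw $8' on one lane: bits shifted past bit 15 are dropped,
// the low byte is zero-filled.
constexpr u16 lane_shl8(u16 lane) {
  return static_cast<u16>(lane << 8);
}

// Models 'psrlw $8' on one lane: the high byte is zero-filled.
constexpr u16 lane_shr8_logical(u16 lane) {
  return static_cast<u16>(lane >> 8);
}

// Models 'psraw $8' on one lane: the high byte is filled with copies
// of the sign bit (bit 15).
constexpr u16 lane_shr8_arithmetic(u16 lane) {
  return static_cast<u16>((lane >> 8) | ((lane & 0x8000u) ? 0xff00u : 0u));
}

// Byte swap as it should be vectorized: psllw + psrlw + por.
constexpr u16 swap_expected(u16 lane) {
  return static_cast<u16>(lane_shl8(lane) | lane_shr8_logical(lane));
}

// Byte swap as seen in the disassembly: psllw + psraw + por.
constexpr u16 swap_miscompiled(u16 lane) {
  return static_cast<u16>(lane_shl8(lane) | lane_shr8_arithmetic(lane));
}

// Sign bit clear: both variants agree.
static_assert(swap_miscompiled(0x00ff) == swap_expected(0x00ff), "");
// Sign bit set: the arithmetic shift smears 0xff into the high byte.
static_assert(swap_expected(0xff00) == 0x00ff, "");
static_assert(swap_miscompiled(0xff00) == 0xffff, "");
```

This matches the testcase: the first swab16() call turns 0x00ff into 0xff00 under either variant, and only the second call, whose input has the sign bit set, produces the wrong value.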

$ ./gcc-git/bin/g++ -v |& unnix
Using built-in specs.
COLLECT_GCC=/<<NIX>>/gcc-13.0.0/bin/g++
COLLECT_LTO_WRAPPER=/<<NIX>>/gcc-13.0.0/libexec/gcc/x86_64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with:
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221211 (experimental) (GCC)
Comment 1 Jakub Jelinek 2022-12-11 20:16:07 UTC
Started with r13-1100-gacb1e6f43dc2bbedd1248ea61c7ab537a11fe59b
I'll have a look.
Comment 2 Jakub Jelinek 2022-12-12 12:25:10 UTC
Created attachment 54070
gcc13-pr108064.patch

Untested fix.
Comment 3 GCC Commits 2022-12-13 15:55:59 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:33be3ee36a7e2c0be383ec01b5fbc9aef39568fd

commit r13-4679-g33be3ee36a7e2c0be383ec01b5fbc9aef39568fd
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Dec 13 16:55:21 2022 +0100

    vect-patterns: Fix up vect_recog_rotate_pattern [PR108064]
    
    Since vect_recog_rotate_pattern has been extended to work also
    on signed types in r13-1100 we miscompile the testcase below.
    vect_recog_rotate_pattern actually emits correct scalar code into
    the pattern def sequence (in particular cast to utype, doing the
    2 shifts in utype so that the right shift is logical and not arithmetic,
    or and then cast back to the signed type), but it didn't supply vectype
    for most of those pattern statements, which means that the generic handling
    fills it up later with the vectype provided by vect_recog_rotate_pattern.
    The problem is that it is vectype of the result of the whole pattern,
    i.e. vector of signed values in this case, while the conversion to utype,
    2 shifts and or (everything with utype lhs in scalar code) should have
    uvectype as STMT_VINFO_VECTYPE.
    
    2022-12-13  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/108064
            * tree-vect-patterns.cc (vect_recog_rotate_pattern): Pass uvectype
            as 4th argument to append_pattern_def_seq for statements with lhs
            with utype type.
    
            * gcc.c-torture/execute/pr108064.c: New test.
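The scalar sequence the commit message describes (cast to utype, two shifts in utype, or, cast back) can be sketched roughly as follows. This is only an illustration of the intended semantics for a 16-bit rotate by 8, not the code in tree-vect-patterns.cc; the function name rotate8_via_unsigned is hypothetical, and the final cast back to the signed type assumes the usual modular conversion (guaranteed since C++20, and what gcc does in earlier modes).

```cpp
#include <cstdint>

using i16 = std::int16_t;
using u16 = std::uint16_t;

// Rotate a signed 16-bit value by 8 bits, doing all shifts in the
// unsigned type so that the right shift is logical, not arithmetic.
constexpr i16 rotate8_via_unsigned(i16 value) {
  const u16 u = static_cast<u16>(value);     // cast to the unsigned type
  const u16 hi = static_cast<u16>(u >> 8);   // logical right shift
  const u16 lo = static_cast<u16>(u << 8);   // left shift, truncated to 16 bits
  return static_cast<i16>(hi | lo);          // or, then cast back to signed
}

// Rotating twice by 8 must give back the original value, including
// when the intermediate value has its sign bit set.
static_assert(rotate8_via_unsigned(0x00ff) == -256, "");
static_assert(rotate8_via_unsigned(rotate8_via_unsigned(0x00ff)) == 0x00ff, "");
```

The point of the fix is that the vector statements for u, hi, lo and the or must use the unsigned vector type as well; if they inherit the signed vector type, the right shift is emitted as an arithmetic shift.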
Comment 4 Jakub Jelinek 2022-12-13 15:56:25 UTC
Fixed.