Bug 94832

Summary: AVX512 scatter/gather macros lack parentheses when unoptimized
Product: gcc
Reporter: Kenneth Heafield <gcc>
Component: target
Assignee: Jakub Jelinek <jakub>
Status: RESOLVED FIXED
Severity: normal
CC: jakub
Priority: P3
Keywords: wrong-code
Version: 9.3.0
Target Milestone: ---
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80885
Target: x86_64-*-* i?86-*-*
Last reconfirmed: 2020-04-28 00:00:00
Attachments: gcc10-pr94832.patch

Description Kenneth Heafield 2020-04-28 19:50:21 UTC
This code behaves differently and produces a warning about void * arithmetic when compiled without optimization:

#include <immintrin.h>
void Fail(int *data) {
  _mm512_mask_i32scatter_epi32(data - 1, 0xffff, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
}

Without optimization, there is a warning, and the writes are based at (void*)data - 1:

g++ -mavx512bw example.cc -c -o example.o
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/immintrin.h:55,
                 from example.cc:1:
example.cc: In function ‘void Foo(int*)’:
example.cc:4:37: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
    4 |   _mm512_mask_i32scatter_epi32(data - 1, 0xffff, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
      |                                     ^

With optimization, there is no warning, and the writes are based at (void*)(data - 1), which is the expected behavior:

g++ -mavx512bw example.cc -O3 -c -o example.o
# No output.

If we look at avx512fintrin.h, it becomes clear why:

#ifdef __OPTIMIZE__
/* ... */
extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_mask_i32scatter_epi32 (void *__addr, __mmask16 __mask,
            __m512i __index, __m512i __v1, int __scale)
{
  __builtin_ia32_scattersiv16si (__addr, __mask, (__v16si) __index,
         (__v16si) __v1, __scale);
}
/* ... */
#else
/* ... */
#define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE)  \
  __builtin_ia32_scattersiv16si ((void *)ADDR, (__mmask16)MASK,   \
         (__v16si)(__m512i)INDEX,   \
         (__v16si)(__m512i)V1, (int)SCALE)
/* ... */
#endif

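When compiled without optimization, the header uses a macro instead of an inline function, so the argument data - 1 expands to (void *)data - 1.  The cast binds more tightly than the subtraction, which triggers the warning about type ‘void *’ used in arithmetic and also changes the address calculation: the base becomes data minus 1 byte instead of data minus sizeof(int) bytes.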

Reproduced with two GCC versions, 9.3.0 and 8.4.0:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-9.3.0/work/gcc-9.3.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/9.3.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.3.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.3.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.3.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/g++-v9 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/9.3.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 9.3.0 p2' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libmudflap --disable-libssp --disable-libada --disable-systemtap --enable-vtable-verify --enable-lto --without-isl --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 9.3.0 (Gentoo 9.3.0 p2) 

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 8.4.0-1ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04)
Comment 1 Jakub Jelinek 2020-04-28 21:33:28 UTC
I'll handle this and look at what other macros are affected.
Comment 2 Jakub Jelinek 2020-04-29 09:45:50 UTC
Created attachment 48405 [details]
gcc10-pr94832.patch

Untested fix for the -O0 gather/scatter macros.
Comment 3 Kenneth Heafield 2020-04-29 09:56:08 UTC
Being a macro only some of the time (at -O0) also causes trouble with commas in template argument lists, because the C preprocessor treats them as macro-argument separators.

#include <immintrin.h>
template <class S, class T> int *TemplatedFunction();
void Fail() {
  _mm512_mask_i32scatter_epi32(TemplatedFunction<void, void>(), 0xffff, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
}

Without optimization, compilation fails because the comma in the template argument list is treated as a macro-argument separator:

g++ -mavx512f -c template.cc 
template.cc:6:118: error: macro "_mm512_mask_i32scatter_epi32" passed 6 arguments, but takes just 5
    6 |   _mm512_mask_i32scatter_epi32(TemplatedFunction<void, void>(), 0xffff, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
      |                                                                                                                      ^
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/immintrin.h:55,
                 from template.cc:1:
/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/avx512fintrin.h:10475: note: macro "_mm512_mask_i32scatter_epi32" defined here
10475 | #define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \
      | 
template.cc: In function ‘void Fail()’:
template.cc:6:3: error: ‘_mm512_mask_i32scatter_epi32’ was not declared in this scope
    6 |   _mm512_mask_i32scatter_epi32(TemplatedFunction<void, void>(), 0xffff, _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~


With optimization, it compiles cleanly:

g++ -mavx512f -O3 -c template.cc
# No output.
Comment 4 Jakub Jelinek 2020-04-29 10:01:42 UTC
(In reply to Kenneth Heafield from comment #3)
> Being a macro some of the time also causes trouble with template commas and
> the C preprocessor.  
> 
> #include <immintrin.h>
> template <class S, class T> int *TemplatedFunction();
> void Fail() {
>   _mm512_mask_i32scatter_epi32(TemplatedFunction<void, void>(), 0xffff,
> _mm512_set1_epi32(1), _mm512_set1_epi32(1), 1);
> }

You need to wrap such arguments in ()s then; I'm afraid nothing else can be done about that.  The reason for using macros rather than inline functions is that these particular intrinsics require at least one compile-time constant argument, and at -O0 there is no guarantee that the constant would be propagated into the builtin used under the hood to implement the intrinsic.
It is the same as with, say, C library header APIs, which can also be implemented either as functions or as macros.
Comment 5 GCC Commits 2020-04-29 15:32:13 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:78cef09019cc9c80d1b39a49861f8827a2ee2e60

commit r10-8054-g78cef09019cc9c80d1b39a49861f8827a2ee2e60
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Apr 29 17:30:22 2020 +0200

    x86: Fix -O0 intrinsic *gather*/*scatter* macros [PR94832]
    
    As reported in the PR, while most intrinsic -O0 macro argument uses
    are properly wrapped in ()s or used in context where having a complex
    expression passed as the argument doesn't pose a problem (e.g. when
    macro argument use is in between commas, or between ( and comma, or
    between comma and ) etc.), especially the gather/scatter macros don't do
    this and if one passes to some macro e.g. x + y as argument, the
    corresponding inline function would do cast on the argument, but
    the macro does (int) ARG, then it is (int) x + y rather than (int) (x + y).
    
    The following patch fixes those issues in *gather/*scatter*; additionally,
    the AVX2 macros were passing incorrect mask of e.g.
    (__v2df)_mm_set1_pd((double)(long long int) -1)
    which is IMHO equivalent to
    (__v2df){-1.0, -1.0}
    when it really wants to pass __v2df vector with all bits set.
    I've used what the inline functions use for those cases.
    
    2020-04-29  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/94832
            * config/i386/avx2intrin.h (_mm_mask_i32gather_pd,
            _mm256_mask_i32gather_pd, _mm_mask_i64gather_pd,
            _mm256_mask_i64gather_pd, _mm_mask_i32gather_ps,
            _mm256_mask_i32gather_ps, _mm_mask_i64gather_ps,
            _mm256_mask_i64gather_ps, _mm_i32gather_epi64,
            _mm_mask_i32gather_epi64, _mm256_i32gather_epi64,
            _mm256_mask_i32gather_epi64, _mm_i64gather_epi64,
            _mm_mask_i64gather_epi64, _mm256_i64gather_epi64,
            _mm256_mask_i64gather_epi64, _mm_i32gather_epi32,
            _mm_mask_i32gather_epi32, _mm256_i32gather_epi32,
            _mm256_mask_i32gather_epi32, _mm_i64gather_epi32,
            _mm_mask_i64gather_epi32, _mm256_i64gather_epi32,
            _mm256_mask_i64gather_epi32): Surround macro parameter uses with
            parens.
            (_mm_i32gather_pd, _mm256_i32gather_pd, _mm_i64gather_pd,
            _mm256_i64gather_pd, _mm_i32gather_ps, _mm256_i32gather_ps,
            _mm_i64gather_ps, _mm256_i64gather_ps): Likewise.  Don't use
            as mask vector containing -1.0 or -1.0f elts, but instead vector
            with all bits set using _mm*_cmpeq_p? with zero operands.
            * config/i386/avx512fintrin.h (_mm512_i32gather_ps,
            _mm512_mask_i32gather_ps, _mm512_i32gather_pd,
            _mm512_mask_i32gather_pd, _mm512_i64gather_ps,
            _mm512_mask_i64gather_ps, _mm512_i64gather_pd,
            _mm512_mask_i64gather_pd, _mm512_i32gather_epi32,
            _mm512_mask_i32gather_epi32, _mm512_i32gather_epi64,
            _mm512_mask_i32gather_epi64, _mm512_i64gather_epi32,
            _mm512_mask_i64gather_epi32, _mm512_i64gather_epi64,
            _mm512_mask_i64gather_epi64, _mm512_i32scatter_ps,
            _mm512_mask_i32scatter_ps, _mm512_i32scatter_pd,
            _mm512_mask_i32scatter_pd, _mm512_i64scatter_ps,
            _mm512_mask_i64scatter_ps, _mm512_i64scatter_pd,
            _mm512_mask_i64scatter_pd, _mm512_i32scatter_epi32,
            _mm512_mask_i32scatter_epi32, _mm512_i32scatter_epi64,
            _mm512_mask_i32scatter_epi64, _mm512_i64scatter_epi32,
            _mm512_mask_i64scatter_epi32, _mm512_i64scatter_epi64,
            _mm512_mask_i64scatter_epi64): Surround macro parameter uses with
            parens.
            * config/i386/avx512pfintrin.h (_mm512_prefetch_i32gather_pd,
            _mm512_prefetch_i32gather_ps, _mm512_mask_prefetch_i32gather_pd,
            _mm512_mask_prefetch_i32gather_ps, _mm512_prefetch_i64gather_pd,
            _mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_pd,
            _mm512_mask_prefetch_i64gather_ps, _mm512_prefetch_i32scatter_pd,
            _mm512_prefetch_i32scatter_ps, _mm512_mask_prefetch_i32scatter_pd,
            _mm512_mask_prefetch_i32scatter_ps, _mm512_prefetch_i64scatter_pd,
            _mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_pd,
            _mm512_mask_prefetch_i64scatter_ps): Likewise.
            * config/i386/avx512vlintrin.h (_mm256_mmask_i32gather_ps,
            _mm_mmask_i32gather_ps, _mm256_mmask_i32gather_pd,
            _mm_mmask_i32gather_pd, _mm256_mmask_i64gather_ps,
            _mm_mmask_i64gather_ps, _mm256_mmask_i64gather_pd,
            _mm_mmask_i64gather_pd, _mm256_mmask_i32gather_epi32,
            _mm_mmask_i32gather_epi32, _mm256_mmask_i32gather_epi64,
            _mm_mmask_i32gather_epi64, _mm256_mmask_i64gather_epi32,
            _mm_mmask_i64gather_epi32, _mm256_mmask_i64gather_epi64,
            _mm_mmask_i64gather_epi64, _mm256_i32scatter_ps,
            _mm256_mask_i32scatter_ps, _mm_i32scatter_ps, _mm_mask_i32scatter_ps,
            _mm256_i32scatter_pd, _mm256_mask_i32scatter_pd, _mm_i32scatter_pd,
            _mm_mask_i32scatter_pd, _mm256_i64scatter_ps,
            _mm256_mask_i64scatter_ps, _mm_i64scatter_ps, _mm_mask_i64scatter_ps,
            _mm256_i64scatter_pd, _mm256_mask_i64scatter_pd, _mm_i64scatter_pd,
            _mm_mask_i64scatter_pd, _mm256_i32scatter_epi32,
            _mm256_mask_i32scatter_epi32, _mm_i32scatter_epi32,
            _mm_mask_i32scatter_epi32, _mm256_i32scatter_epi64,
            _mm256_mask_i32scatter_epi64, _mm_i32scatter_epi64,
            _mm_mask_i32scatter_epi64, _mm256_i64scatter_epi32,
            _mm256_mask_i64scatter_epi32, _mm_i64scatter_epi32,
            _mm_mask_i64scatter_epi32, _mm256_i64scatter_epi64,
            _mm256_mask_i64scatter_epi64, _mm_i64scatter_epi64,
            _mm_mask_i64scatter_epi64): Likewise.
Comment 6 GCC Commits 2020-04-29 15:32:18 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:0c8217b16f307c3eedce8f22354714938613f701

commit r10-8055-g0c8217b16f307c3eedce8f22354714938613f701
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Apr 29 17:31:26 2020 +0200

    x86: Fix -O0 remaining intrinsic macros [PR94832]
    
    A few other macros seem to suffer from the same issue.  What I've done was:
    cat gcc/config/i386/*intrin.h | sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' \
    | grep '^[[:blank:]]*#[[:blank:]]*define[[:blank:]].*(' | sed 's/[      ]\+/ /g' \
    > /tmp/macros
    and then looking for regexps:
    )[a-zA-Z]
    ) [a-zA-Z]
    [a-zA-Z][-+*/%]
    [a-zA-Z] [-+*/%]
    [-+*/%][a-zA-Z]
    [-+*/%] [a-zA-Z]
    in the resulting file.
    
    2020-04-29  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/94832
            * config/i386/avx512bwintrin.h (_mm512_alignr_epi8,
            _mm512_mask_alignr_epi8, _mm512_maskz_alignr_epi8): Wrap macro operands
            used in casts into parens.
            * config/i386/avx512fintrin.h (_mm512_cvt_roundps_ph, _mm512_cvtps_ph,
            _mm512_mask_cvt_roundps_ph, _mm512_mask_cvtps_ph,
            _mm512_maskz_cvt_roundps_ph, _mm512_maskz_cvtps_ph,
            _mm512_mask_cmp_epi64_mask, _mm512_mask_cmp_epi32_mask,
            _mm512_mask_cmp_epu64_mask, _mm512_mask_cmp_epu32_mask,
            _mm512_mask_cmp_round_pd_mask, _mm512_mask_cmp_round_ps_mask,
            _mm512_mask_cmp_pd_mask, _mm512_mask_cmp_ps_mask): Likewise.
            * config/i386/avx512vlbwintrin.h (_mm256_mask_alignr_epi8,
            _mm256_maskz_alignr_epi8, _mm_mask_alignr_epi8, _mm_maskz_alignr_epi8,
            _mm256_mask_cmp_epu8_mask): Likewise.
            * config/i386/avx512vlintrin.h (_mm_mask_cvtps_ph, _mm_maskz_cvtps_ph,
            _mm256_mask_cvtps_ph, _mm256_maskz_cvtps_ph): Likewise.
            * config/i386/f16cintrin.h (_mm_cvtps_ph, _mm256_cvtps_ph): Likewise.
            * config/i386/shaintrin.h (_mm_sha1rnds4_epu32): Likewise.
Comment 7 Jakub Jelinek 2020-04-29 15:34:01 UTC
Fixed for 10+ so far.
Comment 8 GCC Commits 2020-09-16 19:21:30 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f97bf9657cecaaf8afd14b43e5ca9be294ab870c

commit r9-8891-gf97bf9657cecaaf8afd14b43e5ca9be294ab870c
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Apr 29 17:30:22 2020 +0200

    x86: Fix -O0 intrinsic *gather*/*scatter* macros [PR94832]
    
    (cherry picked from commit 78cef09019cc9c80d1b39a49861f8827a2ee2e60)
Comment 9 GCC Commits 2020-09-16 19:21:35 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ccee0511abf6e0bb679fa6b4941e5a71a6521b12

commit r9-8892-gccee0511abf6e0bb679fa6b4941e5a71a6521b12
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Apr 29 17:31:26 2020 +0200

    x86: Fix -O0 remaining intrinsic macros [PR94832]
    
    (cherry picked from commit 0c8217b16f307c3eedce8f22354714938613f701)
Comment 10 Jakub Jelinek 2020-09-17 17:41:09 UTC
Fixed for 8.5 in r8-10498-ga0159c30c19a1271f6b6ba6bc489c2c1c59954a3 and by the above commit for 9.4+ too.