Bug 96174 - AVX-512 functions missing when compiled without optimization
Summary: AVX-512 functions missing when compiled without optimization
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 10.1.1
Importance: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-12 07:18 UTC by Evan Nemerson
Modified: 2020-09-16 19:22 UTC
CC List: 2 users

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-07-12 00:00:00


Attachments
gcc11-pr96174.patch (1.21 KB, patch)
2020-07-12 21:29 UTC, Jakub Jelinek

Description Evan Nemerson 2020-07-12 07:18:56 UTC
The avx512fintrin.h header sometimes uses different implementations depending on whether __OPTIMIZE__ is defined, but many functions are missing if __OPTIMIZE__ is not defined.

Here is a trivial test case:

  #include <immintrin.h>

  __mmask16 foo(__m512 a, __m512 b) {
    return _mm512_cmplt_ps_mask(a, b);
  }

On Compiler Explorer: https://godbolt.org/z/83jP63

I ran into this with _mm512_cmplt_ps_mask, but it looks like all the _mm512_cmp*_{pd,ps}_mask functions have the same problem.
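
On an affected compiler, one possible workaround is to call the generic comparison intrinsic with an explicit predicate, since _mm512_cmp_ps_mask is still available at -O0. The following is a minimal sketch, not a tested recommendation. It assumes _CMP_LT_OS is the predicate that _mm512_cmplt_ps_mask uses in GCC's avx512fintrin.h. The function names lt_mask_avx512 and lt_mask_demo are hypothetical, and the target attribute and runtime CPU check are only there so the file builds without -mavx512f and does not fault on hardware lacking AVX-512F.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: same comparison as _mm512_cmplt_ps_mask(a, b),
   spelled with the generic intrinsic and an explicit predicate.
   Assumption: _CMP_LT_OS is the predicate the named wrapper uses. */
static __attribute__((target("avx512f"))) int
lt_mask_avx512(const float *a, const float *b)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    return (int)_mm512_cmp_ps_mask(va, vb, _CMP_LT_OS);
}
#endif

/* Hypothetical driver: returns the 16-bit comparison mask,
   or -1 when AVX-512F cannot be used on this machine. */
int lt_mask_demo(void)
{
#if defined(__x86_64__) || defined(__i386__)
    float a[16], b[16];
    for (int i = 0; i < 16; i++) {
        a[i] = (float)i;   /* 0, 1, ..., 15 */
        b[i] = 8.0f;       /* a[i] < b[i] only for i = 0..7 */
    }
    if (!__builtin_cpu_supports("avx512f"))
        return -1;
    return lt_mask_avx512(a, b);
#else
    return -1;
#endif
}
```

With the inputs above, lanes 0 through 7 compare less-than, so on AVX-512F hardware lt_mask_demo() should return 0x00FF. The same source should build at -O0 and at -O2, because at -O0 the header supplies _mm512_cmp_ps_mask as a macro.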
Comment 1 Jakub Jelinek 2020-07-12 21:29:49 UTC
Created attachment 48865
gcc11-pr96174.patch

Untested fix.
Comment 2 GCC Commits 2020-07-15 09:38:03 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:12d69dbfff9dd5ad4a30b20d1636f5cab6425e8c

commit r11-2104-g12d69dbfff9dd5ad4a30b20d1636f5cab6425e8c
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Jul 15 11:34:44 2020 +0200

    fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]
    
    The _mm512_{,mask_}cmp_p[ds]_mask and also _mm_{,mask_}cmp_s[ds]_mask
    intrinsics have an argument which must have a constant passed to it
    and so use an inline version only for ifdef __OPTIMIZE__ and have
    a #define for -O0.  But the _mm512_{,mask_}cmp*_p[ds]_mask intrinsics
    don't need a constant argument, they are essentially the first
    set with the constant added to them implicitly based on the comparison
    name, and so there is no #define version for them (correctly).
    But their inline versions are defined in between the first and s[ds]
    set and so inside of ifdef __OPTIMIZE__, which means that with -O0
    they aren't defined at all.
    
    This patch fixes that by moving those after the #ifdef __OPTIMIZE #else
    use #define #endif block.
    
    2020-07-15  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/96174
            * config/i386/avx512fintrin.h (_mm512_cmpeq_pd_mask,
            _mm512_mask_cmpeq_pd_mask, _mm512_cmplt_pd_mask,
            _mm512_mask_cmplt_pd_mask, _mm512_cmple_pd_mask,
            _mm512_mask_cmple_pd_mask, _mm512_cmpunord_pd_mask,
            _mm512_mask_cmpunord_pd_mask, _mm512_cmpneq_pd_mask,
            _mm512_mask_cmpneq_pd_mask, _mm512_cmpnlt_pd_mask,
            _mm512_mask_cmpnlt_pd_mask, _mm512_cmpnle_pd_mask,
            _mm512_mask_cmpnle_pd_mask, _mm512_cmpord_pd_mask,
            _mm512_mask_cmpord_pd_mask, _mm512_cmpeq_ps_mask,
            _mm512_mask_cmpeq_ps_mask, _mm512_cmplt_ps_mask,
            _mm512_mask_cmplt_ps_mask, _mm512_cmple_ps_mask,
            _mm512_mask_cmple_ps_mask, _mm512_cmpunord_ps_mask,
            _mm512_mask_cmpunord_ps_mask, _mm512_cmpneq_ps_mask,
            _mm512_mask_cmpneq_ps_mask, _mm512_cmpnlt_ps_mask,
            _mm512_mask_cmpnlt_ps_mask, _mm512_cmpnle_ps_mask,
            _mm512_mask_cmpnle_ps_mask, _mm512_cmpord_ps_mask,
            _mm512_mask_cmpord_ps_mask): Move outside of __OPTIMIZE__ guarded
            section.
    
            * gcc.target/i386/avx512f-vcmppd-3.c: New test.
            * gcc.target/i386/avx512f-vcmpps-3.c: New test.
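
The header layout described in the commit message can be modelled without any AVX-512 code. The sketch below is a toy reconstruction, not the actual contents of avx512fintrin.h: toy_cmp stands in for the intrinsics that take a constant predicate, toy_cmplt stands in for the named wrappers, and every identifier in it is hypothetical. It shows the post-fix placement, with the wrapper defined after the #endif so that it exists whether or not __OPTIMIZE__ is defined.

```c
#include <assert.h>

/* Hypothetical predicate constants standing in for _CMP_EQ_OQ, _CMP_LT_OS, ... */
enum { TOY_CMP_EQ = 0, TOY_CMP_LT = 1 };

/* Shared implementation so the inline and macro forms agree. */
static int toy_cmp_impl(int a, int b, int pred)
{
    return pred == TOY_CMP_LT ? a < b : a == b;
}

#ifdef __OPTIMIZE__
/* Optimizing builds: an inline function; the predicate is expected to
   fold to a constant once the call is inlined. */
static inline int toy_cmp(int a, int b, const int pred)
{
    return toy_cmp_impl(a, b, pred);
}
#else
/* -O0: a macro, so the predicate stays a literal at the call site. */
#define toy_cmp(a, b, pred) toy_cmp_impl((a), (b), (pred))
#endif

/* Post-fix placement: the named wrapper lives outside the guarded block.
   Before the fix, the equivalent definitions sat inside the
   #ifdef __OPTIMIZE__ branch and were therefore missing at -O0. */
static inline int toy_cmplt(int a, int b)
{
    return toy_cmp(a, b, TOY_CMP_LT);
}

/* Small self-check that can be called from any test driver. */
static void toy_self_check(void)
{
    assert(toy_cmplt(1, 2) == 1);
    assert(toy_cmplt(2, 1) == 0);
}
```

Moving toy_cmplt back inside the #ifdef __OPTIMIZE__ branch reproduces the reported symptom in miniature: the file still builds with optimization enabled, but at -O0 the call to toy_cmplt has no definition.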
Comment 3 GCC Commits 2020-07-15 09:42:04 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:9a9e1ed88614b96944d2e5e92e932f65dcf2d920

commit r10-8500-g9a9e1ed88614b96944d2e5e92e932f65dcf2d920
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Jul 15 11:34:44 2020 +0200

    fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]
    
    (cherry picked from commit 12d69dbfff9dd5ad4a30b20d1636f5cab6425e8c)
Comment 4 Jakub Jelinek 2020-07-15 09:59:08 UTC
I've checked it in as obvious to both trunk and 10.2.
Comment 5 GCC Commits 2020-09-16 19:22:38 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:fdcb6dae610aba75e23c1fd2d31b491691e54091

commit r9-8904-gfdcb6dae610aba75e23c1fd2d31b491691e54091
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Jul 15 11:34:44 2020 +0200

    fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]
    
    (cherry picked from commit 12d69dbfff9dd5ad4a30b20d1636f5cab6425e8c)