102080 – [12 Regression] avx512vl related ICE, on firefox-92 gcc ICEs: in expand_insn, at optabs.c:7946 by r12-2679

Bug 102080 - [12 Regression] avx512vl related ICE, on firefox-92 gcc ICEs: in expand_insn, at optabs.c:7946 by r12-2679

Summary: [12 Regression] avx512vl related ICE, on firefox-92 gcc ICEs: in expand_insn,...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	12.0

Importance:	P3 normal
Target Milestone:	12.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	ice-on-valid-code

Depends on:
Blocks:

Reported:	2021-08-26 14:42 UTC by Sergei Trofimovich
Modified:	2021-12-17 16:07 UTC (History)
CC List:	9 users (show)

See Also:
Host:
Target:	x86_64--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-08-26 00:00:00

Attachments
dec_reconstruct.cc.cc.orig.xz (355.47 KB, application/x-xz) 2021-08-26 14:46 UTC, Sergei Trofimovich	Details
Proposed patch (1.55 KB, patch) 2021-08-27 05:17 UTC, Hongtao.liu	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Sergei Trofimovich 2021-08-26 14:42:28 UTC

Observed gcc-12 ICE on today's gcc when was building firefox checkout.

I might have reduced it down to something invalid. Here is the minimal reproducer:

    // cat dec_reconstruct.cc.cc
    #pragma GCC target "avx"
    typedef float __m256 __attribute__((__vector_size__(32)));
    __m256 _mm256_blendv_ps___Y, _mm256_blendv_ps___M, _mm256_mul_ps___A,
        _mm256_mul_ps___B, IfThenElse___trans_tmp_9;
    void IfThenElse(__m256 no) {
      IfThenElse___trans_tmp_9 = __builtin_ia32_blendvps256(
          no, _mm256_blendv_ps___Y, _mm256_blendv_ps___M);
    }
    #pragma GCC target "avx512vl"
    void EncodedFromDisplay() {
      __m256 __trans_tmp_11 = _mm256_mul_ps___A * _mm256_mul_ps___B;
      IfThenElse(__trans_tmp_11);
    }

-O0 works, -O2 fails:

$ /tmp/gcc-c/gcc/xg++ -B/tmp/gcc-c/gcc -c dec_reconstruct.cc.cc -O0

$ /tmp/gcc-c/gcc/xg++ -B/tmp/gcc-c/gcc -c dec_reconstruct.cc.cc -O2
during RTL pass: expand
dec_reconstruct.cc.cc: In function 'void EncodedFromDisplay()':
dec_reconstruct.cc.cc:10:6: internal compiler error: in expand_insn, at optabs.c:7946
   10 | void EncodedFromDisplay() {
      |      ^~~~~~~~~~~~~~~~~~
0x7f3941d62286 __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x7f3941d62337 __libc_start_main_impl
        ../csu/libc-start.c:409
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ /tmp/gcc-c/gcc/xg++ -B/tmp/gcc-c/gcc -v
Reading specs from /tmp/gcc-c/gcc/specs
COLLECT_GCC=/tmp/gcc-c/gcc/xg++
COLLECT_LTO_WRAPPER=/tmp/gcc-c/gcc/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/slyfox/dev/git/gcc/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-languages=c,c++ --disable-bootstrap --with-multilib-list=m64 --prefix=/tmp/gcc-c/../gcc-native-quick-installed --disable-nls --without-isl --disable-libsanitizer --disable-libvtv --disable-libgomp --disable-libstdcxx-pch --disable-libunwind-exceptions CFLAGS='-O1 ' CXXFLAGS='-O1 '
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20210826 (experimental) (GCC)

Comment 1 Sergei Trofimovich 2021-08-26 14:46:55 UTC

Created attachment 51362 [details]
dec_reconstruct.cc.cc.orig.xz

dec_reconstruct.cc.cc.orig.xz is the original extracted preprocessed file, 5MB.

Fails the same as:

$ /tmp/gcc-c/gcc/xg++ -B/tmp/gcc-c/gcc -c -x c++ /tmp/dec_reconstruct.cc.cc.orig -O2

during RTL pass: expand
In file included from /home/slyfox/dev/hg/firefox-source/obj-x86_64-pc-linux-gnu/dist/include/hwy/foreach_target.h:102,
                 from /home/slyfox/dev/hg/firefox-source/third_party/jpeg-xl/lib/jxl/dec_reconstruct.cc:16:
/home/slyfox/dev/hg/firefox-source/third_party/jpeg-xl/lib/jxl/dec_reconstruct.cc: In function 'jxl::Status jxl::N_AVX3::UndoXYBInPlace(jxl::Image3F*, const jxl::Rect&, const jxl::OutputEncodingInfo&)':
/home/slyfox/dev/hg/firefox-source/third_party/jpeg-xl/lib/jxl/dec_reconstruct.cc:39:8: internal compiler error: in expand_insn, at optabs.c:7946
   39 | Status UndoXYBInPlace(Image3F* idct, const Rect& rect,
      |        ^~~~~~~~~~~~~~
0x7f6eea182286 __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x7f6eea182337 __libc_start_main_impl
        ../csu/libc-start.c:409
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Comment 2 H.J. Lu 2021-08-26 16:10:18 UTC

It is caused by r12-2679.

Comment 3 Hongtao.liu 2021-08-27 00:38:27 UTC

(In reply to H.J. Lu from comment #2)
> It is caused by r12-2679.

Mine.

Comment 4 Hongtao.liu 2021-08-27 02:16:29 UTC

diff --git a/test.c.032t.ccp1 b/test.c.033t.forwprop1
index 5b18739..c6f0587 100644
--- a/test.c.032t.ccp1
+++ b/test.c.033t.forwprop1
@@ -31,11 +31,12 @@ void EncodedFromDisplay ()
   __m256 __trans_tmp_11;
   vector(8) float _mm256_mul_ps___A.2_1;
   vector(8) float _mm256_mul_ps___B.3_2;
+  vector(8) <signed-boolean:32> _5;
   vector(8) float _mm256_blendv_ps___M.0_6;
   vector(8) float _mm256_blendv_ps___Y.1_7;
   vector(8) int _8;
   vector(8) <signed-boolean:32> _9;
-  vector(8) float _10;
+  vector(8) float _12;
 
   <bb 2> :
   _mm256_mul_ps___A.2_1 = _mm256_mul_ps___A;
@@ -45,8 +46,9 @@ void EncodedFromDisplay ()
   _mm256_blendv_ps___Y.1_7 = _mm256_blendv_ps___Y;
   _8 = VIEW_CONVERT_EXPR<vector(8) int>(_mm256_blendv_ps___M.0_6);
   _9 = _8 < { 0, 0, 0, 0, 0, 0, 0, 0 };
-  _10 = VEC_COND_EXPR <_9, _mm256_blendv_ps___Y.1_7, __trans_tmp_11_4>;
-  IfThenElse___trans_tmp_9 = _10;
+  _5 = _8 >= { 0, 0, 0, 0, 0, 0, 0, 0 };
+  _12 = .COND_MUL (_5, _mm256_mul_ps___A.2_1, _mm256_mul_ps___B.3_2, _mm256_blendv_ps___Y.1_7);
+  IfThenElse___trans_tmp_9 = _12;
   return;

fwprop1 should check if the type of _5 satisfies the predicate of cond_mul.

Comment 5 Andrew Pinski 2021-08-27 02:19:14 UTC

(In reply to Hongtao.liu from comment #4)
> diff --git a/test.c.032t.ccp1 b/test.c.033t.forwprop1
> index 5b18739..c6f0587 100644
> --- a/test.c.032t.ccp1
> +++ b/test.c.033t.forwprop1
> @@ -31,11 +31,12 @@ void EncodedFromDisplay ()
>    __m256 __trans_tmp_11;
>    vector(8) float _mm256_mul_ps___A.2_1;
>    vector(8) float _mm256_mul_ps___B.3_2;
> +  vector(8) <signed-boolean:32> _5;
>    vector(8) float _mm256_blendv_ps___M.0_6;
>    vector(8) float _mm256_blendv_ps___Y.1_7;
>    vector(8) int _8;
>    vector(8) <signed-boolean:32> _9;
> -  vector(8) float _10;
> +  vector(8) float _12;
>  
>    <bb 2> :
>    _mm256_mul_ps___A.2_1 = _mm256_mul_ps___A;
> @@ -45,8 +46,9 @@ void EncodedFromDisplay ()
>    _mm256_blendv_ps___Y.1_7 = _mm256_blendv_ps___Y;
>    _8 = VIEW_CONVERT_EXPR<vector(8) int>(_mm256_blendv_ps___M.0_6);
>    _9 = _8 < { 0, 0, 0, 0, 0, 0, 0, 0 };
> -  _10 = VEC_COND_EXPR <_9, _mm256_blendv_ps___Y.1_7, __trans_tmp_11_4>;
> -  IfThenElse___trans_tmp_9 = _10;
> +  _5 = _8 >= { 0, 0, 0, 0, 0, 0, 0, 0 };
> +  _12 = .COND_MUL (_5, _mm256_mul_ps___A.2_1, _mm256_mul_ps___B.3_2,
> _mm256_blendv_ps___Y.1_7);
> +  IfThenElse___trans_tmp_9 = _12;
>    return;
> 
> fwprop1 should check if the type of _5 satisfies the predicate of cond_mul.

Actually this seems backwards ... what match pattern is doing this?

Comment 6 Andrew Pinski 2021-08-27 02:23:54 UTC

This is the match pattern which does it:
(for uncond_op (UNCOND_BINARY)
     cond_op (COND_BINARY)
 (simplify
  (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
        && element_precision (type) == element_precision (op_type))
    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))))))
 (simplify
  (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
  (with { tree op_type = TREE_TYPE (@4); }
   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
        && element_precision (type) == element_precision (op_type))
    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))

I don't see anything wrong here.
Are we saying the (bit_not @0) part is causing issues?

Comment 7 Andrew Pinski 2021-08-27 02:32:22 UTC

The operation works elementwise if the operands are vectors.

No I think x86's cond_* patterns are not correct.

Comment 8 Andrew Pinski 2021-08-27 02:33:48 UTC

That is the mask is a vector mode still for these patterns according to the internals doc.
Rather than the scalar mode you have:
(match_operand:<avx512fmaskmode> 1 "register_operand")

Comment 9 Hongtao.liu 2021-08-27 02:42:47 UTC

(In reply to Andrew Pinski from comment #8)
> That is the mask is a vector mode still for these patterns according to the
> internals doc.
> Rather than the scalar mode you have:
> (match_operand:<avx512fmaskmode> 1 "register_operand")

No, according to doc, mode of operands[1] is decided by TARGET_VECTORIZE_GET_MASK_MODE. Which means integer mode shoud also be accepted.

Operands 0, 2, 3 and 4 all have mode m. Operand 1 is a scalar integer if m is
scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_
MODE.

Comment 10 Andrew Pinski 2021-08-27 02:52:52 UTC

(In reply to Hongtao.liu from comment #9)
> (In reply to Andrew Pinski from comment #8)
> > That is the mask is a vector mode still for these patterns according to the
> > internals doc.
> > Rather than the scalar mode you have:
> > (match_operand:<avx512fmaskmode> 1 "register_operand")
> 
> No, according to doc, mode of operands[1] is decided by
> TARGET_VECTORIZE_GET_MASK_MODE. Which means integer mode shoud also be
> accepted.
> 
> Operands 0, 2, 3 and 4 all have mode m. Operand 1 is a scalar integer if m is
> scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_
> MODE.

Looks like vectorized_internal_fn_supported_p might need to be extended here or a better function needs to be used to check on cond_* optabs.

Comment 11 Hongtao.liu 2021-08-27 05:17:10 UTC

Created attachment 51363 [details]
Proposed patch

I'm testing this patch.

Comment 12 Richard Biener 2021-08-27 07:02:19 UTC

(In reply to Hongtao.liu from comment #11)
> Created attachment 51363 [details]
> Proposed patch
> 
> I'm testing this patch.

This looks sensible - Richard, any opinion?

Comment 13 Martin Liška 2021-08-27 08:29:13 UTC

There's a reduced test-case:

#pragma GCC target "avx"
typedef float __m256 __attribute__((__vector_size__(32)));
__m256 _mm256_blendv_ps___Y, _mm256_blendv_ps___M, _mm256_mul_ps___A,
    _mm256_mul_ps___B, IfThenElse___trans_tmp_9;
struct Raw256 {
  using type = __m256;
};
struct Vec256 {
  using Raw = Raw256::type;
  Raw raw;
} UndoXYBInPlace_linear_g;
Vec256 IfThenElse(Vec256 no) {
  IfThenElse___trans_tmp_9 = __builtin_ia32_blendvps256(
      no.raw, _mm256_blendv_ps___Y, _mm256_blendv_ps___M);
  return {IfThenElse___trans_tmp_9};
}
Vec256 operator*(Vec256, Vec256) {
  __m256 __trans_tmp_11 = _mm256_mul_ps___A * _mm256_mul_ps___B;
  return {__trans_tmp_11};
}
struct TF_SRGB {
  template <class D, class V> void EncodedFromDisplay(D, V x) {
    Vec256 __trans_tmp_12, linear = x * __trans_tmp_12;
    IfThenElse(linear);
  }
};
#pragma GCC target                                                             \
    "avx,avx2,bmi,bmi2,fma,f16c,avx512f,avx512vl,avx512dq,avx512bw"
void UndoXYBInPlace_d() {
  TF_SRGB().EncodedFromDisplay(UndoXYBInPlace_d, UndoXYBInPlace_linear_g);
}

Comment 14 Martin Liška 2021-08-27 08:30:30 UTC

Even simpler test-case:

#pragma GCC target "avx"
typedef float __m256 __attribute__((__vector_size__(32)));
__m256 _mm256_blendv_ps___Y, _mm256_blendv_ps___M, _mm256_mul_ps___A,
    _mm256_mul_ps___B, IfThenElse___trans_tmp_9;
struct Raw256 {
  using type = __m256;
};
struct Vec256 {
  using Raw = Raw256::type;
  Raw raw;
} UndoXYBInPlace_linear_g;
Vec256 IfThenElse(Vec256 no) {
  IfThenElse___trans_tmp_9 = __builtin_ia32_blendvps256(
      no.raw, _mm256_blendv_ps___Y, _mm256_blendv_ps___M);
  return {IfThenElse___trans_tmp_9};
}
Vec256 operator*(Vec256, Vec256) {
  __m256 __trans_tmp_11 = _mm256_mul_ps___A * _mm256_mul_ps___B;
  return {__trans_tmp_11};
}
struct TF_SRGB {
  template <class D, class V> void EncodedFromDisplay(D, V x) {
    Vec256 __trans_tmp_12, linear = x * __trans_tmp_12;
    IfThenElse(linear);
  }
};
#pragma GCC target "avx512vl"
void UndoXYBInPlace_d() {
  TF_SRGB().EncodedFromDisplay(UndoXYBInPlace_d, UndoXYBInPlace_linear_g);
}

Comment 15 GCC Commits 2021-09-16 08:35:46 UTC

The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:a26ff83ed07e33c4aa46f3314553c0d15ca21100

commit r12-3569-ga26ff83ed07e33c4aa46f3314553c0d15ca21100
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Sep 2 13:05:54 2021 +0800

    Check mask type when doing cond_op related gimple simplification.
    
    gcc/ChangeLog:
    
            PR middle-end/102080
            * match.pd: Check mask type when doing cond_op related gimple
            simplification.
            * tree.c (is_truth_type_for): New function.
            * tree.h (is_truth_type_for): New declaration.
    
    gcc/testsuite/ChangeLog:
    
            PR middle-end/102080
            * gcc.target/i386/pr102080.c: New test.

Comment 16 Jakub Jelinek 2021-12-17 16:01:34 UTC

So fixed?

Comment 17 H.J. Lu 2021-12-17 16:07:41 UTC

Fixed.