This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
- From: "rguenther at suse dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 17 Feb 2016 12:13:05 +0000
- Subject: [Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
- Auto-submitted: auto-generated
- References: <bug-69671-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671
--- Comment #23 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 17 Feb 2016, jakub at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671
>
> --- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Created attachment 37722
> --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37722&action=edit
> gcc6-pr69671.patch
>
> Actually, on a closer look, I believe the only problem is the patterns that
> use a vector_move_operand "0C" inside of vec_select with only constants as the
> parallel's operands. fwprop is able to propagate constants into instructions
> (thus undoing the CSE effect), but it doesn't do anything on these, because it
> also simplifies them, so instead of the expected, say,
> (vec_select:V4QI (const_vector:V16QI [
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>             (const_int 0 [0])
>         ])
>     (parallel [
>             (const_int 0 [0])
>             (const_int 1 [0x1])
>             (const_int 2 [0x2])
>             (const_int 3 [0x3])
>         ]))
> we get the simplified form in there:
> (const_vector:V4QI [
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>     ])
> So, by adding extra patterns for that simplified form, fwprop is able to do
> its job even if CSE did a better job.
Of course, then I wonder why we didn't simplify this in the first place
when generating RTL, and instead need to wait for fwprop ...
But yes, sounds like the easiest way to go forward.
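
For illustration only, the folding described in comment #22 can be modelled
with a small Python sketch. This is a toy model, not GCC code: the classes
ConstVector, Reg and VecSelect, and the functions simplify,
matches_original_pattern and matches_extra_pattern, are hypothetical names
made up here. It assumes only what the comment states: a vec_select with
constant lane indices applied to a constant vector folds to a smaller
constant vector, and the folded form is no longer a vec_select, so a pattern
written against the vec_select form cannot match it.

```python
# Toy model of the RTL folding discussed above; NOT GCC's implementation.
# All class and function names are hypothetical and exist only for this sketch.
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class ConstVector:
    mode: str                # e.g. "V16QI"
    elts: Tuple[int, ...]    # the const_int elements


@dataclass(frozen=True)
class Reg:
    mode: str
    name: str


@dataclass(frozen=True)
class VecSelect:
    mode: str                        # result mode, e.g. "V4QI"
    op: Union[ConstVector, Reg]      # vector being selected from
    lanes: Tuple[int, ...]           # constant indices of the (parallel [...])


def simplify(x):
    """Fold a vec_select of a constant vector into a constant vector."""
    if isinstance(x, VecSelect) and isinstance(x.op, ConstVector):
        return ConstVector(x.mode, tuple(x.op.elts[i] for i in x.lanes))
    return x  # nothing to fold, e.g. when selecting from a register


def matches_original_pattern(x):
    # Stand-in for a pattern that only accepts the vec_select form.
    return isinstance(x, VecSelect)


def matches_extra_pattern(x):
    # Stand-in for an added pattern that accepts the folded all-zero vector.
    return isinstance(x, ConstVector) and all(e == 0 for e in x.elts)


zero_v16qi = ConstVector("V16QI", (0,) * 16)
expected = VecSelect("V4QI", zero_v16qi, (0, 1, 2, 3))
folded = simplify(expected)

print(folded)                            # ConstVector(mode='V4QI', elts=(0, 0, 0, 0))
print(matches_original_pattern(folded))  # False
print(matches_extra_pattern(folded))     # True
```

In this model the propagated-and-simplified operand is rejected by the
original matcher and accepted only by the extra one, which mirrors why the
extra patterns let fwprop's result be recognized.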