85281 – [8 Regression] Assembler messages: Error: operand size mismatch for `vpbroadcastb' with -mavx512bw -masm=intel

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	8.0.1

Importance:	P1 normal
Target Milestone:	8.0
Assignee:	Jakub Jelinek

URL:
Keywords:	assemble-failure

Depends on:
Blocks:

Reported:	2018-04-07 18:47 UTC by Zdenek Sojka
Modified:	2018-04-17 07:08 UTC
CC List:	1 user

See Also:
Host:	x86_64-pc-linux-gnu
Target:	x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail:	8.0.1
Last reconfirmed:	2018-04-09 00:00:00

Attachments
reduced testcase (125 bytes, text/plain) 2018-04-07 18:47 UTC, Zdenek Sojka
gcc8-pr85281.patch (959 bytes, patch) 2018-04-09 10:28 UTC, Jakub Jelinek
gcc8-pr85281-assorted-fixes.patch (2.56 KB, patch) 2018-04-09 14:30 UTC, Jakub Jelinek

Description Zdenek Sojka 2018-04-07 18:47:50 UTC

Created attachment 43875
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O -mavx512bw -masm=intel -w testcase.c -save-temps
testcase.s: Assembler messages:
testcase.s:32: Error: operand size mismatch for `vpbroadcastb'

The failing instruction is:
...
	vpbroadcastb	zmm0{k2}, XMMWORD PTR [rsp+48]
...
which mixes a zmm0 destination with an XMMWORD memory operand.
The same instruction assembles fine in AT&T syntax:
...
	vpbroadcastb	48(%rsp), %zmm0{%k2}
...
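
For illustration, the AT&T form works because it carries no size keyword at all, while the Intel form has to name the size of the memory source, which for vpbroadcast{b,w,d,q} is a single element rather than a whole vector. A minimal sketch of that rule follows; broadcast_mem_keyword is a hypothetical helper written for this report, not anything in GCC or gas:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or gas): return the Intel-syntax
   size keyword for the memory source of vpbroadcast{b,w,d,q}, or NULL
   for any other mnemonic.  The broadcast reads one element from memory,
   so the keyword follows the element size, not the destination width.  */
const char *
broadcast_mem_keyword (const char *mnemonic)
{
  static const char prefix[] = "vpbroadcast";
  size_t n = sizeof prefix - 1;

  /* Require exactly "vpbroadcast" plus one suffix letter.  */
  if (strncmp (mnemonic, prefix, n) != 0 || strlen (mnemonic) != n + 1)
    return NULL;

  switch (mnemonic[n])
    {
    case 'b': return "BYTE PTR";   /* 1-byte element */
    case 'w': return "WORD PTR";   /* 2-byte element */
    case 'd': return "DWORD PTR";  /* 4-byte element */
    case 'q': return "QWORD PTR";  /* 8-byte element */
    default:  return NULL;
    }
}
```

Under that assumption the operand in the failing line would be expected to read BYTE PTR [rsp+48] rather than XMMWORD PTR [rsp+48].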

The gcc I am using is patched with the fix for PR85177#c1.

The 7 branch handles this testcase fine, but it generates very different (inferior in my eyes) code.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64
Thread model: posix
gcc version 8.0.1 20180407 (experimental) (GCC)

Comment 1 Jakub Jelinek 2018-04-09 08:38:07 UTC

Looking into this.  It isn't just vpbroadcast that has such problems; in the i386.exp testsuite log with -masm=intel I see that e.g. the following tests have similar assembler errors:
gcc.target/i386/avx512dq-vreducesd-2.c
gcc.target/i386/avx512f-vfixupimmsd-2.c
gcc.target/i386/avx512f-vfixupimmss-2.c
gcc.target/i386/avx512f-vpmovsxbd-2.c
gcc.target/i386/avx512f-vpmovsxwq-2.c
gcc.target/i386/avx512f-vpmovzxbd-2.c
gcc.target/i386/avx512f-vpmovzxwq-2.c
gcc.target/i386/avx512f-vrndscalesd-2.c
gcc.target/i386/avx512f-vrndscaless-2.c
gcc.target/i386/avx512f-vscalefss-2.c
gcc.target/i386/avx512vl-vcvtudq2pd-2.c
gcc.target/i386/avx512vl-vpbroadcastb-2.c
gcc.target/i386/avx512vl-vpbroadcastw-2.c
gcc.target/i386/avx512vl-vpmovswb-2.c
gcc.target/i386/avx512vl-vpmovuswb-2.c
gcc.target/i386/avx512vl-vpmovwb-2.c
gcc.target/i386/avx512vl-vshufpd-2.c
Exactly which *WORD PTR each instruction requires is a complete mess, and the question is whether it matches what other assemblers do.

Comment 2 Jakub Jelinek 2018-04-09 10:28:37 UTC

Created attachment 43883
gcc8-pr85281.patch

Untested fix for the broadcast issue.  Will look at other issues next.

Comment 3 Jakub Jelinek 2018-04-09 14:30:46 UTC

Created attachment 43885
gcc8-pr85281-assorted-fixes.patch

Other issues I've run into.  These two patches together, tested with RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} i386.exp', make the following failures go away:
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors)

Comment 4 Jakub Jelinek 2018-04-11 11:37:33 UTC

Author: jakub
Date: Wed Apr 11 11:37:01 2018
New Revision: 259316

URL: https://gcc.gnu.org/viewcvs?rev=259316&root=gcc&view=rev
Log:
	PR target/85281
	* config/i386/sse.md (iptr): Add V16SFmode and V8DFmode cases.
	(<avx512>_vec_dup<mode><mask_name>): Use a single pattern for modes
	other than V2DFmode using iptr mode attribute.
	(<avx512>_vec_dup<mode><mask_name>): Use iptr mode attribute.

	* gcc.target/i386/pr85281.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr85281.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
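
As a rough illustration of what an element-size attribute like iptr has to encode, the sketch below derives an operand-modifier letter from the element part of a vector mode name. This is an assumption about the rule, not a copy of sse.md: the real attribute is a hand-written table in the machine description, and iptr_letter_for_mode is a hypothetical name used only here.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: map a vector mode name such as "V64QI" or "V8DF"
   to a modifier letter based on its element mode.  QI/HI/SI/DI elements
   are 1/2/4/8 bytes and SF/DF elements are 4/8 bytes, so SF shares a
   letter with SI and DF shares one with DI.  Returns '\0' if the element
   mode is not recognised.  */
char
iptr_letter_for_mode (const char *mode)
{
  size_t len = strlen (mode);
  const char *elt;

  if (len < 2)
    return '\0';
  elt = mode + len - 2;          /* last two characters name the element */

  if (strcmp (elt, "QI") == 0)
    return 'b';
  if (strcmp (elt, "HI") == 0)
    return 'w';
  if (strcmp (elt, "SI") == 0 || strcmp (elt, "SF") == 0)
    return 'k';
  if (strcmp (elt, "DI") == 0 || strcmp (elt, "DF") == 0)
    return 'q';
  return '\0';
}
```

With this rule the V16SFmode and V8DFmode cases named in the ChangeLog entry would map to 'k' and 'q' respectively.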

Comment 5 Jakub Jelinek 2018-04-11 11:42:00 UTC

Fixed.  The assorted fixes patch is still pending, but that is not known to be a regression.

Comment 6 Jakub Jelinek 2018-04-17 07:08:37 UTC

Author: jakub
Date: Tue Apr 17 07:08:06 2018
New Revision: 259430

URL: https://gcc.gnu.org/viewcvs?rev=259430&root=gcc&view=rev
Log:
	PR target/85281
	* config/i386/sse.md (reduces<mode><mask_scalar_name>,
	avx512f_vmcmp<mode>3<round_saeonly_name>,
	avx512f_vmcmp<mode>3_mask<round_saeonly_name>,
	avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_scalar_name>,
	avx512f_rndscale<mode><round_saeonly_name>,
	avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>,
	avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>):
	Use %<iptr>2 instead of %2 for -masm=intel.
	(avx512f_vcvtss2usi<round_name>, avx512f_vcvtss2usiq<round_name>,
	avx512f_vcvttss2usi<round_saeonly_name>,
	avx512f_vcvttss2usiq<round_saeonly_name>): Use %k1 instead of %1 for
	-masm=intel.
	(avx512f_vcvtsd2usi<round_name>, avx512f_vcvtsd2usiq<round_name>,
	avx512f_vcvttsd2usi<round_saeonly_name>,
	avx512f_vcvttsd2usiq<round_saeonly_name>, ufloatv2siv2df2<mask_name>):
	Use %q1 instead of %1 for -masm=intel.
	(avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>,
	avx512f_sfixupimm<mode>_mask<round_saeonly_name>): Use %<iptr>3 instead
	of %3 for -masm=intel.
	(sse2_shufpd_v2df_mask): Fix a typo, change %{6%} to %{%6%} for
	-masm=intel.
	(*avx512vl_<code>v2div2qi2_store): Use %w0 instead of %0 for
	-masm=intel.
	(*avx512vl_<code><mode>v4qi2_store): Use %k0 instead of %0 for
	-masm=intel.
	(avx512vl_<code><mode>v4qi2_mask_store): Use a single pattern with
	%k0 and %1 for -masm=intel rather than two patterns, one with %0 and
	%g1.
	(*avx512vl_<code><mode>v8qi2_store): Use %q0 instead of %0 for
	-masm=intel.
	(avx512vl_<code><mode>v8qi2_mask_store): Use a single pattern with
	%q0 and %1 for -masm=intel rather than two patterns, one with %0 and
	%g1 and one with %0 and %1.
	(avx512er_vmrcp28<mode><round_saeonly_name>,
	avx512er_vmrsqrt28<mode><round_saeonly_name>): Use %<iptr>1 instead of
	%1 for -masm=intel.
	(avx5124fmaddps_4fmaddps_mask, avx5124fmaddps_4fmaddss_mask,
	avx5124fmaddps_4fnmaddps_mask, avx5124fmaddps_4fnmaddss_mask,
	avx5124vnniw_vp4dpwssd_mask, avx5124vnniw_vp4dpwssds_mask): Swap order
	of %0 and %{%4%} for -masm=intel.
	(avx5124fmaddps_4fmaddps_maskz, avx5124fmaddps_4fmaddss_maskz,
	avx5124fmaddps_4fnmaddps_maskz, avx5124fmaddps_4fnmaddss_maskz,
	avx5124vnniw_vp4dpwssd_maskz, avx5124vnniw_vp4dpwssds_maskz): Swap
	order of %0 and %{%5%}%{z%} for -masm=intel.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
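
The ChangeLog entry above replaces plain operands with size-modified ones (%w0, %k0, %k1, %q0, %q1) for -masm=intel. As a sketch of what those letters are understood to select for a memory operand, assuming the usual i386 backend convention, the table can be written out as follows; intel_size_keyword is a hypothetical name and the function is not taken from GCC:

```c
#include <stddef.h>

/* Illustrative only: Intel-syntax size keyword assumed to be selected by
   each operand-modifier letter when the operand is a memory reference.
   Returns NULL for letters this sketch does not cover.  */
const char *
intel_size_keyword (char modifier)
{
  switch (modifier)
    {
    case 'b': return "BYTE PTR";   /* 1 byte  */
    case 'w': return "WORD PTR";   /* 2 bytes */
    case 'k': return "DWORD PTR";  /* 4 bytes */
    case 'q': return "QWORD PTR";  /* 8 bytes */
    default:  return NULL;
    }
}
```

Read this way, changing %1 to %q1 in the vcvtsd2usi patterns asks for a QWORD PTR source, and changing %0 to %w0 in the v2div2qi2 store asks for a WORD PTR destination.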