Bug 85281 - [8 Regression] Assembler messages: Error: operand size mismatch for `vpbroadcastb' with -mavx512bw -masm=intel
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 8.0.1
Importance: P1 normal
Target Milestone: 8.0
Assignee: Jakub Jelinek
URL:
Keywords: assemble-failure
Depends on:
Blocks:
 
Reported: 2018-04-07 18:47 UTC by Zdenek Sojka
Modified: 2018-04-17 07:08 UTC

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail: 8.0.1
Last reconfirmed: 2018-04-09 00:00:00


Attachments
reduced testcase (125 bytes, text/plain)
2018-04-07 18:47 UTC, Zdenek Sojka
gcc8-pr85281.patch (959 bytes, patch)
2018-04-09 10:28 UTC, Jakub Jelinek
gcc8-pr85281-assorted-fixes.patch (2.56 KB, patch)
2018-04-09 14:30 UTC, Jakub Jelinek

Description Zdenek Sojka 2018-04-07 18:47:50 UTC
Created attachment 43875
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O -mavx512bw -masm=intel -w testcase.c -save-temps
testcase.s: Assembler messages:
testcase.s:32: Error: operand size mismatch for `vpbroadcastb'

The failing instruction is:
...
	vpbroadcastb	zmm0{k2}, XMMWORD PTR [rsp+48]
...
which is mixing zmm0 and XMMWORD.
This compiles fine with att syntax:
...
	vpbroadcastb	48(%rsp), %zmm0{%k2}
...
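
To illustrate the mismatch above: vpbroadcastb reads a single byte from memory regardless of the destination width, so in Intel syntax the memory operand is expected to carry the element size (BYTE PTR) rather than XMMWORD. The following is only a minimal C sketch of that element-size-to-keyword mapping. The names intel_ptr_keyword and format_intel_broadcast are hypothetical, and this is not how GCC's sse.md implements it.

```c
#include <stdio.h>

/* Hypothetical helper (not GCC code): map a broadcast element size in
   bytes to the Intel-syntax memory operand size keyword.
   Returns NULL for sizes this sketch does not cover. */
const char *intel_ptr_keyword(unsigned elem_bytes)
{
    switch (elem_bytes) {
    case 1:  return "BYTE PTR";
    case 2:  return "WORD PTR";
    case 4:  return "DWORD PTR";
    case 8:  return "QWORD PTR";
    default: return NULL;
    }
}

/* Hypothetical helper: format an Intel-syntax broadcast from memory.
   dest is the (possibly masked) destination, e.g. "zmm0{k2}";
   addr is the address expression without brackets, e.g. "rsp+48".
   Returns -1 if the element size is unsupported, otherwise the
   snprintf result. */
int format_intel_broadcast(char *buf, size_t len, const char *mnemonic,
                           unsigned elem_bytes, const char *dest,
                           const char *addr)
{
    const char *kw = intel_ptr_keyword(elem_bytes);
    if (kw == NULL)
        return -1;
    return snprintf(buf, len, "%s\t%s, %s [%s]", mnemonic, dest, kw, addr);
}
```

Calling format_intel_broadcast with "vpbroadcastb", element size 1, "zmm0{k2}" and "rsp+48" yields the line from the failing output with BYTE PTR in place of XMMWORD PTR.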

The GCC I am using is patched with the fix from PR85177#c1.


7-branch is fine on this testcase, but it generates very different (inferior in my eyes) code.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64
Thread model: posix
gcc version 8.0.1 20180407 (experimental) (GCC)
Comment 1 Jakub Jelinek 2018-04-09 08:38:07 UTC
Looking into this.  It seems vpbroadcast is not the only instruction with such problems; in the i386.exp testsuite log with -masm=intel I see that e.g. the following tests have similar assembler errors:
gcc.target/i386/avx512dq-vreducesd-2.c
gcc.target/i386/avx512f-vfixupimmsd-2.c
gcc.target/i386/avx512f-vfixupimmss-2.c
gcc.target/i386/avx512f-vpmovsxbd-2.c
gcc.target/i386/avx512f-vpmovsxwq-2.c
gcc.target/i386/avx512f-vpmovzxbd-2.c
gcc.target/i386/avx512f-vpmovzxwq-2.c
gcc.target/i386/avx512f-vrndscalesd-2.c
gcc.target/i386/avx512f-vrndscaless-2.c
gcc.target/i386/avx512f-vscalefss-2.c
gcc.target/i386/avx512vl-vcvtudq2pd-2.c
gcc.target/i386/avx512vl-vpbroadcastb-2.c
gcc.target/i386/avx512vl-vpbroadcastw-2.c
gcc.target/i386/avx512vl-vpmovswb-2.c
gcc.target/i386/avx512vl-vpmovuswb-2.c
gcc.target/i386/avx512vl-vpmovwb-2.c
gcc.target/i386/avx512vl-vshufpd-2.c
Exactly which *WORD PTR each instruction requires is a complete mess, and the question is whether it matches what other assemblers do.
Comment 2 Jakub Jelinek 2018-04-09 10:28:37 UTC
Created attachment 43883
gcc8-pr85281.patch

Untested fix for the broadcast issue.  Will look at other issues next.
Comment 3 Jakub Jelinek 2018-04-09 14:30:46 UTC
Created attachment 43885
gcc8-pr85281-assorted-fixes.patch

Other issues I've run into.  With these two patches applied and RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} i386.exp', the following failures go away:
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors)
Comment 4 Jakub Jelinek 2018-04-11 11:37:33 UTC
Author: jakub
Date: Wed Apr 11 11:37:01 2018
New Revision: 259316

URL: https://gcc.gnu.org/viewcvs?rev=259316&root=gcc&view=rev
Log:
	PR target/85281
	* config/i386/sse.md (iptr): Add V16SFmode and V8DFmode cases.
	(<avx512>_vec_dup<mode><mask_name>): Use a single pattern for modes
	other than V2DFmode using iptr mode attribute.
	(<avx512>_vec_dup<mode><mask_name>): Use iptr mode attribute.

	* gcc.target/i386/pr85281.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr85281.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 5 Jakub Jelinek 2018-04-11 11:42:00 UTC
Fixed.  The assorted fixes patch is still pending, but that is not known to be a regression.
Comment 6 Jakub Jelinek 2018-04-17 07:08:37 UTC
Author: jakub
Date: Tue Apr 17 07:08:06 2018
New Revision: 259430

URL: https://gcc.gnu.org/viewcvs?rev=259430&root=gcc&view=rev
Log:
	PR target/85281
	* config/i386/sse.md (reduces<mode><mask_scalar_name>,
	avx512f_vmcmp<mode>3<round_saeonly_name>,
	avx512f_vmcmp<mode>3_mask<round_saeonly_name>,
	avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_scalar_name>,
	avx512f_rndscale<mode><round_saeonly_name>,
	avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>,
	avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>):
	Use %<iptr>2 instead of %2 for -masm=intel.
	(avx512f_vcvtss2usi<round_name>, avx512f_vcvtss2usiq<round_name>,
	avx512f_vcvttss2usi<round_saeonly_name>,
	avx512f_vcvttss2usiq<round_saeonly_name>): Use %k1 instead of %1 for
	-masm=intel.
	(avx512f_vcvtsd2usi<round_name>, avx512f_vcvtsd2usiq<round_name>,
	avx512f_vcvttsd2usi<round_saeonly_name>,
	avx512f_vcvttsd2usiq<round_saeonly_name>, ufloatv2siv2df2<mask_name>):
	Use %q1 instead of %1 for -masm=intel.
	(avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>,
	avx512f_sfixupimm<mode>_mask<round_saeonly_name>): Use %<iptr>3 instead
	of %3 for -masm=intel.
	(sse2_shufpd_v2df_mask): Fix a typo, change %{6%} to %{%6%} for
	-masm=intel.
	(*avx512vl_<code>v2div2qi2_store): Use %w0 instead of %0 for
	-masm=intel.
	(*avx512vl_<code><mode>v4qi2_store): Use %k0 instead of %0 for
	-masm=intel.
	(avx512vl_<code><mode>v4qi2_mask_store): Use a single pattern with
	%k0 and %1 for -masm=intel rather than two patterns, one with %0 and
	%g1.
	(*avx512vl_<code><mode>v8qi2_store): Use %q0 instead of %0 for
	-masm=intel.
	(avx512vl_<code><mode>v8qi2_mask_store): Use a single pattern with
	%q0 and %1 for -masm=intel rather than two patterns, one with %0 and
	%g1 and one with %0 and %1.
	(avx512er_vmrcp28<mode><round_saeonly_name>,
	avx512er_vmrsqrt28<mode><round_saeonly_name>): Use %<iptr>1 instead of
	%1 for -masm=intel.
	(avx5124fmaddps_4fmaddps_mask, avx5124fmaddps_4fmaddss_mask,
	avx5124fmaddps_4fnmaddps_mask, avx5124fmaddps_4fnmaddss_mask,
	avx5124vnniw_vp4dpwssd_mask, avx5124vnniw_vp4dpwssds_mask): Swap order
	of %0 and %{%4%} for -masm=intel.
	(avx5124fmaddps_4fmaddps_maskz, avx5124fmaddps_4fmaddss_maskz,
	avx5124fmaddps_4fnmaddps_maskz, avx5124fmaddps_4fnmaddss_maskz,
	avx5124vnniw_vp4dpwssd_maskz, avx5124vnniw_vp4dpwssds_maskz): Swap
	order of %0 and %{%5%}%{z%} for -masm=intel.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
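
The ChangeLog above mentions two kinds of Intel-syntax fixes: sizing a memory operand through the %w, %k and %q operand modifiers, and placing the mask decoration after the destination register. The C sketch below only illustrates the intended output text under those assumptions. The helper names modifier_ptr_keyword and format_masked_dest are hypothetical and are not part of GCC.

```c
#include <stdio.h>

/* Hypothetical helper (not GCC code): Intel-syntax size keyword that a
   memory operand gets for the operand modifiers named in the ChangeLog.
   Returns NULL for any other letter. */
const char *modifier_ptr_keyword(char modifier)
{
    switch (modifier) {
    case 'w': return "WORD PTR";   /* 16-bit operand */
    case 'k': return "DWORD PTR";  /* 32-bit operand */
    case 'q': return "QWORD PTR";  /* 64-bit operand */
    default:  return NULL;
    }
}

/* Hypothetical helper: write a masked destination operand into buf.
   In both syntaxes the mask follows the register; only the '%' register
   prefix differs:
     intel != 0  ->  zmm0{k2}
     intel == 0  ->  %zmm0{%k2}   (AT&T)
   Returns the snprintf result. */
int format_masked_dest(char *buf, size_t len, const char *reg,
                       const char *mask, int intel)
{
    const char *prefix = intel ? "" : "%";
    return snprintf(buf, len, "%s%s{%s%s}", prefix, reg, prefix, mask);
}
```

For example, format_masked_dest with "zmm0", "k2" and intel set to 1 produces the same destination text as in the Intel-syntax line from the description.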