Created attachment 43875
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O -mavx512bw -masm=intel -w testcase.c -save-temps
testcase.s: Assembler messages:
testcase.s:32: Error: operand size mismatch for `vpbroadcastb'

The failing instruction is:
...
        vpbroadcastb    zmm0{k2}, XMMWORD PTR [rsp+48]
...
which is mixing zmm0 and XMMWORD.

This compiles fine with att syntax:
...
        vpbroadcastb    48(%rsp), %zmm0{%k2}
...

The gcc I am using is patched with the fix for PR85177#c1

7-branch is fine on this testcase, but it generates very different (inferior in my eyes) code.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64
Thread model: posix
gcc version 8.0.1 20180407 (experimental) (GCC)
Looking into this. It seems it isn't just vpbroadcast that has such problems; looking at the i386.exp testsuite log with -masm=intel, I see that e.g. the following tests have similar assembler errors:

gcc.target/i386/avx512dq-vreducesd-2.c
gcc.target/i386/avx512f-vfixupimmsd-2.c
gcc.target/i386/avx512f-vfixupimmss-2.c
gcc.target/i386/avx512f-vpmovsxbd-2.c
gcc.target/i386/avx512f-vpmovsxwq-2.c
gcc.target/i386/avx512f-vpmovzxbd-2.c
gcc.target/i386/avx512f-vpmovzxwq-2.c
gcc.target/i386/avx512f-vrndscalesd-2.c
gcc.target/i386/avx512f-vrndscaless-2.c
gcc.target/i386/avx512f-vscalefss-2.c
gcc.target/i386/avx512vl-vcvtudq2pd-2.c
gcc.target/i386/avx512vl-vpbroadcastb-2.c
gcc.target/i386/avx512vl-vpbroadcastw-2.c
gcc.target/i386/avx512vl-vpmovswb-2.c
gcc.target/i386/avx512vl-vpmovuswb-2.c
gcc.target/i386/avx512vl-vpmovwb-2.c
gcc.target/i386/avx512vl-vshufpd-2.c

Exactly which *WORD PTR each instruction requires is a complete mess, and the question is whether it matches what other assemblers do.
Created attachment 43883
gcc8-pr85281.patch

Untested fix for the broadcast issue. Will look at other issues next.
Created attachment 43885
gcc8-pr85281-assorted-fixes.patch

Other issues I've run into. These two patches together, with RUNTESTFLAGS='--target_board=unix\{-m32/-masm=intel,-m64/-masm=intel\} i386.exp', change the test results as follows:

-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducesd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512dq-vreducess-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtsd2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vcvtss2usi64-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmsd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vfixupimmss-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vcvtudq2pd-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vpmovwb-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx512vl-vshufpd-2.c (test for excess errors)
Author: jakub
Date: Wed Apr 11 11:37:01 2018
New Revision: 259316

URL: https://gcc.gnu.org/viewcvs?rev=259316&root=gcc&view=rev
Log:
        PR target/85281
        * config/i386/sse.md (iptr): Add V16SFmode and V8DFmode cases.
        (<avx512>_vec_dup<mode><mask_name>): Use a single pattern for
        modes other than V2DFmode using iptr mode attribute.
        (<avx512>_vec_dup<mode><mask_name>): Use iptr mode attribute.

        * gcc.target/i386/pr85281.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr85281.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
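For readers unfamiliar with the idea behind the commit above: a broadcast from memory reads a single element, so under -masm=intel the memory operand has to be sized by the element, not by the 128-bit source mode or the destination register. The following is only a minimal sketch of that rule in C, assuming the iptr attribute yields the element-sized modifier letters b, w, k and q. The function name `iptr_letter` is hypothetical and this is not code from sse.md or from the patch.

```c
#include <stddef.h>

/* Hypothetical model, not GCC source: pick an element-sized operand
   modifier letter for a vector of `vector_bytes` bytes split into
   `lanes` equal elements.  Returns 0 when the shape is not modeled. */
char iptr_letter(size_t vector_bytes, size_t lanes)
{
    if (lanes == 0 || vector_bytes % lanes != 0)
        return 0;

    switch (vector_bytes / lanes) {
    case 1: return 'b';   /* byte element, e.g. V64QI */
    case 2: return 'w';   /* 16-bit element, e.g. V32HI */
    case 4: return 'k';   /* 32-bit element, e.g. V16SI or V16SF */
    case 8: return 'q';   /* 64-bit element, e.g. V8DI or V8DF */
    default: return 0;    /* element size outside this sketch */
    }
}
```

Under this model, a 64-byte vector of 64 byte elements (the vpbroadcastb case from the report) maps to `b`, while the V16SF and V8DF cases named in the ChangeLog map to `k` and `q`.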
Fixed. The assorted fixes patch is still pending, but that is not known to be a regression.
Author: jakub
Date: Tue Apr 17 07:08:06 2018
New Revision: 259430

URL: https://gcc.gnu.org/viewcvs?rev=259430&root=gcc&view=rev
Log:
        PR target/85281
        * config/i386/sse.md (reduces<mode><mask_scalar_name>,
        avx512f_vmcmp<mode>3<round_saeonly_name>,
        avx512f_vmcmp<mode>3_mask<round_saeonly_name>,
        avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_scalar_name>,
        avx512f_rndscale<mode><round_saeonly_name>,
        avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>,
        avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>):
        Use %<iptr>2 instead of %2 for -masm=intel.
        (avx512f_vcvtss2usi<round_name>, avx512f_vcvtss2usiq<round_name>,
        avx512f_vcvttss2usi<round_saeonly_name>,
        avx512f_vcvttss2usiq<round_saeonly_name>): Use %k1 instead of %1 for
        -masm=intel.
        (avx512f_vcvtsd2usi<round_name>, avx512f_vcvtsd2usiq<round_name>,
        avx512f_vcvttsd2usi<round_saeonly_name>,
        avx512f_vcvttsd2usiq<round_saeonly_name>,
        ufloatv2siv2df2<mask_name>): Use %q1 instead of %1 for -masm=intel.
        (avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>,
        avx512f_sfixupimm<mode>_mask<round_saeonly_name>): Use %<iptr>3 instead
        of %3 for -masm=intel.
        (sse2_shufpd_v2df_mask): Fix a typo, change %{6%} to %{%6%} for
        -masm=intel.
        (*avx512vl_<code>v2div2qi2_store): Use %w0 instead of %0 for
        -masm=intel.
        (*avx512vl_<code><mode>v4qi2_store): Use %k0 instead of %0 for
        -masm=intel.
        (avx512vl_<code><mode>v4qi2_mask_store): Use a single pattern with
        %k0 and %1 for -masm=intel rather than two patterns, one with %0 and
        %g1.
        (*avx512vl_<code><mode>v8qi2_store): Use %q0 instead of %0 for
        -masm=intel.
        (avx512vl_<code><mode>v8qi2_mask_store): Use a single pattern with
        %q0 and %1 for -masm=intel rather than two patterns, one with %0 and
        %g1 and one with %0 and %1.
        (avx512er_vmrcp28<mode><round_saeonly_name>,
        avx512er_vmrsqrt28<mode><round_saeonly_name>): Use %<iptr>1 instead of
        %1 for -masm=intel.
        (avx5124fmaddps_4fmaddps_mask, avx5124fmaddps_4fmaddss_mask,
        avx5124fmaddps_4fnmaddps_mask, avx5124fmaddps_4fnmaddss_mask,
        avx5124vnniw_vp4dpwssd_mask, avx5124vnniw_vp4dpwssds_mask): Swap order
        of %0 and %{%4%} for -masm=intel.
        (avx5124fmaddps_4fmaddps_maskz, avx5124fmaddps_4fmaddss_maskz,
        avx5124fmaddps_4fnmaddps_maskz, avx5124fmaddps_4fnmaddss_maskz,
        avx5124vnniw_vp4dpwssd_maskz, avx5124vnniw_vp4dpwssds_maskz): Swap
        order of %0 and %{%5%}%{z%} for -masm=intel.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
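Most entries in this log replace a plain operand such as %1 with a size-modified one such as %k1 or %q1. The sketch below is a simplified model of why that matters, assuming (as I understand the i386 backend's Intel-dialect printing) that an explicit modifier letter overrides the size keyword that would otherwise be derived from the operand's mode. The function name `intel_ptr_size` and its signature are hypothetical; this is not the backend's actual code.

```c
#include <stddef.h>

/* Simplified, hypothetical model (not GCC source) of choosing the
   Intel-syntax "<size> PTR" keyword for a memory operand.
   `modifier` is an operand-modifier letter, or 0 for none;
   `mode_bytes` is the size of the operand's machine mode. */
const char *intel_ptr_size(char modifier, unsigned mode_bytes)
{
    /* An explicit modifier wins over the mode-derived size. */
    switch (modifier) {
    case 'b': return "BYTE";
    case 'w': return "WORD";
    case 'k': return "DWORD";
    case 'q': return "QWORD";
    case 'x': return "XMMWORD";
    case 't': return "YMMWORD";
    case 'g': return "ZMMWORD";
    default:  break;
    }

    /* No modifier: fall back to the size of the operand's mode. */
    switch (mode_bytes) {
    case 1:  return "BYTE";
    case 2:  return "WORD";
    case 4:  return "DWORD";
    case 8:  return "QWORD";
    case 16: return "XMMWORD";
    case 32: return "YMMWORD";
    case 64: return "ZMMWORD";
    default: return NULL;   /* size outside this sketch */
    }
}
```

In this model a 16-byte operand printed without a modifier comes out as XMMWORD, which is the kind of mismatch the assembler rejected for the scalar conversions, while the same operand printed with `k` or `q` comes out as DWORD or QWORD.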