See https://gcc.gnu.org/ml/gcc/2016-05/msg00000.html Using -march=native on a k8 platform does not add -mprfchw to the options list, while using -march=k8 on a different platform does add the switch. From Venkataramanan: > 3DNow! instruction extensions = true > 3DNow! instructions = true It has 3Dnow support. "prefetchw" is available with 3dnow. > misaligned SSE mode = false > 3DNow! PREFETCH/PREFETCHW instructions = false It does not have 3DNowprefetch enabling ISA flag -mprftchw is not correct for -march=k8. > OS visible workaround = false > instruction based sampling = false I think we need to file bug for this. Need to check with Uros why the flag -mprfchw is shared with 3dnow. To work around this issue you can use -mno-prfchw when building with -march=k8.
This should trigger the prefetchw instruction with -march=k8 -O3: void f() { extern int size; int i; float * fvec; float * fptr = (float *) get(); for(i = 0; i < size; ++i) fvec[i] = fptr[i]; int * ivec; int * iptr = (int *) get(); for(i = 0; i < size; ++i) ivec[i] = iptr[i]; } If I give an "extern void * get()" prototype for get, if there's only one vector loop, or if size is local, the prefetchw instruction goes away. $ gcc -fverbose-asm -S a.c -march=k8 -O3; grep prefetchw a.s prefetchw 464(%rcx) # prefetchw (%rdx) # ivtmp.37
The problem is in the fact that for -march=native, the driver will pass -mno-prfchw, since the relevant bit is not present in cpuid flags. Following code in i386.c: /* Enable prefetch{,w} instructions for -m3dnow and -mprefetchwt1. */ if (TARGET_3DNOW_P (opts->x_ix86_isa_flags) || TARGET_PREFETCHWT1_P (opts->x_ix86_isa_flags)) opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PRFCHW & ~opts->x_ix86_isa_flags_explicit; will then try to set the option, but it won't succeed, due to the above handling of explicit compile flags.
I have a patch.
Author: uros Date: Fri Aug 19 18:14:03 2016 New Revision: 239626 URL: https://gcc.gnu.org/viewcvs?rev=239626&root=gcc&view=rev Log: PR target/77270 * config/i386/i386.c (ix86_option_override_internal): Remove PTA_PRFCHW from entries that also have PTA_3DNOW flag. Enable SSE prefetch also for TARGET_PREFETCHWT1. Do not try to enable TARGET_PRFCHW ISA flag here. * config/i386/i386.md (prefetch): Enable also for TARGET_3DNOW. Rewrite expander function body. (*prefetch_3dnow): Enable for TARGET_3DNOW and TARGET_PREFETCHWT1. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.md
(In reply to Uroš Bizjak from comment #2) > The problem is in the fact that for -march=native, the driver will pass > -mno-prfchw, since the relevant bit is not present in cpuid flags. I'm a little confused. -march=native gets it right. It's -march=k8 that gets it wrong. Does your patch make -march-k8 do the right thing now?
(In reply to nightstrike from comment #5) > (In reply to Uroš Bizjak from comment #2) > > The problem is in the fact that for -march=native, the driver will pass > > -mno-prfchw, since the relevant bit is not present in cpuid flags. > > I'm a little confused. -march=native gets it right. It's -march=k8 that > gets it wrong. Does your patch make -march-k8 do the right thing now? As it is now, -march=native (on k8) and -march=k8 will get consistent results, that is - both will use 3DNOW prefetchw for *write* prefetches. If this is not OK, we can now change "prefetch" instruction pattern to always emit SSE prefetches instead of 3DNOW write prefetch for TARGET_SSE_PREFETCH targets, and emit prefetchw only for TARGET_PRFCHW.
The analysis from Venkat here: https://gcc.gnu.org/ml/gcc/2016-05/msg00012.html seems to indicate that the instruction is not valid for k8.
There are 2 issues . #issue1 -mprfchw should be enabled only for targets that supports (3DNowprefetch). On K8, 3DNowprefetch is not available and -march=k8 should not set this flag. I can see behavior now corrected with Uros flag. Although I have to verify changes done other targets. #2 issue2 prefetchw ISA is also available in 3DNow!. Generating prefetchw by the GCC backend is functionally correct if write prefetches are requested. Looking at the test case why write prefetches are requested. void f() { extern int size; int i; float * fvec; float * fptr = (float *) get(); for(i = 0; i < size; ++i) fvec[i] = fptr[i]; get(); } I have to keep one more call statement so that "fvec" definition is not killed. prefetchw is generated for memory stores via fvec. They are written only.
Author: uros Date: Sun Aug 21 18:53:48 2016 New Revision: 239643 URL: https://gcc.gnu.org/viewcvs?rev=239643&root=gcc&view=rev Log: PR target/77270 * config/i386/i386.md (prefetch): When TARGET_PRFCHW or TARGET_PREFETCHWT1 are disabled, emit 3dNOW! write prefetches for non-SSE2 athlons only, otherwise prefer SSE prefetches. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
Author: uros Date: Wed Aug 24 14:59:43 2016 New Revision: 239737 URL: https://gcc.gnu.org/viewcvs?rev=239737&root=gcc&view=rev Log: PR target/77270 * gcc.dg/tree-ssa/loop-28.c: Also compile on 32bit x86 targets. (dg-options): Use -march=amdfam10 instead of -march=athlon. * gcc.dg/tree-ssa/update-unroll-1.c: Ditto. * gcc.dg/tree-ssa/prefetch-3.c: Ditto. * gcc.dg/tree-ssa/prefetch-4.c: Ditto. * gcc.dg/tree-ssa/prefetch-5.c: Ditto. * gcc.dg/tree-ssa/prefetch-6.c: Ditto. Do not require sse2 effective target. Remove scan-assembler-times directives. * gcc.dg/tree-ssa/prefetch-7.c: Ditto. * gcc.dg/tree-ssa/prefetch-8.c: Ditto. * gcc.dg/tree-ssa/prefetch-9.c: Ditto. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-28.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-3.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-5.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-6.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c trunk/gcc/testsuite/gcc.dg/tree-ssa/update-unroll-1.c
Author: uros Date: Sun Aug 28 16:30:32 2016 New Revision: 239810 URL: https://gcc.gnu.org/viewcvs?rev=239810&root=gcc&view=rev Log: Backport from mainline 2016-08-23 Venkataramanan Kumar <venkataramanan.kumar@amd.com> * config/i386/i386.c (processor_alias_table): Enable PTA_PRFCHW for targets amdfam10 and barcelona. Backport from mainline 2016-08-21 Uros Bizjak <ubizjak@gmail.com> PR target/77270 * config/i386/i386.md (prefetch): When TARGET_PRFCHW or TARGET_PREFETCHWT1 are disabled, emit 3dNOW! write prefetches for non-SSE2 athlons only, otherwise prefer SSE prefetches. Backport from mainline 2016-08-19 Uros Bizjak <ubizjak@gmail.com> PR target/77270 * config/i386/i386.c (ix86_option_override_internal): Remove PTA_PRFCHW from entries that also have PTA_3DNOW flag. Enable SSE prefetch also for TARGET_PREFETCHWT1. Do not try to enable TARGET_PRFCHW ISA flag here. * config/i386/i386.md (prefetch): Enable also for TARGET_3DNOW. Rewrite expander function body. (*prefetch_3dnow): Enable for TARGET_3DNOW and TARGET_PREFETCHWT1. testsuite/ChangeLog: Backport from mainline 2016-08-24 Uros Bizjak <ubizjak@gmail.com> PR target/77270 * gcc.dg/tree-ssa/loop-28.c: Also compile on 32bit x86 targets. (dg-options): Use -march=amdfam10 instead of -march=athlon. * gcc.dg/tree-ssa/update-unroll-1.c: Ditto. * gcc.dg/tree-ssa/prefetch-3.c: Ditto. * gcc.dg/tree-ssa/prefetch-4.c: Ditto. * gcc.dg/tree-ssa/prefetch-5.c: Ditto. * gcc.dg/tree-ssa/prefetch-6.c: Ditto. Do not require sse2 effective target. Remove scan-assembler-times directives. * gcc.dg/tree-ssa/prefetch-7.c: Ditto. * gcc.dg/tree-ssa/prefetch-8.c: Ditto. * gcc.dg/tree-ssa/prefetch-9.c: Ditto. Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/config/i386/i386.c branches/gcc-6-branch/gcc/config/i386/i386.md branches/gcc-6-branch/gcc/testsuite/ChangeLog branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/loop-28.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-3.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-4.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-5.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-6.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/update-unroll-1.c
Fixed for 6.3+.