Bug 77270 - Flag -mprftchw is shared with 3dnow for -march=k8
Summary: Flag -mprftchw is shared with 3dnow for -march=k8
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.1.0
: P3 normal
Target Milestone: 6.3
Assignee: Uroš Bizjak
URL: https://gcc.gnu.org/ml/gcc/2016-05/ms...
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-16 16:40 UTC by nightstrike
Modified: 2016-08-28 16:32 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail: 4.9.3, 6.1.0
Last reconfirmed: 2016-08-19 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description nightstrike 2016-08-16 16:40:30 UTC
See https://gcc.gnu.org/ml/gcc/2016-05/msg00000.html

Using -march=native on a k8 platform does not add -mprfchw to the options list, while using -march=k8 on a different platform does add the switch.

From Venkataramanan:

>       3DNow! instruction extensions         = true
>       3DNow! instructions                   = true

It has 3Dnow support.  "prefetchw" is available with 3dnow.

>       misaligned SSE mode                    = false
>       3DNow! PREFETCH/PREFETCHW instructions = false

It does not have 3DNowprefetch enabling ISA flag -mprftchw is not correct for -march=k8.

>       OS visible workaround                  = false
>       instruction based sampling             = false

I think  we need to file bug for this.  Need to check with Uros why the flag -mprfchw is shared with 3dnow.

To work around this issue you can use -mno-prfchw when building with -march=k8.
Comment 1 nightstrike 2016-08-16 17:42:19 UTC
This should trigger the prefetchw instruction with -march=k8 -O3:

void f() {
    extern int size;
    int i;

    float * fvec;
    float * fptr = (float *) get();
    for(i = 0; i < size; ++i)
        fvec[i] = fptr[i];

    int * ivec;
    int * iptr = (int *) get();
    for(i = 0; i < size; ++i)
        ivec[i] = iptr[i];
}


If I give an "extern void * get()" prototype for get, if there's only one vector loop, or if size is local, the prefetchw instruction goes away.

$ gcc -fverbose-asm -S a.c -march=k8 -O3; grep prefetchw a.s
	prefetchw	464(%rcx)	#
	prefetchw	(%rdx)	# ivtmp.37
Comment 2 Uroš Bizjak 2016-08-16 18:55:13 UTC
The problem is in the fact that for -march=native, the driver will pass -mno-prfchw, since the relevant bit is not present in cpuid flags.

Following code in i386.c:

  /* Enable prefetch{,w} instructions for -m3dnow and -mprefetchwt1.  */
  if (TARGET_3DNOW_P (opts->x_ix86_isa_flags)
      || TARGET_PREFETCHWT1_P (opts->x_ix86_isa_flags))
    opts->x_ix86_isa_flags
      |= OPTION_MASK_ISA_PRFCHW & ~opts->x_ix86_isa_flags_explicit;

will then try to set the option, but it won't succeed, due to the above handling of explicit compile flags.
Comment 3 Uroš Bizjak 2016-08-19 16:26:10 UTC
I have a patch.
Comment 4 uros 2016-08-19 18:14:35 UTC
Author: uros
Date: Fri Aug 19 18:14:03 2016
New Revision: 239626

URL: https://gcc.gnu.org/viewcvs?rev=239626&root=gcc&view=rev
Log:
	PR target/77270
	* config/i386/i386.c (ix86_option_override_internal): Remove
	PTA_PRFCHW from entries that also have PTA_3DNOW flag.
	Enable SSE prefetch also for TARGET_PREFETCHWT1.
	Do not try to enable TARGET_PRFCHW ISA flag here.
	* config/i386/i386.md (prefetch): Enable also for TARGET_3DNOW.
	Rewrite expander function body.
	(*prefetch_3dnow): Enable for TARGET_3DNOW and TARGET_PREFETCHWT1.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
Comment 5 nightstrike 2016-08-20 14:12:38 UTC
(In reply to Uroš Bizjak from comment #2)
> The problem is in the fact that for -march=native, the driver will pass
> -mno-prfchw, since the relevant bit is not present in cpuid flags.

I'm a little confused.  -march=native gets it right.  It's -march=k8 that gets it wrong.  Does your patch make -march-k8 do the right thing now?
Comment 6 Uroš Bizjak 2016-08-20 17:10:20 UTC
(In reply to nightstrike from comment #5)
> (In reply to Uroš Bizjak from comment #2)
> > The problem is in the fact that for -march=native, the driver will pass
> > -mno-prfchw, since the relevant bit is not present in cpuid flags.
> 
> I'm a little confused.  -march=native gets it right.  It's -march=k8 that
> gets it wrong.  Does your patch make -march-k8 do the right thing now?

As it is now, -march=native (on k8) and -march=k8 will get consistent results, that is - both will use 3DNOW prefetchw for *write* prefetches. If this is not OK, we can now change "prefetch" instruction pattern to always emit SSE prefetches instead of 3DNOW write prefetch for TARGET_SSE_PREFETCH targets, and emit prefetchw only for TARGET_PRFCHW.
Comment 7 nightstrike 2016-08-20 17:14:53 UTC
The analysis from Venkat here:

https://gcc.gnu.org/ml/gcc/2016-05/msg00012.html

seems to indicate that the instruction is not valid for k8.
Comment 8 vekumar 2016-08-21 07:30:41 UTC
There are 2 issues .

#issue1 
-mprfchw should be enabled only for targets that supports (3DNowprefetch).
On K8, 3DNowprefetch is not available and -march=k8 should not set this flag.

I can see behavior now corrected with Uros flag. 
Although I have to verify changes done other targets. 

#2 issue2
prefetchw ISA is also available in 3DNow!. Generating prefetchw by the GCC backend is functionally correct if write prefetches are requested.

Looking at the test case why write prefetches are requested.

void f() {
    extern int size;
    int i;
    float * fvec;
    float * fptr = (float *) get();
    for(i = 0; i < size; ++i)
        fvec[i] = fptr[i];
    get();
}
    
I have to keep one more call statement so that "fvec" definition is not killed. 
prefetchw is generated for memory stores via fvec.  They are written only.
Comment 9 uros 2016-08-21 18:54:21 UTC
Author: uros
Date: Sun Aug 21 18:53:48 2016
New Revision: 239643

URL: https://gcc.gnu.org/viewcvs?rev=239643&root=gcc&view=rev
Log:
	PR target/77270
	* config/i386/i386.md (prefetch): When TARGET_PRFCHW or
	TARGET_PREFETCHWT1 are disabled, emit 3dNOW! write prefetches for
	non-SSE2 athlons only, otherwise prefer SSE prefetches.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Comment 10 uros 2016-08-24 15:00:20 UTC
Author: uros
Date: Wed Aug 24 14:59:43 2016
New Revision: 239737

URL: https://gcc.gnu.org/viewcvs?rev=239737&root=gcc&view=rev
Log:
	PR target/77270
	* gcc.dg/tree-ssa/loop-28.c: Also compile on 32bit x86 targets.
	(dg-options): Use -march=amdfam10 instead of -march=athlon.
	* gcc.dg/tree-ssa/update-unroll-1.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-3.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-4.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-5.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-6.c: Ditto.  Do not require sse2
	effective target.  Remove scan-assembler-times directives.
	* gcc.dg/tree-ssa/prefetch-7.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-8.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-9.c: Ditto.


Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-28.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-4.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-5.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-6.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/update-unroll-1.c
Comment 11 uros 2016-08-28 16:31:04 UTC
Author: uros
Date: Sun Aug 28 16:30:32 2016
New Revision: 239810

URL: https://gcc.gnu.org/viewcvs?rev=239810&root=gcc&view=rev
Log:
	Backport from mainline
	2016-08-23  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>

	* config/i386/i386.c (processor_alias_table): Enable PTA_PRFCHW
	for targets amdfam10 and barcelona.

	Backport from mainline
	2016-08-21  Uros Bizjak  <ubizjak@gmail.com>

	PR target/77270
	* config/i386/i386.md (prefetch): When TARGET_PRFCHW or
	TARGET_PREFETCHWT1 are disabled, emit 3dNOW! write prefetches for
	non-SSE2 athlons only, otherwise prefer SSE prefetches.

	Backport from mainline
	2016-08-19  Uros Bizjak  <ubizjak@gmail.com>

	PR target/77270
	* config/i386/i386.c (ix86_option_override_internal): Remove
	PTA_PRFCHW from entries that also have PTA_3DNOW flag.
	Enable SSE prefetch also for TARGET_PREFETCHWT1.
	Do not try to enable TARGET_PRFCHW ISA flag here.
	* config/i386/i386.md (prefetch): Enable also for TARGET_3DNOW.
	Rewrite expander function body.
	(*prefetch_3dnow): Enable for TARGET_3DNOW and TARGET_PREFETCHWT1.

testsuite/ChangeLog:

	Backport from mainline
	2016-08-24  Uros Bizjak  <ubizjak@gmail.com>

	PR target/77270
	* gcc.dg/tree-ssa/loop-28.c: Also compile on 32bit x86 targets.
	(dg-options): Use -march=amdfam10 instead of -march=athlon.
	* gcc.dg/tree-ssa/update-unroll-1.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-3.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-4.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-5.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-6.c: Ditto.  Do not require sse2
	effective target.  Remove scan-assembler-times directives.
	* gcc.dg/tree-ssa/prefetch-7.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-8.c: Ditto.
	* gcc.dg/tree-ssa/prefetch-9.c: Ditto.


Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/i386/i386.c
    branches/gcc-6-branch/gcc/config/i386/i386.md
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/loop-28.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-3.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-4.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-5.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-6.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/update-unroll-1.c
Comment 12 Uroš Bizjak 2016-08-28 16:32:27 UTC
Fixed for 6.3+.