This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: option -mprfchw on 2 different Opteron cpus
- From: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
- To: NightStrike <nightstrike at gmail dot com>, "Uros Bizjak (ubizjak at gmail dot com)" <ubizjak at gmail dot com>, "lopezibanez at gmail dot com" <lopezibanez at gmail dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, Jakub Jelinek <jakub at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Mon, 2 May 2016 09:55:10 +0000
- Subject: RE: option -mprfchw on 2 different Opteron cpus
- References: <CAF1jjLsyTdZhRj=3C56uxFgPmEefJ3vvJu8EdnKGPnxHrH_RjQ at mail dot gmail dot com>
Hi,
> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
> NightStrike
> Sent: Monday, May 2, 2016 1:55 AM
> To: gcc@gcc.gnu.org
> Cc: Jan Hubicka <hubicka@ucw.cz>; Jakub Jelinek <jakub@redhat.com>
> Subject: option -mprfchw on 2 different Opteron cpus
>
> Reposting from here:
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00003.html
>
> Not sure if this applies:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54210
>
> If I compile on a k8 Opteron 248 with -march=native, I do not see -mprfchw
> listed in the options in -fverbose-asm. In the assembly, I see this:
>
> prefetcht0 (%rax) # ivtmp.1160
> prefetcht0 304(%rcx) #
> prefetcht0 (%rax) # ivtmp.1160
On AMD processors, the -mprfchw flag enables support for the "3DNowPrefetch" ISA extension.
(Snip)
CPUID Fn8000_0001_ECX Feature Identifiers
Bit 8
3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See "PREFETCH" and
"PREFETCHW" in APM3
Ref: http://support.amd.com/TechDocs/25481.pdf
(Snip)
Can you please confirm what this CPUID flag returns on your K8 machine?
I believe this ISA extension is not available on the K8 machine, so when -march=native is used you don't see -mprfchw in the verbose output.
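To check the bit directly, something like the following could be run on the K8 box. This is only a sketch: it assumes GCC's <cpuid.h> helper __get_cpuid is available, and the function names ecx_has_3dnowprefetch and query_3dnowprefetch are made up for illustration.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC helper header providing __get_cpuid */
#endif

/* CPUID Fn8000_0001_ECX bit 8 = 3DNowPrefetch (from the excerpt above). */
#define ECX_BIT_3DNOWPREFETCH (1u << 8)

/* Decode the bit from an already-read ECX value. */
static int ecx_has_3dnowprefetch(unsigned int ecx)
{
    return (ecx & ECX_BIT_3DNOWPREFETCH) != 0;
}

/* Returns 1 if the bit is set, 0 if it is clear, and -1 if leaf
   0x80000001 cannot be queried (unsupported leaf or non-x86 host). */
static int query_3dnowprefetch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return -1;
    return ecx_has_3dnowprefetch(ecx);
#else
    return -1;
#endif
}
```

Printing the result of query_3dnowprefetch() on both Opterons should show whether the two machines really differ in this bit.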
>
> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying to
> target the older system), I do see it listed in the options in -fverbose-asm. In
> the assembly, I see this:
K8 has 3DNow support, and there is a patch that replaced 3dnow with prefetchw (3DNowPrefetch):
https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
So when you add -march=k8, you see -mprfchw listed in the verbose output.
>
> prefetcht0 (%rax) # ivtmp.1160
> prefetcht0 304(%rcx) #
> prefetchw (%rax) # ivtmp.1160
>
> (The third line is the only difference)
>
This is my guess without seeing the test case: when write prefetching is requested, "prefetchw" is generated.
The 3dnow (TARGET_3DNOW) ISA has support for it.
(Snip)
Support for the PREFETCH and PREFETCHW instructions is indicated by CPUID
Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
Fn8000_0001_EDX[3DNow] = 1.
(Snip)
Ref: http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
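The OR condition quoted above can be written as a small predicate over the two registers returned by CPUID Fn8000_0001. This is a sketch, not code from GCC: the helper name prefetchw_supported is hypothetical, and the EDX bit positions (29 for LM, 31 for 3DNow) are taken from my reading of the AMD CPUID specification rather than from the excerpt, so please double-check them.

```c
/* Bit positions in CPUID Fn8000_0001 output registers.
   ECX bit 8 is from the quoted CPUID text; the EDX positions are an
   assumption based on the AMD CPUID specification. */
#define ECX_3DNOWPREFETCH (1u << 8)
#define EDX_LM            (1u << 29)
#define EDX_3DNOW         (1u << 31)

/* PREFETCH/PREFETCHW are usable if any one of the three bits is set. */
static int prefetchw_supported(unsigned int ecx, unsigned int edx)
{
    return (ecx & ECX_3DNOWPREFETCH) != 0
        || (edx & EDX_LM) != 0
        || (edx & EDX_3DNOW) != 0;
}
```

The point is that a CPU can execute PREFETCHW with the 3DNowPrefetch bit clear, as long as it reports long mode or 3DNow; detection based only on ECX bit 8 would miss that case.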
> In both cases, I'm using gcc 4.9.3. Which is correct for a k8 Opteron 248?
>
> Also, FWIW:
>
> 1) The march=native version that uses prefetcht0 is very repeatably faster by
> about 15% in the particular test case I'm looking at.
>
> 2) The compilers in both instances are not just the same version, they are the
> same compiler binary installed on an NFS mount and shared to both
> computers.
As per the GCC 4.9.3 source:
(Snip)
(define_expand "prefetch"
  [(prefetch (match_operand 0 "address_operand")
             (match_operand:SI 1 "const_int_operand")
             (match_operand:SI 2 "const_int_operand"))]
  "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
{
  bool write = INTVAL (operands[1]) != 0;
  int locality = INTVAL (operands[2]);

  gcc_assert (IN_RANGE (locality, 0, 3));

  /* Use 3dNOW prefetch in case we are asking for write prefetch not
     supported by SSE counterpart or the SSE prefetch is not available
     (K6 machines).  Otherwise use SSE prefetch as it allows specifying
     of locality.  */
  if (TARGET_PREFETCHWT1 && write && locality <= 2)
    operands[2] = const2_rtx;
  else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
    operands[2] = GEN_INT (3);
  else
    operands[1] = const0_rtx;
})
(Snip)
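To make the effect of the three branches concrete, here is a simplified stand-alone model of the selection in plain C. It is not GCC code: struct target_flags and select_prefetch are hypothetical names, and the mapping of locality 0..3 to prefetchnta/prefetcht2/prefetcht1/prefetcht0 and of the 3dNOW read form to "prefetch" is my understanding of the matching insn patterns, not something shown in the excerpt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for TARGET_PREFETCH_SSE, TARGET_PRFCHW, TARGET_PREFETCHWT1. */
struct target_flags {
    bool prefetch_sse;
    bool prfchw;
    bool prefetchwt1;
};

/* Returns the mnemonic the expander would end up selecting, or NULL for an
   out-of-range locality (GCC asserts instead). */
static const char *select_prefetch(struct target_flags t, bool write,
                                   int locality)
{
    /* Assumed SSE mapping for locality 0..3. */
    static const char *const sse[4] = {
        "prefetchnta", "prefetcht2", "prefetcht1", "prefetcht0"
    };

    if (locality < 0 || locality > 3)
        return NULL;
    if (t.prefetchwt1 && write && locality <= 2)
        return "prefetchwt1";
    if (t.prfchw && (write || !t.prefetch_sse))
        return write ? "prefetchw" : "prefetch";
    /* Fallback: the write request is dropped and the SSE form is used. */
    return sse[locality];
}
```

With prfchw set (as with -march=k8), a write request with locality 3 yields "prefetchw"; with prfchw clear (as reported for -march=native on the K8), the same request falls through to "prefetcht0". That matches the one-line difference in the two assembly listings.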
A write prefetch may be requested (either by the auto prefetcher or by builtins), but with -march=native the check below could have become false:
else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
TARGET_PRFCHW is off with -march=native.
So there are two issues here.
(1) The ISA flags enabled by -march=k8 differ from those enabled by -march=native on a K8 machine.
(2) We need to check why the GCC middle end requested a write prefetch for the test case with -march=k8.
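For (2), a reduced test case with an explicit write prefetch may help separate the builtin path from the auto prefetcher. The following is only a sketch of such a test case, not the original one; scale_in_place and PREFETCH_AHEAD are made-up names, and the prefetch distance of 16 elements is arbitrary.

```c
#include <stddef.h>

enum { PREFETCH_AHEAD = 16 };   /* arbitrary distance, for illustration */

/* Scales an array in place and explicitly requests a write prefetch
   (rw = 1) with maximum locality (3) for an element further ahead. */
void scale_in_place(float *dst, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&dst[i + PREFETCH_AHEAD], 1, 3);
        dst[i] *= k;
    }
}
```

Compiling this with "gcc -O2 -S -fverbose-asm -march=k8" and again with "-march=k8 -mno-prfchw" should show whether the prefetch instruction changes with the flag. If the original test case has no builtin, the write prefetch is more likely coming from the auto prefetcher.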
Regards,
Venkat.