This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: option -mprfchw on 2 different Opteron cpus
- From: NightStrike <nightstrike at gmail dot com>
- To: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
- Cc: "Uros Bizjak (ubizjak at gmail dot com)" <ubizjak at gmail dot com>, "lopezibanez at gmail dot com" <lopezibanez at gmail dot com>, Jan Hubicka <hubicka at ucw dot cz>, Jakub Jelinek <jakub at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Tue, 16 Aug 2016 12:42:52 -0400
- Subject: Re: option -mprfchw on 2 different Opteron cpus
- References: <CAF1jjLsyTdZhRj=3C56uxFgPmEefJ3vvJu8EdnKGPnxHrH_RjQ@mail.gmail.com> <CY1PR1201MB1098DD32228B401ABC8DDDB18F790@CY1PR1201MB1098.namprd12.prod.outlook.com> <CAF1jjLvYE5p+sDcdhyMtQ5PzBC_K_Sv+rc-5Zzd=kiYwTG2bjA@mail.gmail.com> <CY1PR1201MB10986BAADD7DF937BAD1B98F8F7A0@CY1PR1201MB1098.namprd12.prod.outlook.com>
On Tue, May 3, 2016 at 12:40 AM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi
>
>> -----Original Message-----
>> From: NightStrike [mailto:nightstrike@gmail.com]
>> Sent: Monday, May 2, 2016 10:31 PM
>> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> Cc: Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>;
>> lopezibanez@gmail.com; Jan Hubicka <hubicka@ucw.cz>; Jakub Jelinek
>> <jakub@redhat.com>; gcc@gcc.gnu.org
>> Subject: Re: option -mprfchw on 2 different Opteron cpus
>>
>> On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan
>> <Venkataramanan.Kumar@amd.com> wrote:
>> >> If I compile on a k8 Opteron 248 with -march=native, I do not see
>> >> -mprfchw listed in the options in -fverbose-asm. In the assembly, I
>> >> see this:
>> >>
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >> prefetcht0 304(%rcx) #
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >
>> > On AMD processors, the -mprfchw flag is used to enable "3dnowprefetch" ISA
>> > support.
>> >
>> > (Snip)
>> > CPUID Fn8000_0001_ECX Feature Identifiers Bit 8
>> > 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See
>> > “PREFETCH” and “PREFETCHW” in APM3
>> > Ref: http://support.amd.com/TechDocs/25481.pdf
>> > (Snip)
>> >
>> > Can you please confirm what this CPUID flag returns on your k8 machine?
>> > I believe this ISA is not available on the k8 machine, so when -march=native is
>> > added you don’t see -mprfchw in the verbose output.
>>
>> Looks like zero? This was generated with the cpuid program from
>> http://www.etallen.com/cpuid.html
>>
>> 3DNow! instruction extensions = true
>> 3DNow! instructions = true
>
> It has 3Dnow support. "prefetchw" is available with 3dnow.
>
>> misaligned SSE mode = false
>> 3DNow! PREFETCH/PREFETCHW instructions = false
>
> It does not have 3DNowPrefetch, so enabling the ISA flag -mprfchw is not correct for -march=k8.
>
>> OS visible workaround = false
>> instruction based sampling = false
>> >> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying
>> >> to target the older system), I do see it listed in the options in
>> >> -fverbose-asm. In the assembly, I see this:
>> >
>> > K8 has 3dnow support, and there is a patch that replaced 3dnow with
>> > prefetchw (3DNowPrefetch).
>> > https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
>> > So when you add -march=k8 you see -mprfchw getting listed in verbose.
>> >
>> >>
>> >> prefetcht0 (%rax) # ivtmp.1160
>> >> prefetcht0 304(%rcx) #
>> >> prefetchw (%rax) # ivtmp.1160
>> >>
>> >> (The third line is the only difference)
>> >>
>> >
>> > My guess, without seeing the test case, is that "prefetchw" is generated
>> > when write prefetching is requested.
>> > The 3dnow (TARGET_3DNOW) ISA has support for it.
>> >
>> > (Snip)
>> > Support for the PREFETCH and PREFETCHW instructions is indicated by
>> > CPUID Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
>> > Fn8000_0001_EDX[3DNow] = 1.
>> > (Snip)
>> > Ref: http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
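
The two CPUID statements quoted so far can be put side by side in a small C sketch. It separates "the 3DNowPrefetch flag is set" (ECX bit 8, as quoted from the CPUID document) from "PREFETCHW can be executed" (the OR rule quoted from APM3). The bit positions for LM (EDX bit 29) and 3DNow (EDX bit 31) are not stated in this thread; they are taken from the AMD CPUID specification. The function names and the helper `read_leaf_80000001` are hypothetical, and the helper assumes GCC or Clang on x86 with `<cpuid.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID Fn8000_0001. ECX bit 8 is quoted in the thread;
   the EDX positions are an assumption taken from the AMD CPUID spec. */
#define ECX_3DNOWPREFETCH (UINT32_C(1) << 8)
#define EDX_LM            (UINT32_C(1) << 29)
#define EDX_3DNOW         (UINT32_C(1) << 31)

/* True when the dedicated 3DNowPrefetch feature flag is set. */
bool has_3dnowprefetch_flag(uint32_t ecx)
{
    return (ecx & ECX_3DNOWPREFETCH) != 0;
}

/* APM3 rule quoted above: PREFETCH/PREFETCHW are supported when
   3DNowPrefetch OR LM OR 3DNow is set. */
bool prefetchw_supported(uint32_t ecx, uint32_t edx)
{
    return (ecx & ECX_3DNOWPREFETCH) != 0
        || (edx & EDX_LM) != 0
        || (edx & EDX_3DNOW) != 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Hypothetical helper: read ECX/EDX of leaf 0x80000001 on the running CPU.
   Returns false when the leaf is not available. */
bool read_leaf_80000001(uint32_t *ecx, uint32_t *edx)
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &d))
        return false;
    *ecx = c;
    *edx = d;
    return true;
}
#endif
```

For a CPU that reports 3DNow and long mode but not 3DNowPrefetch, `has_3dnowprefetch_flag` is false while `prefetchw_supported` is true, which is the distinction this thread turns on.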
>> >
>> >> In both cases, I'm using gcc 4.9.3. Which is correct for a k8 Opteron 248?
>> >>
>> >> Also, FWIW:
>> >>
>> >> 1) The march=native version that uses prefetcht0 is very repeatably
>> >> faster by about 15% in the particular test case I'm looking at.
>> >>
>> >> 2) The compilers in both instances are not just the same version,
>> >> they are the same compiler binary installed on an NFS mount and
>> >> shared to both computers.
>> >
>> > As per the GCC 4.9.3 source:
>> >
>> > (Snip)
>> > (define_expand "prefetch"
>> >   [(prefetch (match_operand 0 "address_operand")
>> >              (match_operand:SI 1 "const_int_operand")
>> >              (match_operand:SI 2 "const_int_operand"))]
>> >   "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
>> > {
>> >   bool write = INTVAL (operands[1]) != 0;
>> >   int locality = INTVAL (operands[2]);
>> >
>> >   gcc_assert (IN_RANGE (locality, 0, 3));
>> >
>> >   /* Use 3dNOW prefetch in case we are asking for write prefetch not
>> >      supported by SSE counterpart or the SSE prefetch is not available
>> >      (K6 machines). Otherwise use SSE prefetch as it allows specifying
>> >      of locality. */
>> >   if (TARGET_PREFETCHWT1 && write && locality <= 2)
>> >     operands[2] = const2_rtx;
>> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> >     operands[2] = GEN_INT (3);
>> >   else
>> >     operands[1] = const0_rtx;
>> > })
>> > (Snip)
>> >
>> > A write prefetch may be requested (either by the auto prefetcher or by builtins), but
>> > with -march=native the check below could have become false.
>> > else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> > TARGET_PRFCHW is off on native.
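
The branch structure of the quoted expander can be modelled in plain C to see why the two builds differ. This is a simplified sketch, not GCC code: the struct, enum, and function names are hypothetical, and the mapping from locality 0..3 to `prefetchnta`/`prefetcht2`/`prefetcht1`/`prefetcht0` is an assumption about the SSE prefetch pattern that the quoted snippet does not show. The two example flag sets reflect what the thread reports for -fverbose-asm on each machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_PREFETCH_SSE, TARGET_PRFCHW
   and TARGET_PREFETCHWT1. */
struct target_flags {
    bool prefetch_sse;
    bool prfchw;
    bool prefetchwt1;
};

/* The first four values are ordered so that they equal the locality
   argument (assumed SSE mapping: 0 = nta, 1 = t2, 2 = t1, 3 = t0). */
enum prefetch_insn {
    INSN_PREFETCHNTA = 0,
    INSN_PREFETCHT2,
    INSN_PREFETCHT1,
    INSN_PREFETCHT0,
    INSN_PREFETCH,      /* 3DNow! read prefetch */
    INSN_PREFETCHW,     /* 3DNow! write prefetch */
    INSN_PREFETCHWT1
};

/* Mirrors the if / else if / else chain of the quoted define_expand. */
enum prefetch_insn select_prefetch(struct target_flags t, bool write, int locality)
{
    assert(locality >= 0 && locality <= 3);

    if (t.prefetchwt1 && write && locality <= 2)
        return INSN_PREFETCHWT1;
    if (t.prfchw && (write || !t.prefetch_sse))
        return write ? INSN_PREFETCHW : INSN_PREFETCH;
    /* Otherwise the write request is dropped and the SSE form is used. */
    return (enum prefetch_insn)locality;
}

/* Flag sets as reported in the thread (assumed, not measured here). */
static const struct target_flags K8_NATIVE   = { true, false, false };
static const struct target_flags MARCH_K8    = { true, true,  false };
```

With these inputs, a write prefetch with locality 3 maps to `INSN_PREFETCHT0` for `K8_NATIVE` and to `INSN_PREFETCHW` for `MARCH_K8`, while read prefetches map to `INSN_PREFETCHT0` in both. That is consistent with only the third line of the assembly differing.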
>> >
>> > So there are two issues here.
>> >
>> > (1) The ISA flags enabled with -march=k8 are different from those enabled
>> > with -march=native on a k8 machine.
>
> I think we need to file a bug for this. We need to check with Uros why the -mprfchw flag is shared with 3dnow.
> To work around this issue, you can use -mno-prfchw when building with -march=k8.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77270
>> > (2) Need to check why the GCC middle end requested a write prefetch for the
>> > test case with -march=k8.
> As for the "prefetchw" generation, it may be that the GCC auto prefetcher requests write prefetches.
> AFAIK a write prefetch brings the data in from memory, marks the cache line modified, and expects a write to happen next.
> If a read happens to that cache line instead, the data will be written back to memory before the read, which is unnecessary.
> It is hard to answer without a test case, and I don’t have a k8 machine at hand.
Should I file another bug for this if I can get a reduced test case, or
is PR77270 enough, or is this not a bug?
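
As a possible starting point for such a reduced test case, the sketch below requests read and write prefetches explicitly through `__builtin_prefetch`, whose second argument selects read (0) or write (1) and whose third argument is the locality. It is not the code from the original test and does not exercise the auto prefetcher; the function names and the prefetch distance are hypothetical.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. */
enum { PREFETCH_AHEAD = 16 };

/* Read-only loop: requests read prefetches (rw = 0, locality = 3). */
long sum_with_read_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        total += a[i];
    }
    return total;
}

/* Read-modify-write loop: requests write prefetches (rw = 1, locality = 3). */
void scale_with_write_prefetch(int *a, size_t n, int factor)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1, 3);
        a[i] *= factor;
    }
}
```

Compiling this with -S once with -march=native on the k8 machine and once with -march=k8 should show which prefetch instruction each request turns into, and adding -mno-prfchw to the second build should show whether the workaround changes it.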