This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: option -mprfchw on 2 different Opteron cpus


On Tue, May 3, 2016 at 12:40 AM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi
>
>> -----Original Message-----
>> From: NightStrike [mailto:nightstrike@gmail.com]
>> Sent: Monday, May 2, 2016 10:31 PM
>> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
>> Cc: Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>;
>> lopezibanez@gmail.com; Jan Hubicka <hubicka@ucw.cz>; Jakub Jelinek
>> <jakub@redhat.com>; gcc@gcc.gnu.org
>> Subject: Re: option -mprfchw on 2 different Opteron cpus
>>
>> On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan
>> <Venkataramanan.Kumar@amd.com> wrote:
>> >> If I compile on a k8 Opteron 248 with -march=native, I do not see
>> >> -mprfchw listed in the options in -fverbose-asm.  In the assembly, I see
>> this:
>> >>
>> >> prefetcht0      (%rax)  # ivtmp.1160
>> >> prefetcht0      304(%rcx)       #
>> >> prefetcht0      (%rax)  # ivtmp.1160
>> >
>> > On AMD processors, the -mprfchw flag is used to enable "3dnowprefetch"
>> > ISA support.
>> >
>> > (Snip)
>> > CPUID Fn8000_0001_ECX Feature Identifiers Bit 8
>> > 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See
>> > “PREFETCH” and “PREFETCHW” in APM3
>> > Ref: http://support.amd.com/TechDocs/25481.pdf
>> > (Snip)
>> >
>> > Can you please confirm what this CPUID flag returns on your k8 machine?
>> > I believe this ISA is not available on k8 machines, so when -march=native
>> > is added you don’t see -mprfchw in the verbose output.
>>
>> Looks like zero?  This was generated with the cpuid program from
>> http://www.etallen.com/cpuid.html
>>
>>       3DNow! instruction extensions         = true
>>       3DNow! instructions                   = true
>
> It has 3Dnow support.  "prefetchw" is available with 3dnow.
>
>>       misaligned SSE mode                    = false
>>       3DNow! PREFETCH/PREFETCHW instructions = false
>
> It does not have 3DNowPrefetch, so enabling the ISA flag -mprfchw is not correct for -march=k8.
>
>>       OS visible workaround                  = false
>>       instruction based sampling             = false
>> >> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying
>> >> to target the older system), I do see it listed in the options in
>> >> -fverbose-asm.  In the assembly, I see this:
>> >
>> > K8 has 3dnow support and there is a patch that replaced 3dnow with
>> prefetchw (3DNowPrefetch).
>> > https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
>> > So when you add -march=k8 you see -mprfchw  getting listed in verbose.
>> >
>> >>
>> >> prefetcht0      (%rax)  # ivtmp.1160
>> >> prefetcht0      304(%rcx)       #
>> >> prefetchw       (%rax)  # ivtmp.1160
>> >>
>> >> (The third line is the only difference)
>> >>
>> >
>> > This is my guess without seeing the test case: when write prefetching
>> > is requested, "prefetchw" is generated.
>> > The 3dnow (TARGET_3DNOW) ISA has support for it.
>> >
>> > (Snip)
>> > Support for the PREFETCH and PREFETCHW instructions is indicated by
>> > CPUID Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
>> > Fn8000_0001_EDX[3DNow] = 1.
>> > (Snip)
>> > Ref:
>> http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
>> >
>> >> In both cases, I'm using gcc 4.9.3.  Which is correct for a k8 Opteron 248?
>> >>
>> >> Also, FWIW:
>> >>
>> >> 1) The march=native version that uses prefetcht0 is very repeatably
>> >> faster by about 15% in the particular test case I'm looking at.
>> >>
>> >> 2) The compilers in both instances are not just the same version,
>> >> they are the same compiler binary installed on an NFS mount and
>> >> shared to both computers.
>> >
>> > As per the GCC 4.9.3 source:
>> >
>> > (Snip)
>> > (define_expand "prefetch"
>> >   [(prefetch (match_operand 0 "address_operand")
>> >              (match_operand:SI 1 "const_int_operand")
>> >              (match_operand:SI 2 "const_int_operand"))]
>> >   "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
>> > {
>> >   bool write = INTVAL (operands[1]) != 0;
>> >   int locality = INTVAL (operands[2]);
>> >
>> >   gcc_assert (IN_RANGE (locality, 0, 3));
>> >
>> >   /* Use 3dNOW prefetch in case we are asking for write prefetch not
>> >      supported by SSE counterpart or the SSE prefetch is not available
>> >      (K6 machines).  Otherwise use SSE prefetch as it allows specifying
>> >      of locality.  */
>> >   if (TARGET_PREFETCHWT1 && write && locality <= 2)
>> >     operands[2] = const2_rtx;
>> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> >     operands[2] = GEN_INT (3);
>> >   else
>> >     operands[1] = const0_rtx;
>> > })
>> > (Snip)
>> >
>> > Write prefetch may be requested (either by the auto prefetcher or by
>> > builtins), but with -march=native the check below could have become false:
>> >    else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
>> > because TARGET_PRFCHW is off with -march=native.
>> >
>> > So there are two issues here.
>> >
>> > (1) The ISA flags enabled with -march=k8 are different from those enabled
>> > with -march=native on a k8 machine.
>
> I think we need to file a bug for this.  We need to check with Uros why the -mprfchw flag is shared with 3dnow.
> To work around this issue, you can use -mno-prfchw when building with -march=k8.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77270
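
For reference, the CPUID rule quoted above can be written down as a small C sketch. It only decodes register values that have already been read from CPUID function 0x80000001; it does not execute CPUID itself. The ECX bit position (8) comes from the quoted AMD text. The EDX positions for LM (bit 29) and 3DNow (bit 31) are taken from AMD's CPUID specification and are not stated in this thread. All macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_0001 feature bits (names are illustrative, not GCC's). */
#define ECX_3DNOWPREFETCH (UINT32_C(1) << 8)   /* quoted above: ECX bit 8 */
#define EDX_LM            (UINT32_C(1) << 29)  /* assumption: AMD CPUID spec */
#define EDX_3DNOW         (UINT32_C(1) << 31)  /* assumption: AMD CPUID spec */

/* True if the dedicated 3DNowPrefetch feature bit is set. */
static bool has_3dnowprefetch_flag(uint32_t ecx)
{
    return (ecx & ECX_3DNOWPREFETCH) != 0;
}

/* True if PREFETCH/PREFETCHW are usable according to the quoted APM rule:
   3DNowPrefetch OR LM OR 3DNow. */
static bool supports_prefetchw(uint32_t ecx, uint32_t edx)
{
    return (ecx & ECX_3DNOWPREFETCH) != 0
        || (edx & (EDX_LM | EDX_3DNOW)) != 0;
}
```

On a machine like the Opteron 248 described here (3DNow and long mode present, 3DNowPrefetch bit clear), has_3dnowprefetch_flag() would return false while supports_prefetchw() would return true, which is the mismatch the bug report is about. On x86 with GCC, the register values could be obtained with __get_cpuid() from <cpuid.h>.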

>> > (2) Need to check why the GCC middle end requested write prefetch for
>> > the test case with -march=k8.
> On "prefetchw" generation: it may be the case that the GCC auto prefetcher requests write prefetches.
> AFAIK a write prefetch brings the data in from memory, marks the cache line modified, and expects a write to happen next.
> If a read of that cache line happens instead, the data will be written back to memory before the read, which is unnecessary.
> This is hard to answer without a test case, and I don’t have a k8 machine at hand.
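
As a minimal sketch of how read and write prefetches are requested explicitly, the two loops below use GCC's __builtin_prefetch, whose second argument selects read (0) or write (1) and whose third argument is the locality hint (0 to 3). The function names and the PREFETCH_AHEAD distance are hypothetical and not tuned for any CPU. This is not the test case from this thread, only an illustration of the rw argument that ends up as operands[1] in the quoted expander.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; chosen for illustration only. */
enum { PREFETCH_AHEAD = 16 };

/* Read-only loop: rw = 0 asks for a read prefetch. */
static long sum_with_read_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        sum += a[i];
    }
    return sum;
}

/* In-place update: rw = 1 asks for a write prefetch, since each
   prefetched line is about to be stored to. */
static void scale_with_write_prefetch(long *a, size_t n, long k)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1, 3);
        a[i] *= k;
    }
}
```

Compiling such a file with -S -fverbose-asm under -march=k8 and again under -march=native should show which prefetch instruction each loop receives on a given GCC version.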

Should this be another bug filed if I can get a reduced test case, or
is PR77270 enough, or is this not a bug?
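
To make the effect of the flag difference concrete, here is a small C model of the branch structure in the quoted "prefetch" expander. It is a sketch, not GCC code: struct isa_flags stands in for the TARGET_* macros, and the mapping from the resulting operands to instruction names is an assumption based on the i386 insn patterns, which are not quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_PREFETCH_SSE, TARGET_PRFCHW and
   TARGET_PREFETCHWT1. */
struct isa_flags {
    bool prefetch_sse;
    bool prfchw;
    bool prefetchwt1;
};

enum prefetch_insn {
    PF_NONE,            /* expander condition not met */
    PF_PREFETCHNTA, PF_PREFETCHT2, PF_PREFETCHT1, PF_PREFETCHT0,
    PF_PREFETCH_3DNOW,  /* 3DNow "prefetch" (read) */
    PF_PREFETCHW,       /* 3DNow "prefetchw" (write) */
    PF_PREFETCHWT1
};

static enum prefetch_insn model_prefetch_expand(struct isa_flags isa,
                                                int rw, int locality)
{
    /* Assumed SSE mapping: locality 0..3 -> nta, t2, t1, t0. */
    static const enum prefetch_insn sse_by_locality[4] = {
        PF_PREFETCHNTA, PF_PREFETCHT2, PF_PREFETCHT1, PF_PREFETCHT0
    };
    bool write = rw != 0;

    assert(locality >= 0 && locality <= 3);  /* mirrors gcc_assert above */

    if (!(isa.prefetch_sse || isa.prfchw || isa.prefetchwt1))
        return PF_NONE;
    if (isa.prefetchwt1 && write && locality <= 2)
        return PF_PREFETCHWT1;
    if (isa.prfchw && (write || !isa.prefetch_sse))
        return write ? PF_PREFETCHW : PF_PREFETCH_3DNOW;
    /* Fallback: the write request is dropped and an SSE prefetch is used. */
    return isa.prefetch_sse ? sse_by_locality[locality] : PF_NONE;
}
```

With prfchw off (the -march=native case on the Opteron 248), a write prefetch with locality 3 falls through to the SSE branch and the model returns PF_PREFETCHT0. With prfchw on (the -march=k8 case), the same request returns PF_PREFETCHW, matching the one-line difference in the assembly above.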

