I noticed that when compiling with -march=geode, the generated code is actually slower than the code generated with generic options (i386 or i686). This happens with any program, not just with dhrystone:

# CFLAGS='-march=i386' ./dry.c
Dhrystones per Second: 643501
# CFLAGS='-march=i486' ./dry.c
Dhrystones per Second: 712251
# CFLAGS='-march=i586' ./dry.c
Dhrystones per Second: 711238
# CFLAGS='-march=i686' ./dry.c
Dhrystones per Second: 878735
# CFLAGS='-march=geode' ./dry.c
Dhrystones per Second: 684932
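To put the figures above on a common scale, the following is a small sketch that expresses each result relative to the -march=geode run. The struct and function names are made up for illustration; only the numbers come from the report.

```c
#include <stdio.h>

/* One benchmark result: the -march value and the reported Dhrystones per second. */
struct dhry_result {
    const char *march;
    double per_second;
};

/* Figures copied from the report; nothing here is measured by this code. */
const struct dhry_result results[] = {
    { "i386",  643501.0 },
    { "i486",  712251.0 },
    { "i586",  711238.0 },
    { "i686",  878735.0 },
    { "geode", 684932.0 },
};

/* Percentage by which `other` is faster (positive) or slower (negative) than `base`. */
double percent_vs(double other, double base)
{
    return (other / base - 1.0) * 100.0;
}

/* Print every result relative to the -march=geode figure (last table entry). */
void print_relative_to_geode(void)
{
    size_t n = sizeof results / sizeof results[0];
    double geode = results[n - 1].per_second;

    for (size_t i = 0; i < n; i++)
        printf("%-6s %+6.1f%%\n", results[i].march,
               percent_vs(results[i].per_second, geode));
}
```

By this measure the i686 build is roughly 28% faster than the geode build, while the i386 build is roughly 6% slower.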
Created attachment 18994 [details] the dhrystone benchmark
Which Geode is this on: the first-generation Geode or the one AMD made? If it is the latter, it is not really a Geode but a K6/K7-based processor.
Try -march=native instead.
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
stepping : 2
cpu MHz : 498.060
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
bogomips : 996.12
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:

# CFLAGS='-march=native' ./dry.c
Dhrystones per Second: 715308
Subject: Re: Code optimized for AMD Geode is slower than generic

On Nov 8, 2009, at 11:52 AM, "rootkit85 at yahoo dot it" <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #4 from rootkit85 at yahoo dot it 2009-11-08 19:52 -------
> [...]
> flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow

Since it has cmov this is not a real geode :).

> [...]
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41989
(In reply to comment #5)
> Since it has cmov this is not a real geode :).

What does CMOV have to do with Geode? It isn't an Athlon:

# CFLAGS='-march=k6' ./dry.c
Dhrystones per Second: 757576
# CFLAGS='-march=athlon' ./dry.c
Dhrystones per Second: 764526
# CFLAGS='-march=athlon-xp' ./dry.c
Dhrystones per Second: 758725
-march=geode disables cmov because the real geode does not have cmov :). This is why it is much slower.
1) define "real geode"
2) what CPU do I have?
See page 15 of http://www.amd.com/files/connectivitysolutions/geode/geode_lx/33234F_LX_databook.pdf:

"The instruction set supported by the core is a combination of Intel Pentium® processor, AMD Athlon™ processor, and AMD Geode LX processor specific instructions. Specifically, it supports the Pentium, Pentium Pro, AMD 3DNow!™ technology and MMX™ instructions for the AMD Athlon processor. It supports a subset of the specialized AMD Geode LX processor instructions including special SMM instructions. The CPU Core does not support the entire Katmai New Instruction (KNI) set as implemented in the Pentium 3. It does support the MMX instructions for the AMD Athlon processor, which are a subset of the Pentium 3 KNI instructions."
(In reply to comment #8)
> 1) define "real geode"
> 2) what CPU do I have?

http://en.wikipedia.org/wiki/Geode_%28processor%29#AMD_Geode
OK, according to your benchmarks and the documentation quote, it looks like we have to split "geode-lx" out of the generic "geode" option. So, can you confirm that the difference between generic geode and geode-lx is the presence of the CMOV bit? According to your documentation quote, it should use the i686 pipeline with 3DNow! features.

There is also Geode NX; IIRC it presents itself as an Athlon, but someone should confirm this.

See also PR37179: Geode LX does not implement ffreep, so it is not an Athlon.
Reopened to clear this geode mess.
Created attachment 18997 [details] Patch that introduces geode-lx CPU option

Can you patch the compiler with the attached patch? "gcc -march=native -### hello.c" should return -march=geode-lx somewhere for your target. Passing "-march=geode-lx" should generate the fastest code.
(In reply to comment #11)
> There is also geode NX, IIRC it represents itself as Athlon, but someone should
> confirm this.

According to [1], cpuid for Geode NX returns "AMD Geode NX 1750", and this string doesn't trigger our current detection logic for Geode architecture.

[1] http://en.gentoo-wiki.com/wiki/Safe_Cflags/AMD#Athlon_XP.2FGeode_NX
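As an illustration of how a brand-string check could miss Geode NX, here is a minimal sketch. It assumes, without confirmation from this thread, that detection looks only at the beginning of the brand string for "Geod", and that the "model name" shown in /proc/cpuinfo matches the brand string. The function name is hypothetical and this is not the compiler's actual code.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the brand string begins with "Geod".
 * Assumption: detection is a prefix comparison; the real logic may differ.
 */
int brand_looks_like_geode(const char *brand)
{
    return strncmp(brand, "Geod", 4) == 0;
}
```

Under that assumption, "Geode(TM) Integrated Processor by AMD PCS" matches, while "AMD Geode NX 1750" does not, because it starts with "AMD ".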
(In reply to comment #7)
> -march=geode disables cmov because the real geode does not have cmov :).

No, all geodes have cmov.
Yes, it seems that even the old Geode has such instructions:

# cat /proc/cpuinfo
processor : 0
vendor_id : Geode by NSC
cpu family : 5
model : 9
model name : Unknown
stepping : 1
cpu MHz : 266.688
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu tsc msr cx8 cmov mmx cxmmx
bogomips : 534.07
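For anyone who wants to check a flags line like the ones posted in this thread programmatically, here is a small sketch. The function name is made up; it matches whole space-separated tokens only, so that "mmx" is not mistaken for "mmxext" or "cxmmx".

```c
#include <string.h>

/*
 * Return 1 if `flag` appears as a whole whitespace-separated token in
 * `flags` (the text after "flags :" in /proc/cpuinfo), otherwise 0.
 */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or right after whitespace. */
        int starts_token = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the string or before whitespace. */
        int ends_token = p[len] == '\0' || p[len] == ' ' ||
                         p[len] == '\t' || p[len] == '\n';

        if (starts_token && ends_token)
            return 1;
        p++; /* keep searching after this partial match */
    }
    return 0;
}
```

Called with either of the two flags lines quoted in this thread and "cmov", it reports that the flag is present.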
(In reply to comment #16)
> Yes, it seems that even old Geode has such instructions:

So, I guess they should be listed under <config X86_CMOV> in linux-2.6/arch/x86/Kconfig.cpu.
As I did here? http://patchwork.kernel.org/patch/51410/
(In reply to comment #18)
> As I did here?
>
> http://patchwork.kernel.org/patch/51410/

Yes, but I don't know if perhaps -march=k6-2 should be used as a fallback, as suggested in [1].

BTW: You can change the optimization bitmasks and cost tables in gcc/config/i386.c to find out what affects runtime performance, but without access to a target processor, there is no way to resolve runtime regressions.

[1] http://en.gentoo-wiki.com/wiki/Safe_Cflags/AMD#Geode_LX
Yes, K6 is the best fallback for geode-lx, while pentium-mmx is the best one for geode. I need to know whether this new -march argument will be added so that I can edit the kernel patch.
Created attachment 19001 [details] A patch which adds Geode LX support to GCC
(In reply to comment #20)
> Yes K6 is the best fallback for geode-lx, while pentium-mmx is the best one for
> geode.

BTW, the recommended fallback is K6-2.

> I need to know if this new -march argument will be added so I edit the kernel
> patch.

Actually, according to the gcc documentation, "-march=geode" is intended specifically for geode-lx and switches on features and tuning options for geode-lx.

If you want to play with tuning options, you can check the differences between m_PPRO and m_GEODE in i386.c, ix86_tune_features. By changing these settings, you can narrow down which flag causes a regression. I would suggest starting with the flags that mention partial register stalls.

OTOH, you can also play with "-march=geode -mtune=i686" to switch on various tuning flags.
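To make the "differences between m_PPRO and m_GEODE" idea concrete, here is a rough sketch of a per-feature processor bitmask table and a helper that lists the features whose setting differs between two processors. Everything in it is hypothetical: the enum, the M_* macros, the feature names and the mask values are placeholders, not the contents of i386.c.

```c
#include <stddef.h>

/* Hypothetical processor indices; the real list is much longer. */
enum proc { PROC_PPRO, PROC_GEODE, PROC_COUNT };

#define M_PPRO  (1u << PROC_PPRO)
#define M_GEODE (1u << PROC_GEODE)

/* Each feature carries the mask of processors for which it is enabled. */
struct tune_feature {
    const char *name;
    unsigned mask;
};

/* Placeholder table; names and masks are invented for illustration. */
const struct tune_feature features[] = {
    { "feature_a", M_PPRO | M_GEODE },
    { "feature_b", M_PPRO },
    { "feature_c", M_GEODE },
    { "feature_d", 0 },
};

/*
 * Store in `out` the names of features enabled for exactly one of the two
 * processor masks, up to `max_out` entries. Returns the number stored.
 */
int differing_features(unsigned mask_a, unsigned mask_b,
                       const char **out, int max_out)
{
    int count = 0;
    size_t n = sizeof features / sizeof features[0];

    for (size_t i = 0; i < n && count < max_out; i++) {
        int on_a = (features[i].mask & mask_a) != 0;
        int on_b = (features[i].mask & mask_b) != 0;

        if (on_a != on_b)
            out[count++] = features[i].name;
    }
    return count;
}
```

The features returned by such a comparison are the candidates to toggle one at a time when trying to narrow down a regression.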
Despite sharing a name, Geode GX, LX and NX are very different. I guess that we should split them into geode-gx and geode-lx, and alias geode-nx to k7.
What's the situation right now? Is there any progress? I have a Geode LX at 800MHz, and code compiled for generic x86 architectures runs really slowly. I'm thinking of producing optimized packages for Geode LX, but I'm confused as to which CFLAGS I should use. Are there any clarifications on this? I'm using gcc 4.3.5, btw.
Try -march=i686; it should be the best.
(In reply to comment #25)
> try -march=i686 it should be the best

What about the fact that Geode LX does not have a NOPL instruction, while i686 does? Couldn't that result in binaries that crash?

--Andrew
You could try, but I'm not sure that NOPL is mandatory for the i686 arch.
Did this issue get resolved in newer versions of the AMD Geode?