Bug 41989 - Code optimized for AMD Geode is slower than generic
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 37179
Blocks:
Reported: 2009-11-08 18:51 UTC by Matteo Croce
Modified: 2023-01-09 20:04 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-11-09 10:28:03


Attachments
the dhrystone benchmark (9.65 KB, text/plain)
2009-11-08 18:51 UTC, Matteo Croce
Patch that introduces geode-lx CPU option (576 bytes, patch)
2009-11-09 10:34 UTC, Uroš Bizjak
A patch which adds Geode LX support to GCC (570 bytes, patch)
2009-11-09 21:59 UTC, Matteo Croce

Description Matteo Croce 2009-11-08 18:51:06 UTC
I noticed that when compiling with -march=geode the code is actually slower
than the code generated with generic options (i386 or i686).
This happens with any program, not just with dhrystone.

# CFLAGS='-march=i386' ./dry.c
Dhrystones per Second:                          643501 

# CFLAGS='-march=i486' ./dry.c
Dhrystones per Second:                          712251 

# CFLAGS='-march=i586' ./dry.c
Dhrystones per Second:                          711238

# CFLAGS='-march=i686' ./dry.c
Dhrystones per Second:                          878735

# CFLAGS='-march=geode' ./dry.c
Dhrystones per Second:                          684932
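
To put the figures above on a common scale, each result can be normalized against the fastest one. This is only a sketch: the dictionary simply restates the numbers reported here, and the helper name `relative_scores` is made up for illustration.

```python
# Dhrystones per second as reported above, keyed by -march value.
RESULTS = {
    "i386": 643501,
    "i486": 712251,
    "i586": 711238,
    "i686": 878735,
    "geode": 684932,
}


def relative_scores(results):
    """Return each score as a fraction of the best score in `results`."""
    best = max(results.values())
    return {march: score / best for march, score in results.items()}


if __name__ == "__main__":
    ranked = sorted(relative_scores(RESULTS).items(), key=lambda kv: kv[1])
    for march, rel in ranked:
        print(f"-march={march:<6} {rel:6.1%} of best")
```

With these numbers, -march=geode lands at roughly 78% of the -march=i686 result.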
Comment 1 Matteo Croce 2009-11-08 18:51:50 UTC
Created attachment 18994 [details]
the dhrystone benchmark
Comment 2 Andrew Pinski 2009-11-08 18:55:45 UTC
Which Geode is this? Is it the first-generation Geode or the one AMD made? If it is the latter, it is not really a Geode but a K6/K7-based processor.
Comment 3 Richard Biener 2009-11-08 19:00:35 UTC
Try -march=native instead.
Comment 4 Matteo Croce 2009-11-08 19:52:48 UTC
# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 10
model name      : Geode(TM) Integrated Processor by AMD PCS
stepping        : 2
cpu MHz         : 498.060
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
bogomips        : 996.12
clflush size    : 32
cache_alignment : 32
address sizes   : 32 bits physical, 32 bits virtual
power management:

# CFLAGS='-march=native' ./dry.c
Dhrystones per Second:                          715308
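
Since the discussion turns on which feature bits the CPU advertises, a small script can pull them out of the `flags` line instead of reading it by eye. This is a minimal sketch assuming the Linux /proc/cpuinfo layout shown above; `cpu_flags` is an illustrative name, not an existing tool.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    # /proc/cpuinfo exists on Linux only; it is the file shown above.
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    for wanted in ("cmov", "mmx", "3dnow", "sse"):
        print(f"{wanted:<6} {'yes' if wanted in flags else 'no'}")
```

On the output above this would report cmov, mmx and 3dnow as present and sse as absent.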
Comment 5 pinskia@gmail.com 2009-11-08 19:57:38 UTC
> flags           : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow

Since it has cmov this is not a real geode :).


Comment 6 Matteo Croce 2009-11-08 22:34:27 UTC
(In reply to comment #5)
> > flags           : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
> 
> Since it has cmov this is not a real geode :).

What does CMOV have to do with Geode? It isn't an Athlon:

# CFLAGS='-march=k6' ./dry.c
Dhrystones per Second:                          757576

# CFLAGS='-march=athlon' ./dry.c
Dhrystones per Second:                          764526

# CFLAGS='-march=athlon-xp' ./dry.c
Dhrystones per Second:                          758725
Comment 7 Andrew Pinski 2009-11-09 03:31:20 UTC
-march=geode disables cmov because the real geode does not have cmov :).

This is why it is much slower.
Comment 8 Matteo Croce 2009-11-09 08:55:26 UTC
1) define "real geode"
2) what CPU do I have?
Comment 9 Matteo Croce 2009-11-09 09:01:48 UTC
See page 15 here:
http://www.amd.com/files/connectivitysolutions/geode/geode_lx/33234F_LX_databook.pdf

"The instruction set supported by the core is a combination
of Intel Pentium® processor, AMD Athlon™ processor, and
AMD Geode LX processor specific instructions. Specifically,
it supports the Pentium, Pentium Pro, AMD 3DNow!™
technology and MMX™ instructions for the AMD Athlon
processor. It supports a subset of the specialized
AMD Geode LX processor instructions including special
SMM instructions. The CPU Core does not support the
entire Katmai New Instruction (KNI) set as implemented in
the Pentium 3. It does support the MMX instructions for the
AMD Athlon processor, which are a subset of the
Pentium 3 KNI instructions."
Comment 10 Uroš Bizjak 2009-11-09 09:34:20 UTC
(In reply to comment #8)
> 1) define "real geode"
> 2) what CPU do I have?

http://en.wikipedia.org/wiki/Geode_%28processor%29#AMD_Geode
Comment 11 Uroš Bizjak 2009-11-09 10:28:03 UTC
OK, according to your benchmarks and the documentation quote, it looks like we have to split "geode-lx" out of the generic "geode" option.

So, can you confirm that the difference between generic geode and geode-lx is the presence of the CMOV bit? According to your documentation quote, it should use the i686 pipeline with 3DNow! features.

There is also Geode NX; IIRC it presents itself as an Athlon, but someone should confirm this.

See also PR37179: Geode LX does not implement ffreep, so it is not an Athlon.
Comment 12 Uroš Bizjak 2009-11-09 10:28:37 UTC
Reopened to clear up this geode mess.
Comment 13 Uroš Bizjak 2009-11-09 10:34:18 UTC
Created attachment 18997 [details]
Patch that introduces geode-lx CPU option

Can you patch the compiler with the attached patch?

"gcc -march=native -### hello.c" should return -march=geode-lx somewhere for your target. Passing "-march=geode-lx" should generate the fastest code.
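
Checking what -march=native expands to can be scripted rather than read off the `-###` output by hand. The following is a rough sketch, assuming a `gcc` binary on the PATH and a Unix-like system with /dev/null; the function names are illustrative, not part of GCC.

```python
import re
import subprocess


def march_values(driver_output):
    """Return every -march=<value> found in `gcc -###` style output."""
    return re.findall(r"-march=([A-Za-z0-9_.+-]+)", driver_output)


def native_march(gcc="gcc"):
    """Ask the driver what -march=native expands to (needs gcc installed)."""
    # -### prints the commands the driver would run, on stderr, without
    # executing them; -E -x c /dev/null avoids needing a source file.
    proc = subprocess.run(
        [gcc, "-march=native", "-###", "-E", "-x", "c", "/dev/null"],
        capture_output=True,
        text=True,
    )
    # The driver may echo the literal "native" as well; drop it.
    return [m for m in march_values(proc.stderr) if m != "native"]


if __name__ == "__main__":
    print(native_march())
```

On a compiler built with the attached patch, the printed list would be expected to contain geode-lx on this target.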
Comment 14 Uroš Bizjak 2009-11-09 10:47:56 UTC
(In reply to comment #11)

> There is also geode NX, IIRC it represents itself as Athlon, but someone should
> confirm this.

According to [1], cpuid for Geode NX returns "AMD Geode NX 1750", and this string doesn't trigger our current detection logic for Geode architecture.

[1] http://en.gentoo-wiki.com/wiki/Safe_Cflags/AMD#Athlon_XP.2FGeode_NX
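
One way to picture why the NX string would slip past a name-based check: the model name reported in comment 4 begins with "Geode", while the NX string begins with "AMD". The sketch below assumes, purely for illustration, that detection is a prefix match on the brand string. It is not GCC's actual detection code, and `looks_like_geode` is a made-up name.

```python
def looks_like_geode(brand_string):
    """Hypothetical check: does the CPUID brand string start with 'Geode'?"""
    return brand_string.lstrip().startswith("Geode")


if __name__ == "__main__":
    for brand in ("Geode(TM) Integrated Processor by AMD PCS",
                  "AMD Geode NX 1750"):
        print(f"{brand!r}: {looks_like_geode(brand)}")
```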
Comment 15 Uroš Bizjak 2009-11-09 11:48:32 UTC
(In reply to comment #7)
> -march=geode disables cmov because the real geode does not have cmov :).

No, all geodes have cmov.
Comment 16 Matteo Croce 2009-11-09 13:17:44 UTC
Yes, it seems that even old Geode has such instructions:

# cat /proc/cpuinfo 
processor       : 0
vendor_id       : Geode by NSC
cpu family      : 5
model           : 9
model name      : Unknown
stepping        : 1
cpu MHz         : 266.688
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr cx8 cmov mmx cxmmx
bogomips        : 534.07
Comment 17 Uroš Bizjak 2009-11-09 15:10:46 UTC
(In reply to comment #16)
> Yes, it seems that even old Geode has such instructions:

So, I guess they should be listed under <config X86_CMOV> in linux-2.6/arch/x86/Kconfig.cpu.
Comment 18 Matteo Croce 2009-11-09 15:26:03 UTC
As I did here?

http://patchwork.kernel.org/patch/51410/
Comment 19 Uroš Bizjak 2009-11-09 18:16:11 UTC
(In reply to comment #18)
> As I did here?
> 
> http://patchwork.kernel.org/patch/51410/

Yes, but I don't know if perhaps -march=k6-2 should be used as a fallback, as suggested in [1].

BTW: You can change the optimization bitmasks and cost tables in gcc/config/i386/i386.c to find out what affects runtime performance, but without access to a target processor there is no way to resolve runtime regressions.

[1] http://en.gentoo-wiki.com/wiki/Safe_Cflags/AMD#Geode_LX
Comment 20 Matteo Croce 2009-11-09 20:25:58 UTC
Yes, K6 is the best fallback for geode-lx, while pentium-mmx is the best one for geode.
I need to know whether this new -march argument will be added so that I can update the kernel patch.
Comment 21 Matteo Croce 2009-11-09 21:59:37 UTC
Created attachment 19001 [details]
A patch which adds Geode LX support to GCC
Comment 22 Uroš Bizjak 2009-11-10 07:45:31 UTC
(In reply to comment #20)
> Yes K6 is the best fallback for geode-lx, while pentium-mmx is the best one for
> geode.

BTW, recommended fallback is K6-2.

> I need to know if this new -march argument will be added so I edit the kernel
> patch.

Actually, according to the gcc documentation, "-march=geode" is intended specifically for geode-lx and switches on the features and tuning options for geode-lx.

If you want to play with the tuning options, you can check the differences between m_PPRO and m_GEODE in ix86_tune_features in i386.c. By changing these settings, you can narrow down which flag causes the regression. I would suggest starting with the flags that mention partial register stalls.

OTOH, you can also play with "-march=geode -mtune=i686" to switch on various tuning flags.
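
Trying -march/-mtune pairs by hand gets tedious, so the combinations can be generated. This is a small sketch with an illustrative helper name (`cflags_matrix`); the values in the usage section are just the ones mentioned in this thread.

```python
from itertools import product


def cflags_matrix(march_values, mtune_values):
    """Return CFLAGS strings for every -march/-mtune pair.

    A `None` entry in `mtune_values` means "no explicit -mtune".
    """
    combos = []
    for march, mtune in product(march_values, mtune_values):
        flags = f"-march={march}"
        if mtune is not None:
            flags += f" -mtune={mtune}"
        combos.append(flags)
    return combos


if __name__ == "__main__":
    for cflags in cflags_matrix(["geode", "k6-2", "i686"], [None, "i686", "k6-2"]):
        # Each printed line has the same shape as the benchmark runs above.
        print(f"CFLAGS='{cflags}' ./dry.c")
```

Keeping -march fixed while varying only -mtune separates the effect of the instruction set from the effect of the tuning tables.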
Comment 23 Matteo Croce 2009-11-16 10:02:59 UTC
Despite the shared name, Geode GX, LX and NX are very different. I guess we should split them into geode-gx and geode-lx, and alias geode-nx to k7.
Comment 24 Eren Türkay 2010-08-19 09:13:48 UTC
What's the situation right now, is there any progress?

I have a Geode LX 800MHz, and code compiled for generic x86 architectures runs really slowly. I'm thinking of producing optimized packages for Geode LX, but I'm unsure which CFLAGS I should use. Is there any guidance on this?

I'm using gcc 4.3.5, btw.
Comment 25 Matteo Croce 2010-08-22 13:34:20 UTC
Try -march=i686; it should be the best.
Comment 26 Andrew Atrens 2010-08-31 17:14:45 UTC
(In reply to comment #25)
> try -march=i686 it should be the best
> 

What about the fact that Geode LX does not have the NOPL instruction, while i686 does? Couldn't that result in binaries that crash?

--Andrew
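
Whether a given binary actually contains NOPL can be checked from its disassembly. The sketch below assumes the usual tab-separated layout of `objdump -d` output (address, raw bytes, instruction text); `find_mnemonic` and the script name in the usage comment are illustrative.

```python
def find_mnemonic(disassembly, mnemonic="nopl"):
    """Return the lines of `objdump -d` style text whose mnemonic matches.

    Assumes objdump's usual tab-separated layout: address, raw bytes,
    then the instruction text.
    """
    hits = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not an instruction line (labels, headers, blanks)
        insn = fields[2].split()
        if insn and insn[0] == mnemonic:
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): objdump -d ./binary | python3 find_nopl.py
    for hit in find_mnemonic(sys.stdin.read()):
        print(hit)
```

An empty result only shows that the scanned file has no such instruction; libraries it links against would need the same check.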
Comment 27 Matteo Croce 2010-08-31 18:02:27 UTC
You could try, but I'm not sure that NOPL is mandatory for the i686 arch.