Bug 45483 - gcc-4.4.3 and 4.5.3: probably wrong optimization options chosen by "-march=native"
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.6.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords:
Depends on:
Blocks:
Reported: 2010-09-01 15:04 UTC by Pacho Ramos
Modified: 2011-09-24 15:40 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-09-06 15:08:27


Description Pacho Ramos 2010-09-01 15:04:31 UTC
I checked which options "-march=native" chooses on one of my laptops,
following these instructions:
http://en.chys.info/2010/04/what-exactly-marchnative-means/

However, when reviewing the options actually used, I got:

$ ps af | grep cc1
18118 pts/1    S+     0:00  \_ grep --colour=auto cc1
18116 pts/0    S+     0:00      \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/ccLS5xw5.s
13580 tty3     S+     0:00          \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/ccSnTxP2.s
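The cc1 command lines above are long, so here is a minimal Python sketch that pulls out just the target-related options. It assumes the cc1 command line has been copied as a single string starting at the cc1 path (without the PID columns and the `\_` process-tree prefix); the function name `cc1_target_options` is hypothetical. For reference, `l1-cache-size` and `l2-cache-size` are in kilobytes and `l1-cache-line-size` is in bytes, so `l2-cache-size=2048` matches the "cache size: 2048 KB" line in /proc/cpuinfo.

```python
import shlex


def cc1_target_options(cmdline):
    """Collect -march, -mtune and --param NAME=VALUE pairs from a cc1 command line.

    Hypothetical helper for reading `ps` output; not part of GCC.
    """
    args = shlex.split(cmdline)
    opts = {"params": {}}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("-march=") or arg.startswith("-mtune="):
            # "-march=prescott" -> key "march", value "prescott"
            key, _, value = arg[1:].partition("=")
            opts[key] = value
        elif arg == "--param" and i + 1 < len(args):
            name, _, value = args[i + 1].partition("=")
            opts["params"][name] = int(value)
            i += 1  # skip the NAME=VALUE argument we just consumed
        i += 1
    return opts


line = ("/usr/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 "
        "-march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 "
        "--param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - "
        "-auxbase-strip /dev/null -o /tmp/ccLS5xw5.s")
opts = cc1_target_options(line)
print(opts["march"], opts["mtune"], opts["params"])
```

Parsing with `shlex.split` rather than a plain `split()` keeps quoted arguments intact if a command line ever contains them.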

My /proc/cpuinfo is the following:

$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2300  @ 1.66GHz
stepping        : 8
cpu MHz         : 996.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
bogomips        : 3324.55
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2300  @ 1.66GHz
stepping        : 8
cpu MHz         : 996.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
bogomips        : 3324.56
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:
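The details that matter in this dump are "cpu family 6, model 14" and the flags list, where the kernel reports SSE3 as "pni" (Prescott New Instructions) and there is no "ssse3". A small Python sketch for extracting those fields, assuming the text is in the usual "key : value" layout with a blank line between processors; `parse_cpuinfo` and `SAMPLE` are illustrative names, and `SAMPLE` is an excerpt of the first processor block above.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one processor's block.
            if current:
                cpus.append(current)
                current = {}
            continue
        # partition() splits at the first colon, so any later colons stay in the value.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


# Excerpt of the first processor block from the report (flags on one line).
SAMPLE = """\
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2300  @ 1.66GHz
cache size      : 2048 KB
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
"""

cpu = parse_cpuinfo(SAMPLE)[0]
flags = set(cpu["flags"].split())
# "pni" is the kernel's name for SSE3.
print("family", cpu["cpu family"], "model", cpu["model"], "sse3:", "pni" in flags)
```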

So I see two problems:
1. -mtune=generic is passed instead of a CPU-specific -mtune value.

According to "man gcc", it looks like the code is really being tuned for a
generic set of CPUs instead of a specific one:

           generic
               Produce code optimized for the most common IA32/AMD64/EM64T
               processors.  If you know the CPU on which your code will run, then
               you should use the corresponding -mtune option instead of
               -mtune=generic.  But, if you do not know exactly what CPU users of
               your application will have, then you should use this option.

               As new processors are deployed in the marketplace, the behavior
               of this option will change.  Therefore, if you upgrade to a
               newer version of GCC, the code generated option will change to
               reflect the processors that were most common when that version
               of GCC was released.

               There is no -march=generic option because -march indicates the
               instruction set the compiler can use, and there is no generic
               instruction set applicable to all processors.  In contrast,
               -mtune indicates the processor (or, in this case, collection of
               processors) for which the code is optimized.

2. -march=prescott 

I am unsure whether my processor is really a Prescott one: even though it
supports SSE3, it is listed as a Pentium M-based processor in the following links:
http://en.wikipedia.org/wiki/List_of_Intel_microprocessors#Intel_Core
http://en.wikipedia.org/wiki/Yonah_(microprocessor)
http://en.wikipedia.org/wiki/List_of_Intel_Core_microprocessors#Core_Duo

So I would expect "-march=pentium-m -msse3" instead.
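The "family 6, model 14" pair that identifies this CPU as a Yonah-based Core Duo comes from the signature that CPUID leaf 1 returns in EAX, and SSE3 support is bit 0 of ECX from the same leaf. A Python sketch of the documented Intel decoding rules; the function names are hypothetical, and the T2300 signature is rebuilt from the family/model/stepping values in /proc/cpuinfo rather than read from the reporter's hardware. This only illustrates the bit layout; it is not GCC's own detection code.

```python
def decode_cpuid_signature(eax):
    """Decode the processor signature that CPUID leaf 1 returns in EAX."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended family is added only when the base family is 0xF ...
    family = base_family + ext_family if base_family == 0xF else base_family
    # ... and the extended model is prepended only for families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) + base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


def has_sse3(ecx):
    """SSE3 ("pni" in /proc/cpuinfo) is bit 0 of ECX from CPUID leaf 1."""
    return bool(ecx & 1)


# Signature rebuilt from /proc/cpuinfo: family 6, model 14, stepping 8.
T2300_EAX = (6 << 8) | (14 << 4) | 8  # 0x6E8
print(decode_cpuid_signature(T2300_EAX))
```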

Thanks a lot for your help
Comment 1 H.J. Lu 2010-09-01 15:40:22 UTC
Please try gcc 4.5.2 and report what it does.
Comment 2 Pacho Ramos 2010-09-01 15:51:41 UTC
gcc-4.5 is still hard-masked downstream in Gentoo, so I am unsure about installing it :-/. Are you sure this bug could be solved in 4.5*?
Comment 3 H.J. Lu 2010-09-01 15:56:14 UTC
(In reply to comment #2)
> gcc-4.5 is still hardmasked downstream in Gentoo, then, I am unsure about
> installing it :-/, are you sure this bug could be solved in 4.5* ?
> 

1. -march=native is changed in gcc 4.5.
2. Your cpu is Core.
3. -mtune=generic generates the fastest code for Core.
4. Prescott and Core have the same instruction set.
Comment 4 Pacho Ramos 2010-09-01 16:06:22 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > gcc-4.5 is still hardmasked downstream in Gentoo, then, I am unsure about
> > installing it :-/, are you sure this bug could be solved in 4.5* ?
> > 
> 
> 1. -march=native is changed in gcc 4.5.

I will try 4.5.1 then (4.5.2 is not yet available in Gentoo).

> 2. Your cpu is Core.
> 3. -mtune=generic generates the fastest code for Core.
> 4. Prescott and Core have the same instruction set.

Is -mtune=generic better than -mtune=prescott? "man gcc" seems to suggest the latter would be better:
If you know the CPU on which your code will run, then you should use the corresponding -mtune option instead of -mtune=generic.
 

Comment 5 H.J. Lu 2010-09-01 16:37:14 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > gcc-4.5 is still hardmasked downstream in Gentoo, then, I am unsure about
> > > installing it :-/, are you sure this bug could be solved in 4.5* ?
> > > 
> > 
> > 1. -march=native is changed in gcc 4.5.
> 
> Will try then with 4.5.1 (4.5.2 is still not available on Gentoo)
> 
> > 2. Your cpu is Core.
> > 3. -mtune=generic generates the fastest code for Core.
> > 4. Prescott and Core have the same instruction set.
> 
> Is -mtune=generic better than -mtune=prescott? "man gcc" looks to suggest last
> one would be better:
> If you know the CPU on which your code will run, then you should use the
> corresponding -mtune option instead of -mtune=generic.
> 
> 

-mtune=generic is the best tuning option for Intel processors,
regardless of what the gcc manual says.
Comment 6 Pacho Ramos 2010-09-06 10:45:05 UTC
This is what I get with gcc-4.5.1:
root       651  0.0  0.1  13080  1780 tty1     S+   19:18   0:00 /usr/libexec/gcc/i686-pc-linux-gnu/4.5.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=pentium-m --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/cc3udN3F.s

So it looks like:
1. It still uses -mtune=generic.
2. It now uses "-march=pentium-m" instead of "prescott" :-/
3. It doesn't seem to detect SSE3.
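Points 2 and 3 above, and H.J.'s remark that Prescott and Core have the same instruction set, can be checked by comparing instruction-set extensions. A Python sketch, assuming the MMX/SSE subsets below taken from the GCC manual's descriptions of `-march=pentium-m` (MMX, SSE, SSE2) and `-march=prescott` (MMX, SSE, SSE2, SSE3); all other features are deliberately left out, and the table names are illustrative. It compares instruction sets only and does not show which extra `-m` flags the 4.5.1 driver may or may not have passed.

```python
# SIMD extensions implied by each -march value, per the GCC manual's
# x86 -march descriptions (only the MMX/SSE family is modelled here).
MARCH_ISA = {
    "pentium-m": {"mmx", "sse", "sse2"},
    "prescott": {"mmx", "sse", "sse2", "sse3"},
}

# /proc/cpuinfo calls SSE3 "pni"; translate the relevant kernel flag names
# into GCC option names so the two sets can be compared.
KERNEL_TO_GCC = {"mmx": "mmx", "sse": "sse", "sse2": "sse2", "pni": "sse3", "ssse3": "ssse3"}

# Flags line from the reporter's /proc/cpuinfo.
CPU_FLAGS = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
             "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc "
             "arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm").split()

cpu_isa = {KERNEL_TO_GCC[f] for f in CPU_FLAGS if f in KERNEL_TO_GCC}

for march, isa in MARCH_ISA.items():
    # Extensions the CPU supports that this -march value alone would not let GCC use.
    unused = sorted(cpu_isa - isa)
    print(f"-march={march}: unused extensions {unused}")
```

With these tables, `-march=prescott` covers everything the CPU reports, while `-march=pentium-m` leaves SSE3 unused unless `-msse3` is added, which is the combination suggested in the original description.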

Thanks for your help :-)
Comment 7 H.J. Lu 2010-09-06 15:08:27 UTC
A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00469.html
Comment 8 Pacho Ramos 2011-09-22 20:47:04 UTC
gcc-4.5 still uses -mtune=generic, and -march has moved from prescott to pentium-m:
gcc-4.4: \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.4.5/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/ccpMToQG.s


gcc-4.5.3: \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.5.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=pentium-m --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/cc11MaKg.s


Should I try gcc-4.6 to see whether it uses something other than "generic" for -mtune (or whether -march has changed again)?
Comment 9 H.J. Lu 2011-09-22 21:08:09 UTC
(In reply to comment #8)
> Should I try with gcc-4.6 to see if it uses something different than "generic"
> for mtune (or march has changed again)?

Yes.
Comment 10 Pacho Ramos 2011-09-22 22:33:20 UTC
New output:

\_ /usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=pentium-m -mno-cx16 -mno-sahf -mno-movbe -mno-aes -mno-pclmul -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mno-avx -mno-sse4.2 -mno-sse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/cc63ioaE.s
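With 4.6.1 the driver also spells out extensions it turned off. A Python cross-check that none of the `-mno-*` options disables something listed in the reporter's /proc/cpuinfo flags. The `GCC_TO_KERNEL` table is an assumed correspondence between GCC option names and Linux flag names, written for this illustration and not taken from GCC; names not in the table are assumed to be identical in both.

```python
import shlex

# cc1 command line from the gcc-4.6.1 output above.
CC1_461 = ("/usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 "
           "-march=pentium-m -mno-cx16 -mno-sahf -mno-movbe -mno-aes -mno-pclmul "
           "-mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm "
           "-mno-avx -mno-sse4.2 -mno-sse4.1 --param l1-cache-size=32 "
           "--param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic "
           "-quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/cc63ioaE.s")

# Assumed GCC-name -> /proc/cpuinfo-name mapping (only names that differ).
GCC_TO_KERNEL = {"sahf": "lahf_lm", "pclmul": "pclmulqdq", "bmi": "bmi1",
                 "sse4.1": "sse4_1", "sse4.2": "sse4_2"}

CPU_FLAGS = set("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
                "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc "
                "arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm".split())

# Feature names following "-mno-" on the command line.
disabled = [a[len("-mno-"):] for a in shlex.split(CC1_461) if a.startswith("-mno-")]
# Any disabled feature that the CPU actually reports would be a detection bug.
wrongly_disabled = [f for f in disabled if GCC_TO_KERNEL.get(f, f) in CPU_FLAGS]

print(len(disabled), "extensions disabled; wrongly disabled:", wrongly_disabled)
```

An empty `wrongly_disabled` list only says the explicit `-mno-*` options are consistent with the flags; SSE3 is not among them, so this check says nothing about whether SSE3 is enabled.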
Comment 11 Pacho Ramos 2011-09-22 22:39:58 UTC
If I use "-mtune=native" instead of "-march=native", the final cc1 command is different:

/usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -march=i686 -auxbase-strip /dev/null -o /tmp/ccs2Uhin.s


I am not sure whether that is normal.
Comment 12 H.J. Lu 2011-09-24 15:19:56 UTC
(In reply to comment #10)
> New output:
> 
> \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2
> -march=pentium-m -mno-cx16 -mno-sahf -mno-movbe -mno-aes -mno-pclmul
> -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm
> -mno-avx -mno-sse4.2 -mno-sse4.1 --param l1-cache-size=32 --param
> l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet
> -dumpbase - -auxbase-strip /dev/null -o /tmp/cc63ioaE.s

It looks great.
Comment 13 Pacho Ramos 2011-09-24 15:40:17 UTC
OK, I guess I should close this one, as it looks like "generic" is the best option for my processor.