I checked which options are being chosen on one of my laptops, following these instructions:
http://en.chys.info/2010/04/what-exactly-marchnative-means/

But when reviewing the options used, I got:

$ ps af | grep cc1
18118 pts/1 S+ 0:00 \_ grep --colour=auto cc1
18116 pts/0 S+ 0:00 \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/ccLS5xw5.s
13580 tty3 S+ 0:00 \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/ccSnTxP2.s

My /proc/cpuinfo is the following:

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T2300 @ 1.66GHz
stepping : 8
cpu MHz : 996.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
bogomips : 3324.55
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T2300 @ 1.66GHz
stepping : 8
cpu MHz : 996.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
bogomips : 3324.56
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

I see two problems:

1. -mtune=generic is being passed instead of, for example, -mtune="specific option"

According to "man gcc", it looks like the code is really being compiled for a generic set of CPUs instead of a specific one:

generic
    Produce code optimized for the most common IA32/AMD64/EM64T processors. If you know the CPU on which your code will run, then you should use the corresponding -mtune option instead of -mtune=generic. But, if you do not know exactly what CPU users of your application will have, then you should use this option.

    As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, the code generated option will change to reflect the processors that were most common when that version of GCC was released.

    There is no -march=generic option because -march indicates the instruction set the compiler can use, and there is no generic instruction set applicable to all processors. In contrast, -mtune indicates the processor (or, in this case, collection of processors) for which the code is optimized.

2. -march=prescott

I am unsure whether my processor is really a prescott one. Even though it supports sse3, it is listed as a Pentium-M based processor in the following links:
http://en.wikipedia.org/wiki/List_of_Intel_microprocessors#Intel_Core
http://en.wikipedia.org/wiki/Yonah_(microprocessor)
http://en.wikipedia.org/wiki/List_of_Intel_Core_microprocessors#Core_Duo

So I would pass "-march=pentium-m -msse3" instead.

Thanks a lot for your help
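For anyone repeating this check, the long cc1 command lines are easier to read once reduced to the options in question. The following is only a minimal sketch: `arch_flags` is a hypothetical helper name, not part of gcc, and the commented invocation is one commonly used way to make the driver print its cc1 command line.

```shell
# Hypothetical helper: print only the -march= and -mtune= options
# found on a cc1 command line read from standard input, one per line.
arch_flags() {
    tr -s ' ' '\n' | grep -E '^-(march|mtune)='
}

# Possible usage (assumes gcc is installed; -v prints the cc1 command
# line on stderr, so it is redirected to stdout first):
#   gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | arch_flags
```

Applied to the 4.4.3 command line above, this would print -march=prescott and -mtune=generic on separate lines.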
Please try gcc 4.5.2 and report what it does.
gcc-4.5 is still hardmasked downstream in Gentoo, so I am unsure about installing it :-/ Are you sure this bug could be solved in 4.5*?
(In reply to comment #2)
> gcc-4.5 is still hardmasked downstream in Gentoo, then, I am unsure about
> installing it :-/, are you sure this bug could be solved in 4.5* ?
>

1. -march=native is changed in gcc 4.5.
2. Your cpu is Core.
3. -mtune=generic generates the fastest code for Core.
4. Prescott and Core have the same instruction set.
(In reply to comment #3)
> (In reply to comment #2)
> > gcc-4.5 is still hardmasked downstream in Gentoo, then, I am unsure about
> > installing it :-/, are you sure this bug could be solved in 4.5* ?
> >
>
> 1. -march=native is changed in gcc 4.5.

Will try then with 4.5.1 (4.5.2 is still not available on Gentoo)

> 2. Your cpu is Core.
> 3. -mtune=generic generates the fastest code for Core.
> 4. Prescott and Core have the same instruction set.

Is -mtune=generic better than -mtune=prescott? "man gcc" seems to suggest the latter would be better:
If you know the CPU on which your code will run, then you should use the corresponding -mtune option instead of -mtune=generic.
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > gcc-4.5 is still hardmasked downstream in Gentoo, then, I am unsure about
> > > installing it :-/, are you sure this bug could be solved in 4.5* ?
> > >
> >
> > 1. -march=native is changed in gcc 4.5.
>
> Will try then with 4.5.1 (4.5.2 is still not available on Gentoo)
>
> > 2. Your cpu is Core.
> > 3. -mtune=generic generates the fastest code for Core.
> > 4. Prescott and Core have the same instruction set.
>
> Is -mtune=generic better than -mtune=prescott? "man gcc" looks to suggest last
> one would be better:
> If you know the CPU on which your code will run, then you should use the
> corresponding -mtune option instead of -mtune=generic.
>
>

-mtune=generic is the best tuning option for Intel processors, regardless of what the gcc manual says.
This is what I get with gcc-4.5.1:

root 651 0.0 0.1 13080 1780 tty1 S+ 19:18 0:00 usr/libexec/gcc/i686-pc-linux-gnu/4.5.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=pentium-m --param l1-cache-size=32 --param l1-cache-line-size=64 - --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - - -- -auxbase-strip /dev/null -o /tmp/cc3udN3F.s

So it looks like:
1. It's still using -mtune=generic
2. It now uses "-march=pentium-m" instead of "prescott" :-/
3. It doesn't seem to detect "sse3"

Thanks for your help :-)
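As a cross-check on the sse3 point, the CPU side can be read from /proc/cpuinfo, where the Linux kernel lists SSE3 under the name pni (it appears in the flags line posted in the original report). This is only a sketch: `has_sse3` is a hypothetical helper name, and it says nothing about what the compiler itself decides to enable.

```shell
# Hypothetical helper: succeed if the first "flags" line read from
# standard input lists pni, the name the Linux kernel uses for SSE3.
has_sse3() {
    grep -m1 '^flags' | tr -s ' \t' '\n' | grep -qx 'pni'
}

# Possible usage on a Linux system:
#   has_sse3 < /proc/cpuinfo && echo "CPU reports SSE3 (pni)"
```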
A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00469.html
It's still using generic in gcc-4.5, and -march has moved from prescott to pentium-m:

gcc-4.4:
\_ /usr/libexec/gcc/i686-pc-linux-gnu/4.4.5/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=prescott --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/ccpMToQG.s

gcc-4.5.3:
\_ /usr/libexec/gcc/i686-pc-linux-gnu/4.5.3/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=pentium-m --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/cc11MaKg.s

Should I try with gcc-4.6 to see if it uses something different than "generic" for mtune (or march has changed again)?
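Comparing two such command lines by eye is error-prone, so here is a rough sketch of how the difference could be listed mechanically. `diff_flags` is a hypothetical helper name; it only looks at words that start with a dash, so --param values and temporary file names are ignored.

```shell
# Hypothetical helper: list the options that appear in only one of two
# cc1 command lines. Lines starting with "<" come from the first
# command line, lines starting with ">" from the second.
diff_flags() {
    a=$(mktemp)
    b=$(mktemp)
    # One word per line, keep only words that look like options,
    # and sort them so diff compares the two sets of options.
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep '^-.' | sort -u > "$a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | grep '^-.' | sort -u > "$b"
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}
```

Called with the gcc-4.4 and gcc-4.5.3 command lines above as its two arguments, it should report only the -march change.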
(In reply to comment #8)
> Should I try with gcc-4.6 to see if it uses something different than "generic"
> for mtune (or march has changed again)?

Yes.
New output:

\_ /usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 -march=pentium-m -mno-cx16 -mno-sahf -mno-movbe -mno-aes -mno-pclmul -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mno-avx -mno-sse4.2 -mno-sse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -auxbase-strip /dev/null -o /tmp/cc63ioaE.s
If I use "-mtune=native" instead of "-march=native", the final cc1 command is different:

/usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet -dumpbase - -march=i686 -auxbase-strip /dev/null -o /tmp/ccs2Uhin.s

Not sure if that is normal.
(In reply to comment #10)
> New output:
>
> \_ /usr/libexec/gcc/i686-pc-linux-gnu/4.6.1/cc1 -quiet - -D_FORTIFY_SOURCE=2
> -march=pentium-m -mno-cx16 -mno-sahf -mno-movbe -mno-aes -mno-pclmul
> -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm
> -mno-avx -mno-sse4.2 -mno-sse4.1 --param l1-cache-size=32 --param
> l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=generic -quiet
> -dumpbase - -auxbase-strip /dev/null -o /tmp/cc63ioaE.s

It looks great.
OK, I guess I should close this one, as it looks like "generic" is the best option for my processor.