Bug 45359

Summary: poor -march=native choices for VIA C7 Esther processors
Product: gcc Reporter: JM <opod>
Component: target Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: minor CC: gcc-bugs, sezeroz
Priority: P3    
Version: 4.5.1   
Target Milestone: 4.7.4   
URL: http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00941.html
Host: Target: i686-pc-linux-gnu
Build: Known to work:
Known to fail: Last reconfirmed:
Attachments: native VIA/CentaurHauls
centaur.patch
centaur2.patch

Description JM 2010-08-20 15:58:31 UTC
The C7 is an x86 CPU from VIA based on the Esther (C5J) core, which evolved from the Nehemiah+ (C5P) core found in the C3-2 line.

/proc/cpuinfo
processor	: 0
vendor_id	: CentaurHauls
cpu family	: 6
model		: 10
model name	: VIA Esther processor  800MHz
stepping	: 9
cpu MHz		: 399.008
cache size	: 128 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx up pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips	: 798.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 32 bits virtual
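
The entry that matters in the flags line above is "pni" (Prescott New Instructions), which is the Linux kernel's name for SSE3. A minimal sketch of testing for it, assuming the value of the flags line has already been read into a string. cpuinfo_has_flag is a hypothetical helper for illustration only; it is not part of GCC, whose driver queries CPUID directly rather than parsing /proc/cpuinfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Separators that may appear between (or after) flag names. */
static bool is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: return true if `flag` occurs as a whole token in
   `flags`, e.g. the value of the "flags" line from /proc/cpuinfo.
   Whole-token matching keeps "sse" from matching inside "sse2". */
bool cpuinfo_has_flag(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);

    size_t len = strlen(flag);
    if (len == 0)
        return false;

    const char *p = flags;
    while (*p != '\0') {
        /* Skip separators before the next token. */
        while (is_sep(*p))
            p++;
        /* Find the end of the token. */
        const char *start = p;
        while (*p != '\0' && !is_sep(*p))
            p++;
        if ((size_t)(p - start) == len && strncmp(start, flag, len) == 0)
            return true;
    }
    return false;
}
```

Called with the flags value shown above and "pni", this returns true, which is the capability that -march=pentium-m does not enable.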

gcc -v -fverbose-asm -march=native -S test.c
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-pc-linux-gnu/4.5.1/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: ../configure --prefix=/usr --enable-languages=c,c++,fortran,objc,obj-c++,ada --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-gnu-unique-object --enable-lto --enable-plugin --disable-multilib --disable-libstdcxx-pch --with-system-zlib --with-ppl --with-cloog --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
Thread model: posix
gcc version 4.5.1 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-fverbose-asm'  '-S'
 /usr/lib/gcc/i686-pc-linux-gnu/4.5.1/cc1 -quiet -v test.c -march=pentium-m -mtune=generic -quiet -dumpbase test.c -auxbase test -version -fverbose-asm -o test.s
[snip]

cat test.s
# GNU C (GCC) version 4.5.1 (i686-pc-linux-gnu)
#	compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 3.0.0-p3, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=63 --param ggc-min-heapsize=62811
# options passed:  test.c -march=pentium-m -mtune=generic -fverbose-asm
# options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -ffunction-cse -fgcse-lm
# -fident -finline-functions-called-once -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -fmath-errno -fmerge-debug-strings
# -fmove-loop-invariants -fpcc-struct-return -fpeephole
# -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -ftrapping-math -ftree-cselim -ftree-forwprop
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
# -maccumulate-outgoing-args -malign-stringops -mfancy-math-387
# -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-red-zone
# -mno-sse4 -mpush-args -msahf -msse -msse2 -mtls-direct-seg-refs
Comment 1 Richard Biener 2010-08-23 20:17:21 UTC
Why do you think it's a poor choice?
Comment 2 JM 2010-08-24 18:58:57 UTC
(In reply to comment #1)
The processor clearly supports SSE3, so -march=prescott would arguably be a better choice than -march=pentium-m. I also assumed that -march=pentium-m implies -mfpmath=387, but that does not seem to apply (or matter). Finally, -march=native on my laptop passes the L1 and L2 cache sizes as --param options, which does not happen for the VIA C7. Just for reference, the cache is reported as:

Cache info
 L1 Instruction cache: 64KB, 4-way associative, 1 lines per tag, line size=64 bytes.
 L1 Data cache: 64KB 4-way associative, 1 lines per tag, line size=64 bytes.
 L2 (on CPU) cache: 128KB 10-way associative, 1 lines per tag, line size=64 bytes.

HTH
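
For illustration, the figures above can be turned into the --param options that -march=native emits on other machines. The sketch below assumes the AMD-documented layout of CPUID leaves 0x80000005 (L1 data cache in ECX) and 0x80000006 (L2 cache in ECX). The names cache_info, decode_amd_style_caches and format_cache_params are hypothetical, and this is not GCC's own detect_caches_amd code. Associativity is left undecoded.

```c
#include <assert.h>
#include <stdio.h>

/* Decoded cache description: sizes in KB, line size in bytes. */
struct cache_info {
    unsigned l1_size_kb;
    unsigned l1_line;
    unsigned l2_size_kb;
};

/* Decode ECX of CPUID leaf 0x80000005 (L1 data cache) and ECX of leaf
   0x80000006 (L2 cache), assuming the AMD-documented field layout. */
struct cache_info decode_amd_style_caches(unsigned ecx_l1, unsigned ecx_l2)
{
    struct cache_info c;
    c.l1_size_kb = (ecx_l1 >> 24) & 0xff;    /* bits 31..24: size in KB   */
    c.l1_line    = ecx_l1 & 0xff;            /* bits 7..0: line size      */
    c.l2_size_kb = (ecx_l2 >> 16) & 0xffff;  /* bits 31..16: size in KB   */
    return c;
}

/* Format the description as GCC --param options.
   Returns the value returned by snprintf. */
int format_cache_params(const struct cache_info *c, char *buf, size_t n)
{
    assert(c != NULL && buf != NULL);
    return snprintf(buf, n,
                    "--param l1-cache-size=%u --param l1-cache-line-size=%u "
                    "--param l2-cache-size=%u",
                    c->l1_size_kb, c->l1_line, c->l2_size_kb);
}
```

With register values constructed to match the report above (not read from real hardware), for example ECX = 0x40040140 for the L1 leaf and ECX = 0x00800140 for the L2 leaf, the decoded values are 64 KB L1, 64-byte lines and 128 KB L2.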
Comment 3 Dzianis Kahanovich 2010-11-01 13:21:40 UTC
Created attachment 22220 [details]
native VIA/CentaurHauls

(In reply to comment #1)
> Why do you think it's a poor choice?

This is a regression introduced by PR target/44046. The previous behaviour was:
sse3 -> "-march=prescott -mtune=generic"; now it is: sse3 -> "-march=pentium-m -mtune=generic". The regression loses SSE3 support on the C-7 CPU. On the other hand, "prescott" is a family-15 member and may (or may not) be scheduled differently from a family-6 clone... Another source of the problem is that VIA/Centaur CPUs are handled as if the vendor were "Intel". I assume the Intel support code has its own reasons for choosing sse3 -> pentium-m and dropping SSE3, so I suggest leaving that behaviour alone and adding native VIA/CentaurHauls support code. There are 3 points of detection:
1) vendor signature;
2) cache detection: according to the Linux kernel code, the "detect_caches_amd" behaviour is not vendor-specific; the kernel also uses it for any x86_64, VIA/Centaur, Transmeta and Cyrix family-5/model-5 CPU. I have no exotic CPUs other than the VIA C-7 in my notebook, so I cannot test other vendors;
3) model detection: the C-7 becomes "-march=prescott -mtune=core2" (FIXME if plain "prescott" is better!). The c3-2 "-mtune=c3" selection could also be fixed (the Gentoo Wiki suggests -mtune=generic or -march=c3 to avoid NOPL on some models, but I try not to rely on "generic", which is variable).
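
Points 1 and 3 above can be sketched as follows. CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX order, and "CentaurHauls" identifies VIA/Centaur parts. The function names cpuid_vendor_string and suggest_march are hypothetical, and the selection rule only mirrors the proposal in this comment (family 6 with SSE3 -> prescott). It is an assumption for illustration, not the logic of the attached patch or of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assemble the vendor string from the registers returned by CPUID leaf 0.
   The characters are stored low byte first, in EBX, EDX, ECX order. */
void cpuid_vendor_string(unsigned ebx, unsigned edx, unsigned ecx,
                         char out[13])
{
    const unsigned regs[3] = { ebx, edx, ecx };
    assert(out != NULL);
    for (int r = 0; r < 3; r++)
        for (int i = 0; i < 4; i++)
            out[4 * r + i] = (char)((regs[r] >> (8 * i)) & 0xff);
    out[12] = '\0';
}

/* Hypothetical selection following the proposal in this comment:
   a family-6 Centaur CPU with SSE3 gets "prescott".
   Returns NULL for anything this sketch does not cover. */
const char *suggest_march(const char *vendor, unsigned family, bool has_sse3)
{
    assert(vendor != NULL);
    if (strcmp(vendor, "CentaurHauls") == 0 && family == 6 && has_sse3)
        return "prescott";
    return NULL;
}
```

For the C-7 in this report (EBX = 0x746e6543, EDX = 0x48727561, ECX = 0x736c7561, family 6, SSE3 present) the sketch yields "prescott", while a "GenuineIntel" signature falls through to the existing Intel handling.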
Comment 4 Dzianis Kahanovich 2010-11-07 10:47:13 UTC
Created attachment 22306 [details]
centaur.patch

Just a cleanup (c3-2). "-mtune" is not passed to the assembler, so the "-mtune" workaround for NOPL (from the Gentoo Wiki) is not useful. Maybe "-Wa,-mtune=generic" or "-Wa,", but not here...
Comment 5 Dzianis Kahanovich 2010-11-18 13:05:51 UTC
Created attachment 22445 [details]
centaur2.patch

Compatibility with GNU assembler -mtune set (#40171).
Comment 6 linuxball 2013-04-25 19:07:45 UTC
I can reconfirm this bug using gcc 4.6.3-1ubuntu5 on Ubuntu 12.04 when compiling code optimized for a VIA C7-D Esther processor. The issue is still the same:

Using -march=native 

1) still chooses -march=pentium-m -mtune=generic (ignoring the sse3 capability)
2) will not detect the L1 and L2 cache parameters

IMHO, the centaur2.patch suggested by Dzianis Kahanovich is a good fix. Why hasn't it found its way into the official gcc?

Best regards and thanks to Dzianis for his contribution

linuxball
Comment 7 Uroš Bizjak 2013-05-17 18:00:03 UTC
Implemented for 4.7.4, 4.8.1 and mainline by:

Author: uros
Date: Thu May 16 19:53:36 2013
New Revision: 198987

URL: http://gcc.gnu.org/viewcvs?rev=198987&root=gcc&view=rev
Log:
	PR target/45359
	PR target/46396
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	VIA/Centaur processors and determine their cache parameters
	using detect_caches_amd.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/driver-i386.c

Author: uros
Date: Thu May 16 21:41:26 2013
New Revision: 198989

URL: http://gcc.gnu.org/viewcvs?rev=198989&root=gcc&view=rev
Log:
	* config/i386/driver-i386.c (host_detect_local_cpu): Determine
	cache parameters using detect_caches_amd also for CYRIX,
	NSC and TM2 signatures.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/driver-i386.c

Author: uros
Date: Fri May 17 15:06:36 2013
New Revision: 199017

URL: http://gcc.gnu.org/viewcvs?rev=199017&root=gcc&view=rev
Log:
	Backport from mainline
	2013-05-16  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/driver-i386.c (host_detect_local_cpu): Determine
	cache parameters using detect_caches_amd also for CYRIX,
	NSC and TM2 signatures.

	2013-05-16  Uros Bizjak  <ubizjak@gmail.com>
		    Dzianis Kahanovich  <mahatma@eu.by>

	PR target/45359
	PR target/46396
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	VIA/Centaur processors and determine their cache parameters
	using detect_caches_amd.

	2013-05-15  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (ix86_option_override_internal): Update
	processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags.  Add
	PTA_POPCNT to corei7 entry. Do not enable SSE prefetch on
	non-SSE 3dNow! targets.  Enable TARGET_PRFCHW for TARGET_3DNOW targets.
	* config/i386/i386.md (prefetch): Enable for TARGET_PRFCHW instead
	of TARGET_3DNOW.
	(*prefetch_3dnow): Enable for TARGET_PRFCHW only.


Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/i386/driver-i386.c
    branches/gcc-4_8-branch/gcc/config/i386/i386.c
    branches/gcc-4_8-branch/gcc/config/i386/i386.md

Author: uros
Date: Fri May 17 17:50:11 2013
New Revision: 199026

URL: http://gcc.gnu.org/viewcvs?rev=199026&root=gcc&view=rev
Log:
	Backport from mainline
	2013-05-16  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/driver-i386.c (host_detect_local_cpu): Determine
	cache parameters using detect_caches_amd also for CYRIX,
	NSC and TM2 signatures.

	2013-05-16  Uros Bizjak  <ubizjak@gmail.com>
		    Dzianis Kahanovich  <mahatma@eu.by>

	PR target/45359
	PR target/46396
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	VIA/Centaur processors and determine their cache parameters
	using detect_caches_amd.

	2013-05-15  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (ix86_option_override_internal): Add
	PTA_POPCNT to corei7 entry.


Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/config/i386/driver-i386.c
    branches/gcc-4_7-branch/gcc/config/i386/i386.c