This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] tuning gcc for Intel Core2


H. J. Lu wrote:

On Thu, Nov 16, 2006 at 07:41:34AM -0800, Ian Lance Taylor wrote:


Vladimir Makarov <vmakarov@redhat.com> writes:



2006-11-13 Vladimir Makarov <vmakarov@redhat.com>

* doc/invoke.texi (core2): Add item.

* config/i386/i386.h (TARGET_CORE2, TARGET_CPU_DEFAULT_core2): New
macros.
(TARGET_CPU_CPP_BUILTINS): Add code for core2.
(TARGET_CPU_DEFAULT_generic): Change value.
(TARGET_CPU_DEFAULT_NAMES): Add core2.
(processor_type): Add new constant PROCESSOR_CORE2.

* config/i386/i386.md (cpu): Add core2.

* config/i386/i386.c (core2_cost): New initialized variable.
(m_CORE2): New macro.
(x86_movx, x86_unroll_strlen, x86_cmove, x86_deep_branch,
x86_use_sahf, x86_partial_reg_stall, x86_partial_flag_reg_stall,
x86_use_simode_fiop, x86_single_stringop, x86_himode_math,
x86_promote_hi_regs, x86_sub_esp_4, x86_sub_esp_8, x86_add_esp_4,
x86_add_esp_8, x86_integer_DFmode_moves,
x86_partial_reg_dependency, x86_accumulate_outgoing_args,
x86_prologue_using_move, x86_epilogue_using_move,
x86_arch_always_fancy_math_387, x86_sse_partial_reg_dependency,
x86_sse_load0_by_pxor, x86_rep_movl_optimal,
x86_ext_80387_constants, x86_four_jump_limit, x86_schedule,
x86_pad_returns): Add m_CORE2.
(override_options): Add entries for Core2.
(ix86_issue_rate): Add case for Core2.


This is OK.

I haven't seen any obvious conclusions in the following thread (other
than the fact that gcc/glibc block memory moves suck, which I already
know). But, in case I missed something, I'll preapprove any minor
changes to the costs, or adding or removing m_CORE2 in the processor
feature bitmasks.
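
For readers unfamiliar with the "processor feature bitmasks" mentioned above, here is a minimal sketch of the general idea, not the actual i386.c source. Each processor gets one bit (an m_* macro), each tuning feature is an OR of the processors that want it, and a feature is active when the bit of the processor selected by -mtune is present in the mask. The shortened enum, the helper names tune_mask and feature_enabled, and the contents of the two masks are assumptions made for illustration; they are not the committed values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened processor list; the real enum has more entries. */
enum processor_type {
  PROCESSOR_I386,
  PROCESSOR_PENTIUM,
  PROCESSOR_K8,
  PROCESSOR_CORE2,
  PROCESSOR_GENERIC64,
  PROCESSOR_max
};

/* One bit per processor. */
#define m_386       (1u << PROCESSOR_I386)
#define m_PENT      (1u << PROCESSOR_PENTIUM)
#define m_K8        (1u << PROCESSOR_K8)
#define m_CORE2     (1u << PROCESSOR_CORE2)
#define m_GENERIC64 (1u << PROCESSOR_GENERIC64)

/* Illustrative feature masks. Which processors appear in each mask is an
   assumption for this sketch, not the values from the patch. */
static const unsigned x86_rep_movl_optimal = m_386 | m_PENT | m_CORE2;
static const unsigned x86_four_jump_limit  = m_K8 | m_CORE2 | m_GENERIC64;

/* Bit corresponding to the processor we are tuning for (hypothetical helper). */
static unsigned tune_mask(enum processor_type cpu)
{
  assert((int) cpu >= 0 && cpu < PROCESSOR_max);
  return 1u << cpu;
}

/* A feature applies when the tuned-for processor's bit is set in its mask. */
static bool feature_enabled(unsigned feature, enum processor_type tune)
{
  return (feature & tune_mask(tune)) != 0;
}
```

Under this scheme, "adding or removing m_CORE2" for a feature amounts to editing a single OR expression, which is why such changes are small enough to preapprove.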


Thanks, Ian. I'll commit the patch with what H.J. recommends (core2 will then be generic plus x86_rep_movl_optimal) and continue working to find a better combination. Maybe I'll achieve the same score as generic + x86_rep_movl_optimal but with smaller code size.

I think we should distinguish 32-bit and 64-bit since they may need
different optimizations. Also, I think 64-bit -mtune=core2 should be the
same as -mtune=generic + x86_rep_movl_optimal, since -mtune=generic +
x86_rep_movl_optimal generates better SPEC CPU 2K numbers than this
-mtune=core2. It doesn't make sense to have -mtune=core2 generate
slower code than -mtune=generic + x86_rep_movl_optimal.





