This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH Atom][PR middle-end/44382] Tree reassociation improvement


Hello,

Here is a patch that addresses a missed optimization opportunity in the
tree reassoc pass.

Currently the tree reassoc pass always generates a linear form, which
requires the fewest registers but has the greatest tree height and
does not allow computations to be performed in parallel. This can be
critical for performance when the required operations have high
latency but can be pipelined (e.g. when there are few execution units
or throughput is low). The problem is significant on current Atom
processors, which are in-order and have many such instructions: IMUL
and scalar SSE FP instructions.
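
To illustrate the difference, here is a minimal sketch in C (the
function names mul_linear and mul_parallel are made up for this
example and do not appear in the patch). Both functions compute the
same product of four operands; the first has the linear shape
described above, the second has the reduced-height shape:

```c
/* Linear form: every multiply depends on the previous result,
   so the dependency chain has height 3.  */
static int mul_linear(int a, int b, int c, int d)
{
    int t1 = a * b;
    int t2 = t1 * c;
    int t3 = t2 * d;
    return t3;
}

/* Reduced-height form: t1 and t2 are independent of each other and
   can be in flight at the same time, so the chain has height 2.
   It needs one more live temporary than the linear form.  */
static int mul_parallel(int a, int b, int c, int d)
{
    int t1 = a * b;
    int t2 = c * d;
    return t1 * t2;
}
```

For integer multiplication the two forms give identical results. For
floating point the result may differ in rounding, so reassociation of
FP operations is normally only valid under relaxed FP semantics.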

This patch adds a new feature to the tree reassoc pass: it can generate
a computation tree of reduced height, which allows several
long-latency instructions to execute in parallel. Only one part of
reassociation is changed, rewrite_expr_tree. The level of parallelism
is controlled via a target hook and/or a command line option.
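
As a rough way to see what the level of parallelism buys, the
following C sketch counts the steps needed to reduce a chain of
operands to one value when at most "width" independent operations can
be started per step. This is a simplified model written for this
message only: the name cycles_needed and the model itself are
assumptions, it ignores instruction latency, and it is not the code
from the patch.

```c
/* Hypothetical model: count the steps needed to combine `ops`
   operands into one value when at most `width` independent binary
   operations can be started in each step.  */
static int cycles_needed(int ops, int width)
{
    int cycles = 0;

    if (width < 1)
        width = 1;              /* treat invalid widths as linear */

    while (ops > 1) {
        int pairs = ops / 2;    /* independent pairs available now */
        int issued = pairs < width ? pairs : width;

        ops -= issued;          /* each operation merges two operands */
        cycles++;
    }
    return cycles;
}
```

Under this model, eight operands take 7 steps with width 1 (the linear
form), 4 steps with width 2 and 3 steps with width 4.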

The new feature is enabled by default for Atom only. The patch mainly
improves the CFP2000 geomean on Atom: +3.04% for 32-bit and +0.32% for
64-bit.

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
--
gcc/

2011-07-12  Enkovich Ilya  <ilya.enkovich@intel.com>

	* target.def (reassociation_width): New hook.

	* doc/tm.texi.in (reassociation_width): New hook documentation.

	* doc/tm.texi (reassociation_width): Likewise.

	* hooks.h (hook_int_const_gimple_1): New default hook.

	* hooks.c (hook_int_const_gimple_1): Likewise.

	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.

	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.

	(ix86_reassociation_width): New function implementing the
	new hook for the i386 target.

	* common.opt (ftree-reassoc-width): New option added.

	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.

	(reassociate_bb): Check the reassociation width to be used
	and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
	if needed.

	(pass_reassoc): Add TODO_remove_unused_locals flag.

gcc/testsuite/

2011-07-12  Enkovich Ilya  <ilya.enkovich@intel.com>

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	-ftree-reassoc-width=1.

	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

Attachment: PR44382.diff
Description: Binary data

