This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: cutting the size of CONST_DECLs



On Feb 27, 2004, at 4:20 PM, Dan Nicolaescu wrote:
You might also want to look at doing something about FUNCTION_DECLs. There are
a few fields in struct tree_decl that are only used for
FUNCTION_DECLs: inlined_fns, saved_tree.

Yep, separating FUNCTION_DECLs from everything else would definitely help. One problem is that function declarations aren't the most common node we see when parsing our headers; in my examples, FUNCTION_DECLs were only 14-20% of declarations. PARM_DECLs were usually 33-36% of declarations, so separating them out makes twice the difference.


Here's the distributions of declaration kinds I saw on some example compiles:

		PARAM	FN		CONST	TYPE	FIELD	VAR
FinderFE	30%		29%		24%		16%		?		?
gcc		30%		34%		10%		3%		21%		3%
rogue	27%		59%		0%		5%		9%		5%
AppearanceSample
		36%		20%		18%		16%		7%		0%
SimpleText
		33%		14%		28%		13%		11%		4%

Each represents declaration kinds found during compile of a representative file. FinderFE is the C++ front end to the Mac OS Finder. Appearance Sample and SimpleText are two GUI applications from Apple's sample code; AppearanceSample is C++, and SimpleText C.

One possibility I've strongly considered is to have three levels of declarations:

small: CONST_DECL
medium: PARM_DECL, TYPE_DECL,FIELD_DECL
large: FN_DECL, VAR_DECL

On 3.3, small declarations to 72 bytes, medium declarations 100 bytes, and large declarations 116 bytes. I've done a few measurements with this set of objects; memory use and compile times improve again from the CONST_DECL/everything else scheme, though the memory improvements are still in the <10% ballpark, and compile time improvements are much smaller. I still need to do more measurements on non-Apple source code, and double-check that the changes aren't slowing the compiler.

Adding different kinds of declarations adds a bit of overhead, so we don't want to take this too far. At a minimum, the DECL_P() macro needs to do more work to decide if a given node is a declaration. On a stock system, this is a single comparison: is the tree code class 'd'? With three kinds of declarations, I could make the tree code classes 'd', 'D', and '$'. With these characters, the test in DECL_P() can use a bitmask to see if (tree_code & 0x1f) == 0x04 rather than doing three comparisons.

Deciding which kinds of declarations behave similarly can be difficult from just the table of access patterns. I ended up using concept analysis to identify likely hierarchies. (Snelting and Tip, "Understanding Class Hierarchies using Concept Analysis", ACM Transactions on Programming Languages and Systems 22(3), May 2000). There's a tool available on Sourceforge ("conexp") to draw an optimized class hierarchy (well, a concept lattice) given a set of objects and properties. The three-level scheme pretty much fell out once I looked at the lattice.

Robert


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]