Compiler identifier hashtable improvements (and ObjC cleanup)
Neil Booth
neil@daikokuya.demon.co.uk
Wed May 16 00:05:00 GMT 2001
Hi Zack,
Zack Weinberg wrote:-
> You're still allocating these via make_node. You might want to
> consider moving them into the obstack with the strings, since they can
> never be garbage collected anyway. This would save some memory;
> struct lang_identifier in C is 48 bytes, in C++ 44. They come out of
> the size-64 pool, so we're wasting 16-20 bytes per, and we can
> allocate thousands of them.
Yes, I tried this; however, it confuses garbage collection, which expects
all trees to have been allocated from its pages. I also had it storing
the strings immediately after the tree structure, until I realised GC
got confused. Doing this saves half a pointer on average: there is no
longer any need to store a pointer to the string, but you lose half a
pointer on average to alignment. Converting an identifier node to its
char * would need to become a subroutine if you do this. Since we're
also allocating plain strings from the same obstack, and so losing half
a pointer for each of them too, storing them adjacently in the obstack
might not be worth it. Maybe two obstacks, one for IDENTIFIER_NODEs and
one for text, is a better plan.
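To make the layout trade-off concrete, here is a minimal sketch in C. It assumes a hypothetical struct hyp_ident (not GCC's real tree_identifier) and uses malloc as a stand-in for the obstack. The text sits directly after the node, so no string pointer is stored, but reaching the text becomes address arithmetic, and the next object has to be rounded up to pointer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical identifier node; its text lives immediately after it in
   the same allocation, so no string pointer is stored.  */
struct hyp_ident
{
  void *chain;   /* stand-in for the common tree fields */
  size_t len;    /* length of the text, excluding the NUL */
};

/* Round N up to a multiple of the pointer size, as an obstack would
   before the next object; this padding is the "half a pointer on
   average" that is lost.  */
static size_t
hyp_round_up (size_t n)
{
  size_t a = sizeof (void *);
  return (n + a - 1) / a * a;
}

/* Bytes taken by one node plus its text, padding included.  */
static size_t
hyp_ident_footprint (size_t len)
{
  return hyp_round_up (sizeof (struct hyp_ident) + len + 1);
}

/* Node -> char * is now address arithmetic rather than a field load.  */
static const char *
hyp_ident_string (const struct hyp_ident *id)
{
  return (const char *) (id + 1);
}

/* Allocate node and text together (malloc stands in for an obstack).  */
static struct hyp_ident *
hyp_ident_new (const char *s)
{
  assert (s != NULL);
  size_t len = strlen (s);
  struct hyp_ident *id = malloc (hyp_ident_footprint (len));
  if (id == NULL)
    return NULL;
  id->chain = NULL;
  id->len = len;
  memcpy ((char *) (id + 1), s, len + 1);
  return id;
}
```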
How much of the common part of struct identifier is really unused? Can
identifiers never be chained? Is the type pointer never meaningful? If
so, I can (for the C front ends at least) use these fields for cpplib's
information.
We should be able to move the rid enum to use 1 byte of the common
structure; because of alignment that will save a whole pointer at
present.
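A rough illustration of the alignment point, using two hypothetical layouts (the field names are invented and are not GCC's tree_common). On common ABIs, where a pointer's alignment equals its size, moving the one-byte rid into the padding after the other one-byte fields shrinks the struct by exactly one pointer.

```c
#include <assert.h>

/* Hypothetical layout with rid as a separate trailing field: it ends
   up occupying its own pointer-aligned slot.  */
struct hyp_ident_separate
{
  void *chain;
  void *type;
  unsigned char code;    /* tree code */
  unsigned char flags;   /* assorted flag bits */
  const char *str;
  unsigned char rid;     /* reserved-word id */
};

/* Hypothetical layout with rid squeezed into the padding that already
   follows the one-byte fields.  */
struct hyp_ident_packed
{
  void *chain;
  void *type;
  unsigned char code;
  unsigned char flags;
  unsigned char rid;     /* fits in existing padding */
  const char *str;
};

/* Holds on ABIs where pointer alignment equals pointer size.  */
static_assert (sizeof (struct hyp_ident_separate)
               - sizeof (struct hyp_ident_packed) == sizeof (void *),
               "moving rid into spare byte saves one pointer");
```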
> The garbage collector would then have to be adjusted so it never
> marked an identifier_node, and the code which marks from the
> stringpool would need to go straight to the things the identifier_node
> points at.
Ah, yes. I hadn't considered doing that; one step at a time :-) It'd
probably be worth it; let's do this last, after we get CPP involved.
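A minimal sketch of the marking change Zack describes, using invented types (hyp_ident, hyp_gc_obj) rather than ggc's real interface. The string-pool pass never marks the identifier node, which lives outside the collector's pages, and marks only the collected objects it points at. Marking here just sets a bit; a real collector would also recurse into each object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical collected object with a mark bit.  */
struct hyp_gc_obj
{
  int marked;
};

/* Hypothetical identifier node allocated outside GC pages: never
   marked itself, only the objects it refers to.  */
struct hyp_ident
{
  struct hyp_gc_obj *global_value;   /* e.g. a decl bound to the name */
  struct hyp_gc_obj *label_value;
};

static void
hyp_gc_mark (struct hyp_gc_obj *obj)
{
  if (obj != NULL)
    obj->marked = 1;
}

/* Root pass over the string pool: skip the identifier node and go
   straight to what it points at.  */
static void
hyp_mark_stringpool (struct hyp_ident **pool, size_t n)
{
  assert (pool != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (pool[i] != NULL)
      {
        hyp_gc_mark (pool[i]->global_value);
        hyp_gc_mark (pool[i]->label_value);
      }
}
```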
> Given that you changed ggc_alloc_string not to go through the hash
> table anymore, how do we get non-empty entries that haven't gone
> through get_identifier?
We don't. We only store the string space (permanently; it is not
garbage collected), not the tree, so a node that is in use is always
referenced from somewhere else. Or am I missing something that could
cause problems?
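For illustration, a toy interning table in C showing the invariant: slots start empty, and only the interning call (a hypothetical hyp_intern, standing in for get_identifier) ever fills one, so a plain string allocation that bypasses the table cannot produce a non-empty entry. The fixed size, linear probing and lack of resizing are simplifications assumed for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HYP_TABLE_SIZE 64   /* power of two; never resized here */

struct hyp_node
{
  const char *str;   /* permanent copy of the text */
  size_t len;
};

/* All slots start empty; only hyp_intern fills them.  */
static struct hyp_node *hyp_table[HYP_TABLE_SIZE];

static unsigned int
hyp_hash (const char *s, size_t len)
{
  unsigned int h = 0;
  for (size_t i = 0; i < len; i++)
    h = h * 31 + (unsigned char) s[i];
  return h;
}

/* Return the unique node for S, creating it on first use.  Returns
   NULL if the table is full or allocation fails.  */
static struct hyp_node *
hyp_intern (const char *s)
{
  assert (s != NULL);
  size_t len = strlen (s);
  unsigned int i = hyp_hash (s, len) & (HYP_TABLE_SIZE - 1);

  for (unsigned int probes = 0; probes < HYP_TABLE_SIZE; probes++)
    {
      struct hyp_node *n = hyp_table[i];
      if (n == NULL)
        {
          /* Empty slot: create the node and a permanent text copy.  */
          char *copy = malloc (len + 1);
          n = malloc (sizeof *n);
          if (copy == NULL || n == NULL)
            {
              free (copy);
              free (n);
              return NULL;
            }
          memcpy (copy, s, len + 1);
          n->str = copy;
          n->len = len;
          hyp_table[i] = n;
          return n;
        }
      if (n->len == len && memcmp (n->str, s, len) == 0)
        return n;                            /* already interned */
      i = (i + 1) & (HYP_TABLE_SIZE - 1);    /* linear probing */
    }
  return NULL;
}
```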
> I understand that this works, but I'm not clear on why. This sounds
> like the way it used to work, which was broken because these
> identifiers were used in the protocol context, stored in trees, then
> examined (by grokdeclarator) outside the protocol context. At that
> point they'd stopped being magic.
I take it you mean they were used in the protocol context as reserved
words, not as identifiers?
Something similar was happening to me [about 6 testcases would fail,
right?] until I put the check in yylexname and kept the identifiers
always flagged as RIDs. The hash entries are still available for use
as identifiers and contain the identifier information; it is just that
the parser does not see them as identifiers at the appropriate point.
The parser only needs to see the correct YACC token code returned.
When I did this, the grokdeclarator issues and the regressions went
away. I admit I don't understand the C front end's handling of types
and grokdeclarator well enough to be certain it's 100% safe, but the
lack of regressions seems to validate it to some extent.
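A sketch of the idea with invented names: hyp_lex_name stands in for the check in yylexname, and the token and rid values are hypothetical. The node keeps its RID flag permanently, and only the token code returned to the parser depends on whether we are in a protocol context, so nothing stored in trees changes meaning later. The qualifiers shown are a subset of the Objective-C protocol type qualifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token codes standing in for the YACC-generated ones.  */
enum hyp_token { HYP_IDENTIFIER, HYP_TYPE_QUAL, HYP_RESERVED };

/* Hypothetical reserved-word ids.  */
enum hyp_rid { HYP_RID_NONE, HYP_RID_INT, HYP_RID_IN, HYP_RID_OUT,
               HYP_RID_ONEWAY };

struct hyp_ident
{
  const char *name;
  enum hyp_rid rid;   /* set once, never cleared */
};

static bool
hyp_is_protocol_qualifier (enum hyp_rid rid)
{
  return rid == HYP_RID_IN || rid == HYP_RID_OUT || rid == HYP_RID_ONEWAY;
}

/* The node's flag never changes; only the token handed to the parser
   depends on context.  */
static enum hyp_token
hyp_lex_name (const struct hyp_ident *id, bool in_protocol_context)
{
  assert (id != NULL);
  if (id->rid == HYP_RID_NONE)
    return HYP_IDENTIFIER;
  if (hyp_is_protocol_qualifier (id->rid))
    return in_protocol_context ? HYP_TYPE_QUAL : HYP_IDENTIFIER;
  return HYP_RESERVED;
}
```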
> Careful; some idioms can produce many copies of the same string. For
> example, the old assert() macro generated the same string constant
> every time it was used. The Linux kernel's BUG() macro has the same
> problem.
>
> This does not mean we need to handle them with the identifier hash
> table; in fact it's probably best if we don't. I do think some code
> should prevent duplicates. We already have code in varasm.c to
> prevent _emitting_ the same string more than once per file, perhaps it
> can be persuaded to do this job too.
Hmm. You may need to help me in that area.
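One possible shape for duplicate suppression, as a hedged sketch and not what varasm.c actually does: keep a per-file record of string constants keyed by their contents (length included, since constants may contain NULs) and return the existing label on a repeat. The names are hypothetical, and a linear list keeps it short where a real version would want a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-file record of string constants already emitted.  */
struct hyp_const_str
{
  const char *bytes;   /* contents; may contain NULs, hence len */
  size_t len;
  int label;           /* e.g. the N of a local label */
  struct hyp_const_str *next;
};

static struct hyp_const_str *hyp_emitted;
static int hyp_next_label;

/* Return the label for a constant with these contents, allocating a
   new label only the first time they are seen.  Returns -1 on
   allocation failure.  */
static int
hyp_string_label (const char *bytes, size_t len)
{
  assert (bytes != NULL);
  for (struct hyp_const_str *p = hyp_emitted; p != NULL; p = p->next)
    if (p->len == len && memcmp (p->bytes, bytes, len) == 0)
      return p->label;   /* duplicate: reuse the existing constant */

  struct hyp_const_str *p = malloc (sizeof *p);
  char *copy = malloc (len + 1);
  if (p == NULL || copy == NULL)
    {
      free (p);
      free (copy);
      return -1;
    }
  memcpy (copy, bytes, len);
  p->bytes = copy;
  p->len = len;
  p->label = hyp_next_label++;
  p->next = hyp_emitted;
  hyp_emitted = p;
  return p->label;
}
```

With this, every expansion of an assert()- or BUG()-style macro that produces the same text would share a single constant per file.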
> The patch does look nice and I look forward to the unified symbol
> handling between cpplib and front ends.
Me too. Thanks,
Neil.