This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
On Fri, 16 Sep 2005, Geoffrey Keating wrote:
> What this means in practice, I think, is that the structure that
> represents a token, 'struct cpp_token', will grow from 16 bytes to 20
> bytes, which makes it 2 cache lines rather than 1, with a consequent
> increase in memory use and decrease in compiler performance. It might
> be that someone will think of some clever way to avoid this, but I
> couldn't think of any that would be likely to be a win overall, since a
> significant proportion of tokens are identifiers. (I especially didn't
> like the alternative that required a second hash lookup for every
> identifier.)
There are plenty of spare bits in cpp_token to flag extended identifiers
and handle them specially (as a slow path, marked as such with
__builtin_expect). There's one bit in the flags byte, two unused bytes
after it and a whole word not used in the case of identifiers (identifiers
use a cpp_hashnode * where strings and numbers use a struct cpp_string
which is bigger) which could store a canonical form of an identifier (or
could store the noncanonical spelling for the use of the specific places
which care about the original spelling).
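
A rough sketch of that layout idea follows. It is not the real cpplib
definition: 'struct token', 'struct hashnode', 'struct str',
TOK_EXTENDED_ID and token_spelling() are hypothetical stand-ins, and the
choice to keep the original spelling in the spare word is just one of the
two options mentioned above. The point is only that the identifier arm of
the union has room for a second pointer without the union growing, and
that the extra check can be marked as unlikely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit marking an identifier that contained extended
   characters; not an actual cpplib flag.  */
#define TOK_EXTENDED_ID 0x80u

/* Stand-in for cpp_hashnode: only the canonical spelling is modelled.  */
struct hashnode { const char *canonical; };

/* Stand-in for cpp_string: a length plus a pointer to the text.  */
struct str { unsigned int len; const char *text; };

struct token {
    unsigned int src_loc;
    unsigned char type;
    unsigned char flags;          /* followed by two bytes of padding */
    union {
        struct {
            struct hashnode *node;
            /* Uses the word that is otherwise idle for identifiers.  */
            const char *orig_spelling;
        } id;
        struct str str;
    } val;
};

/* Return the spelling to print for an identifier token.  Ordinary
   identifiers take the fast path; extended ones are flagged as the
   unlikely case.  */
const char *token_spelling(const struct token *tok)
{
    assert(tok != NULL);
    if (__builtin_expect((tok->flags & TOK_EXTENDED_ID) != 0, 0))
        return tok->val.id.orig_spelling;   /* slow path */
    return tok->val.id.node->canonical;
}
```

On common 32-bit and 64-bit ABIs the identifier arm is no larger than
'struct str', so in this sketch the token does not grow.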
> Adding salt to the wound, of course, is that for C the only difference
> between an (A) or (B) and a (C) implementation is that a (C)
> implementation is less expressive: there are some programs, all of
> which are erroneous and require a diagnostic, that can't be written.
> So you lose compiler performance just so users have another bullet
> to shoot their feet with.
C++ requires (A).
An implementation of (A) could start with a conversion of the whole input
to UCNs (as a slow path, taken only if extended characters are present),
or with a more efficient conversion that avoids the need to convert
within comments. But if any normalisation of UCNs is documented for C++,
it does need to be documented in the form of transforming UCNs to other
UCNs (not to UTF-8).
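
As an illustration of the simpler of those two conversions, here is a
minimal sketch that turns extended characters in a buffer into UCNs, with
a fast path when the input is pure ASCII. The function names
decode_utf8() and utf8_to_ucns() are made up for this example, the input
is assumed to be UTF-8, overlong forms and surrogates are not rejected,
and no attempt is made to skip comments or to normalise anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at S into *CP.  Returns the number of bytes
   consumed, or 0 if the sequence is malformed.  */
size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;
    for (i = 1; i < n; i++) {
        /* A terminating NUL fails this test too, so we never overrun.  */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Copy IN to OUT, replacing each non-ASCII character with \uXXXX or
   \UXXXXXXXX.  Returns the output length (excluding the NUL), or
   (size_t) -1 on malformed input or insufficient space.  */
size_t utf8_to_ucns(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t len = strlen(in), used = 0, i;
    int extended = 0;

    assert(out != NULL);
    for (i = 0; i < len; i++)
        if (p[i] >= 0x80) { extended = 1; break; }

    /* Fast path: nothing to convert.  */
    if (__builtin_expect(!extended, 1)) {
        if (len + 1 > outsz) return (size_t) -1;
        memcpy(out, in, len + 1);
        return len;
    }

    /* Slow path: decode and re-emit character by character.  */
    while (*p) {
        unsigned long cp;
        size_t n = decode_utf8(p, &cp);
        /* The longest UCN is 10 characters plus the NUL sprintf adds.  */
        if (n == 0 || used + 11 > outsz) return (size_t) -1;
        if (cp < 0x80)
            out[used++] = (char) cp;
        else if (cp <= 0xFFFF)
            used += (size_t) sprintf(out + used, "\\u%04lx", cp);
        else
            used += (size_t) sprintf(out + used, "\\U%08lx", cp);
        p += n;
    }
    out[used] = '\0';
    return used;
}
```

For example, the UTF-8 bytes for "café" come out as caf\u00e9, while a
pure ASCII buffer is copied unchanged.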