This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: bumming cycles out of parse_identifier()...


Zack Weinberg wrote:-

> I rearranged things so that the normal case doesn't copy the string
> and doesn't concern itself with any of the three problem cases.  If
> we hit any of them we bail out to a slow path.  I also spent a fair
> amount of effort making sure the normal case had no mispredicted
> branches.  Here's another profile excerpt.

Cool, I've been meaning to do something similar myself for a while.
Your patch looks better than what I'd envisaged.

> I don't think it's practical to make it go any faster short of dirty
> tricks, e.g. doing word-size fetches and clever shifts instead of byte
> fetches.

Well, we could eliminate the "cur < limit" check.  We currently have
to make that check for every single character in the file.  The
question becomes: do the savings from mmap () outweigh the savings
from removing the "cur < limit" checks from the fast path?

If we read the file into a buffer, like we currently do for pipes, we
could terminate it with a NUL.  Then many of the checks could die,
since NUL is not e.g. a valid identifier character.  The other places
that handle NUL (whitespace and comment skipping, string lexing) would
need to additionally check that cur != limit to determine whether they
had a real NUL or EOF.  What do you think?
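A minimal sketch of the idea (not the actual cpplib code): if the
buffer is guaranteed to end in a NUL, the identifier fast path needs no
bounds check at all, because NUL is not a valid identifier character.
Only the code that actually sees a NUL has to compare against limit to
tell a real embedded NUL from end-of-buffer.

```c
#include <ctype.h>

/* Hypothetical fast-path identifier scan over a NUL-terminated
   buffer.  The loop carries no "cur < limit" test: the terminating
   NUL is not an identifier character, so it stops the loop for us.
   The caller that receives a NUL then checks cur against limit to
   distinguish a stray NUL in the file from EOF.  */
static const char *
scan_identifier (const char *cur)
{
  while (isalnum ((unsigned char) *cur) || *cur == '_')
    cur++;
  return cur;   /* first non-identifier byte, possibly the NUL */
}
```

For example, given the buffer "foo_bar+1", the scan stops at the '+'
without ever consulting a limit pointer.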

> Tomorrow, I consider reinventing stdio.  WTF is it doing spending
> 15% of runtime in fputs subroutines?

That's the glibc bottleneck.  I have no idea if other implementations
are faster.  Since it's only standalone-cpp that cares, I'm not sure
doing anything extra is worth it.  There are still wins to be had
elsewhere.  Jan sent me a mail a couple of weeks ago about how he'd
greatly improved the speed of comment skipping by creating a new
category for "interesting characters in comments" like I mentioned in
a comment somewhere.
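For illustration, a hedged sketch of what such a scheme might look
like (the names and exact character set here are my guesses, not Jan's
patch): classify the handful of bytes that matter inside a block
comment ('*' may end it, '\n' bumps the line count, '\\' and '?' may
start an escaped newline or trigraph, NUL is the buffer sentinel) and
skip everything else in a tight inner loop.

```c
#include <stdbool.h>

/* Hypothetical "interesting in comments" classification table.  */
static bool interesting[256];

static void
init_table (void)
{
  interesting['*'] = interesting['\n'] = true;
  interesting['\\'] = interesting['?'] = true;
  interesting['\0'] = true;   /* buffer terminator */
}

/* Skip a block comment body; CUR points just past the opening
   slash-star.  Returns just past the closing star-slash, or at the
   NUL, where the caller decides whether it was EOF.  Newline counting
   and escaped-newline handling are omitted for brevity.  */
static const char *
skip_block_comment (const char *cur)
{
  for (;;)
    {
      while (!interesting[(unsigned char) *cur])
        cur++;
      if (*cur == '*' && cur[1] == '/')
        return cur + 2;
      if (*cur == '\0')
        return cur;
      cur++;   /* newline, lone '*', '\\' or '?': carry on */
    }
}
```

The win is that the common case (ordinary comment text) touches one
table lookup per byte instead of a chain of comparisons.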

I'm still working on the memory storage for lexing tokens, which
should ultimately lead to wins by getting rid of lookbacks (which
amongst other things would kill 2 conditionals in the busy routine
cpp_get_token), and allow more memory-efficient macro expansion.  It
will give some big wins to Mark's C++ parser too, I hope.

> If this assumption is incorrect please let me know.

I get the failures here too, without your patch.

I think your patch violates the "don't step back" rule we tried to
follow in cpplex.c.  However, I'm fed up with that rule and want to
kill it.  Killing it will allow other gunk in cpplex.c to die too,
like "lex_dot" and "lex_percent" to name but two places.

If we're moving to UTF-8 like we claim, we don't need to worry about
well-chosen step backs.  I'm thinking about a patch to introduce some
kind of locale-based encoding conversion with iconv when we load a
file, after some preliminary discussion with Bruno Haible and Markus
Kuhn.
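Roughly what I have in mind, as a sketch only (the function name and
error handling are placeholders, and the source encoding would come
from the locale, e.g. nl_langinfo (CODESET)): run the freshly loaded
buffer through iconv once, up front, and hand the lexer a
NUL-terminated UTF-8 buffer.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical one-shot conversion of a loaded file from the locale
   encoding FROM to UTF-8.  Returns a malloc'd NUL-terminated buffer,
   or NULL on failure.  Real code would grow the buffer and report
   errors properly; this allocates a worst-case 4 bytes of output per
   input byte.  */
static char *
convert_to_utf8 (const char *from, const char *buf, size_t len)
{
  iconv_t cd = iconv_open ("UTF-8", from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t outsize = len * 4 + 1;
  char *out = malloc (outsize);
  char *inp = (char *) buf, *outp = out;
  size_t inleft = len, outleft = outsize - 1;

  if (out == NULL
      || iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }
  *outp = '\0';   /* NUL-terminate for the lexer's sentinel tricks */
  iconv_close (cd);
  return out;
}
```

Doing the conversion at load time keeps the hot lexing loops byte-only
and fits nicely with the NUL-terminated-buffer idea above.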

Neil.
