[PATCH] Speed up LEX line cleaning a bit...

Joseph S. Myers joseph@codesourcery.com
Sun Mar 14 23:16:00 GMT 2010


On Sun, 14 Mar 2010, David Miller wrote:

> > See also Zack's ideas on speeding up _cpp_clean_line that I posted in 
> > <http://gcc.gnu.org/ml/gcc/2007-05/msg00741.html>.  It's not clear if they 
> > could be effectively combined with vectorization, or what would help 
> > performance more, or whether several simple passes or one more complicated 
> > combined pass would actually be better.
> 
> I think the state machine would prevent being able to use
> a vectorization optimization like that being described here.

I imagine vectorization could still be used for comment skipping or 
finding the end of a string, where you can skip all characters except a 
limited set.  But it's not at all clear which phases should be combined, 
and losing vectorization opportunities may be a disadvantage of combining 
too many.
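
To illustrate what skipping "all characters except a limited set" might 
look like, here is a minimal word-at-a-time sketch in portable C.  It is 
not code from libcpp or from the patch under discussion: the function 
name skip_comment_body and the choice of '*' and '\n' as the interesting 
characters are assumptions made for the example, and a real version 
would probably use SSE or AltiVec intrinsics rather than 64-bit integer 
tricks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of W is zero (classic bit trick).  */
static uint64_t
has_zero_byte (uint64_t w)
{
  return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Nonzero iff at least one byte of W equals C.  */
static uint64_t
has_byte (uint64_t w, unsigned char c)
{
  return has_zero_byte (w ^ (0x0101010101010101ULL * c));
}

/* Hypothetical helper: return a pointer to the first '*' or '\n' in
   [S, END), or END if there is none.  */
static const unsigned char *
skip_comment_body (const unsigned char *s, const unsigned char *end)
{
  /* Fast loop: examine eight bytes per iteration and stop as soon as
     the word contains any interesting character.  */
  while ((size_t) (end - s) >= sizeof (uint64_t))
    {
      uint64_t w;
      memcpy (&w, s, sizeof w);	/* Avoids unaligned-access problems.  */
      if (has_byte (w, '*') | has_byte (w, '\n'))
	break;
      s += sizeof w;
    }

  /* Slow loop: locate the exact byte, and handle the short tail.  */
  while (s < end && *s != '*' && *s != '\n')
    s++;
  return s;
}
```

Each extra character in the set costs one more has_byte test per word, 
which is one reason a small set suits this kind of loop.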

> > you find a newline you could then check if the line was nonempty and what 
> > came before was a backslash.
> 
> Hmmm, the code seems to backtrack over any number of whitespace
> characters:
> 
> 	p = d;
> 	while (is_nvspace (p[-1]))
> 	  --p;
> 	if (p - 1 != pbackslash)
> 	  goto done;
> 
> so I don't think checking just one character behind the found newline
> would work.

I thought there was agreement to get rid of the special handling of 
whitespace between backslash and newline (see 
<http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01502.html>).  But you'd 
still need to check for such whitespace in order to give warnings, so 
that wouldn't actually help here.  There are lots of places where plain 
standard preprocessing could be done more quickly than preprocessing 
that also gives warnings, tracks line and column numbers, etc.

> And this would incur more loads decreasing the effectiveness of the
> vectorization, which aims to minimize the number of loads.

The idea would be to reduce the number of operations done on each word 
while processing a line, so reducing the total number of instructions in 
the inner loop.  (Additionally, backslashes are common in strings, so it 
would be good not to have to leave the vectorized loop for them.)
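
A rough sketch of that idea, under stated assumptions: the fast loop 
looks only for newlines, and the backslash check (including the 
backtracking over whitespace quoted above) happens once per newline 
found.  The names find_logical_eol and is_hspace are invented for this 
example; is_hspace only approximates libcpp's is_nvspace, and memchr 
stands in for whatever vectorized search would really be used.  
Trigraphs and '\r' are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libcpp's is_nvspace.  */
static int
is_hspace (unsigned char c)
{
  return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}

/* Return a pointer to the newline that ends the logical line starting
   at S, or END if the buffer runs out first.  *NSPLICES is set to the
   number of backslash-newline splices stepped over.  */
static const unsigned char *
find_logical_eol (const unsigned char *s, const unsigned char *end,
		  int *nsplices)
{
  const unsigned char *line = s;	/* Start of current physical line.  */

  *nsplices = 0;
  while (s < end)
    {
      /* Fast part: search for '\n' only.  Backslashes inside strings
	 never interrupt this search.  */
      const unsigned char *nl = memchr (s, '\n', (size_t) (end - s));
      if (nl == NULL)
	return end;

      /* Slow part, once per newline: backtrack over whitespace, then
	 check that the line is nonempty and ends in a backslash.  */
      const unsigned char *p = nl;
      while (p > line && is_hspace (p[-1]))
	--p;
      if (p == line || p[-1] != '\\')
	return nl;

      /* Spliced line: a real implementation would warn here if
	 P != NL, then continue with the next physical line.  */
      ++*nsplices;
      s = line = nl + 1;
    }
  return end;
}
```

The extra loads then happen only near line ends rather than on every 
word of the line.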

-- 
Joseph S. Myers
joseph@codesourcery.com


