Eclipse's "hippie completion" (similar to word completion in vim or Emacs) does not work in Fedora due to a bug in our regular expression code. I believe this is a GNU Classpath regex issue. I have made a test case (soon to be attached).

  javac TestHippieRegex.java
  java TestHippieRegex

With the Sun JVM, I get the following:

  ++++++++++++++
  trying fFindReplaceMatcher.find(100)
  fFindReplaceMatcher.pattern().pattern() = [\p{L}[\p{Mn}[\p{Pc}[\p{Nd}[\p{Nl}[\p{Sc}]]]]]]+
  ++++++++++++++
  found = true

but with gij (Fedora rawhide's 4.1.0-0.20), I get:

  ++++++++++++++
  trying fFindReplaceMatcher.find(100)
  fFindReplaceMatcher.pattern().pattern() = [\p{L}[\p{Mn}[\p{Pc}[\p{Nd}[\p{Nl}[\p{Sc}]]]]]]+
  ++++++++++++++
  found = false

I'm investigating this because of https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=178648
Created attachment 10799: test case
Named property support was added recently:

  2006-01-31  Ito Kazumitsu  <kaz@maczuka.gcd.org>

      Fixes bug #26002
      * gnu/regexp/gnu/regexp/RE.java(initialize): Parse /\p{prop}/.
      (NamedProperty): New inner class.
      (getNamedProperty): New method.
      (getRETokenNamedProperty): New method.
      * gnu/regexp/RESyntax.java(RE_NAMED_PROPERTY): New syntax flag.
      * gnu/regexp/RETokenNamedProperty.java: New file.

But the attached test program still fails.
This might be because we are not handling character class union, [<class>[<another-class>]], correctly. In this case it seems the regular expression can be rewritten with a logical or, as follows:

  Pattern pattern= Pattern.compile("(\\p{L}|\\p{Mn}|\\p{Pc}|\\p{Nd}|\\p{Nl}|\\p{Sc})+", patternFlags);

In that case the test program does return true.

The original pattern is just a fancy way of saying you want to match a string of one or more letters (L), non-spacing marks (Mn), connector punctuation (Pc), decimal digit numbers (Nd), letter numbers (Nl) and currency symbols (Sc).
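To make the claimed equivalence concrete, here is a minimal sketch that runs both spellings of the pattern against the same input. The class name, the helper firstMatch and the sample input are made up for illustration, and patternFlags from the test case is omitted. On a runtime that handles character class union correctly, both calls should find the same run of characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnionVsAlternation {
    // Nested-union form, as printed by the test case.
    static final String NESTED =
        "[\\p{L}[\\p{Mn}[\\p{Pc}[\\p{Nd}[\\p{Nl}[\\p{Sc}]]]]]]+";
    // Alternation rewrite suggested in the comment above.
    static final String ALTERNATION =
        "(\\p{L}|\\p{Mn}|\\p{Pc}|\\p{Nd}|\\p{Nl}|\\p{Sc})+";

    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample input (an assumption): letters, '_' (Pc), digits (Nd), '$' (Sc).
        String input = "  foo_bar42$ baz";
        System.out.println("nested:      " + firstMatch(NESTED, input));
        System.out.println("alternation: " + firstMatch(ALTERNATION, input));
    }
}
```

If the two lines printed differ, the runtime's handling of the nested form is the likely culprit, since the alternation form only relies on \p{...} support.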
While working on the gnu.regexp package recently, I had hoped the day would not come so soon when nested character class expressions such as [aaa[xyz]] were needed. This syntax is not in Perl, but Sun's JDK introduced it.

[X[Y[^Z]]] would not be so difficult. It is equivalent to X|Y|[^Z]. I think I can manage to do this.

But Sun's JDK introduced another syntax, such as [X&&[Y[^Z]]], meaning X and (Y or not Z). Supporting "&&" will require a serious design change.
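For readers unfamiliar with the two operators, the following sketch shows union and intersection using the simple examples from the java.util.regex.Pattern documentation. The class name and the helper matches are invented for illustration. It does not exercise the combined [X&&[Y[^Z]]] form discussed above.

```java
import java.util.regex.Pattern;

public class CharClassOps {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Union: [a-d[m-p]] is a through d, or m through p.
        System.out.println(matches("[a-d[m-p]]", "n"));   // inside m-p
        System.out.println(matches("[a-d[m-p]]", "e"));   // in neither range

        // Intersection: [a-z&&[def]] is d, e, or f.
        System.out.println(matches("[a-z&&[def]]", "e"));
        System.out.println(matches("[a-z&&[def]]", "a"));

        // Subtraction via intersection: [a-z&&[^bc]] is a through z except b and c.
        System.out.println(matches("[a-z&&[^bc]]", "d"));
        System.out.println(matches("[a-z&&[^bc]]", "b"));
    }
}
```

Union can be flattened into a single set of alternatives, whereas intersection requires every operand to accept the character, which is presumably why "&&" calls for a different token design than a plain "one of" list.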
Before implementing this new syntax, could someone explain this strange behavior of Sun's JDK? The source of W.java used here is attached below.

  bash$ java -DFIND=1 W 'b' '[^b]'
  false
  bash$ java -DFIND=1 W 'b' '[^b[b]]'
  true
  G0 = b
  bash$ java -DFIND=1 W 'b' '[^b[b]b]'
  false
  bash$ java -DFIND=1 W 'b' '[^b[b]b[b]]'
  true
  G0 = b
  bash$ java -DFIND=1 W 'b' '[^b[b]b[b]b]'
  false
  bash$ java -DFIND=1 W 'b' '[^[b]]'
  true
  G0 = b
  bash$ java -DFIND=1 W 'b' '[^[b]b]'
  false
  bash$ java -DFIND=1 W 'b' '[^[b]b[b]]'
  true
  G0 = b
  bash$ java -DFIND=1 W 'b' '[^[b]b[b]b]'
  false
  bash$ cat W.java
  import java.util.regex.*;

  public class W {
      public static void main(String[] args) throws Exception {
          int flags = 0;
          boolean find = (System.getProperty("FIND") != null);
          if (System.getProperty("CASE_INSENSITIVE") != null) {
              flags |= Pattern.CASE_INSENSITIVE;
          }
          Pattern p = Pattern.compile(args[1], flags);
          Matcher m = p.matcher(args[0]);
          boolean b = (find ? m.find() : m.matches());
          System.out.println(b);
          if (b) {
              int groups = m.groupCount();
              for (int i = 0; i <= groups; i++) {
                  System.out.println("G" + i + " = " + m.group(i));
              }
          }
      }
  }

I assume [^X[Y][Z]] means (not X) or Y or Z, where X must not contain a subclass enclosed by []. [^[X]] and [^X[Y]Z] are invalid expressions whose matching results are meaningless, although Sun's JDK neglects to check their validity.
Subject: Bug 26166

CVSROOT:      /cvsroot/classpath
Module name:  classpath
Branch:
Changes by:   Ito Kazumitsu <itokaz@savannah.gnu.org>  06/02/13 13:19:44

Modified files:
  .          : ChangeLog
  gnu/regexp : RE.java RESyntax.java RETokenOneOf.java

Log message:
  2006-02-13  Ito Kazumitsu  <kaz@maczuka.gcd.org>

      Fixes bug #26166
      * gnu/regexp/RE.java(initialize): Parsing of character class expression
      was moved to a new method parseCharClass.
      (parseCharClass): New method originally in initialize. Added parsing
      of nested character classes.
      (ParseCharClassResult): New inner class used as a return value of
      parseCharClass.
      (getCharExpression),(getNamedProperty): Made static.
      * gnu/regexp/RESyntax.java(RE_NESTED_CHARCLASS): New syntax flag.
      * gnu/regexp/RETokenOneOf.java(addition): New Vector for storing
      nested character classes.
      (RETokenOneOf): New constructor accepting the Vector addition.
      (getMinimumLength), (getMaximumLength): Returns 1 if the token
      stands for only one character.
      (match): Added the processing of the Vector addition.
      (matchN), (matchP): Do not check next token if addition is used.

CVSWeb URLs:
  http://cvs.savannah.gnu.org/viewcvs/classpath/classpath/ChangeLog.diff?tr1=1.6350&tr2=1.6351&r1=text&r2=text
  http://cvs.savannah.gnu.org/viewcvs/classpath/classpath/gnu/regexp/RE.java.diff?tr1=1.17&tr2=1.18&r1=text&r2=text
  http://cvs.savannah.gnu.org/viewcvs/classpath/classpath/gnu/regexp/RESyntax.java.diff?tr1=1.6&tr2=1.7&r1=text&r2=text
  http://cvs.savannah.gnu.org/viewcvs/classpath/classpath/gnu/regexp/RETokenOneOf.java.diff?tr1=1.6&tr2=1.7&r1=text&r2=text
Fixed.
Subject: Bug 26166

Author: tromey
Date: Mon Feb 13 22:58:37 2006
New Revision: 110937

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=110937

Log:
  2006-02-13  Ito Kazumitsu  <kaz@maczuka.gcd.org>

      Fixes bug #26166
      * gnu/regexp/RE.java(initialize): Parsing of character class expression
      was moved to a new method parseCharClass.
      (parseCharClass): New method originally in initialize. Added parsing
      of nested character classes.
      (ParseCharClassResult): New inner class used as a return value of
      parseCharClass.
      (getCharExpression),(getNamedProperty): Made static.
      * gnu/regexp/RESyntax.java(RE_NESTED_CHARCLASS): New syntax flag.
      * gnu/regexp/RETokenOneOf.java(addition): New Vector for storing
      nested character classes.
      (RETokenOneOf): New constructor accepting the Vector addition.
      (getMinimumLength), (getMaximumLength): Returns 1 if the token
      stands for only one character.
      (match): Added the processing of the Vector addition.
      (matchN), (matchP): Do not check next token if addition is used.

Modified:
  branches/gcc-4_1-branch/libjava/classpath/ChangeLog.gcj
  branches/gcc-4_1-branch/libjava/classpath/gnu/regexp/RE.java
  branches/gcc-4_1-branch/libjava/classpath/gnu/regexp/RESyntax.java
  branches/gcc-4_1-branch/libjava/classpath/gnu/regexp/RETokenOneOf.java