This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.



[4.1] Patch: FYI: big regex merge


I'm checking this in on the 4.1 branch.

This merges in all the recent regex fixes from GNU Classpath.

Ordinarily I would not like to put in a big patch like this at the
last minute, but:

* It affects several real applications
* It fixes a large number of Mauve tests (more than 2000)
* Our current regular expression code is broken enough that
  this is unlikely to cause regressions
* It is pure java and so relatively safe

Tested on x86 FC4, including Mauve.  Also Mark tested this against
Eclipse.

Tom


Index: ChangeLog
from  Tom Tromey  <tromey@redhat.com>

	* java/lang/Character.java: Merged from Classpath.
	(start, end): Now 'int's.
	(canonicalName): New field.
	(CANONICAL_NAME, NO_SPACES_NAME, CONSTANT_NAME): New constants.
	(UnicodeBlock): Added argument.
	(of): New overload.
	(forName): New method.
	Updated Unicode blocks.
	(sets): Updated.
	* sources.am, Makefile.in: Rebuilt.
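
The following is a minimal sketch of the new UnicodeBlock.of(int) overload and the forName method described in this entry. It assumes a runtime that follows the J2SE 5.0 Character.UnicodeBlock specification. The class UnicodeBlockDemo and its helper methods are hypothetical names used only for illustration.

```java
class UnicodeBlockDemo {
    // forName() accepts the canonical name, the name with spaces removed,
    // and the constant-style name; all three should yield the same block.
    static boolean allNameFormsAgree() {
        Character.UnicodeBlock a = Character.UnicodeBlock.forName("Basic Latin");
        Character.UnicodeBlock b = Character.UnicodeBlock.forName("BasicLatin");
        Character.UnicodeBlock c = Character.UnicodeBlock.forName("BASIC_LATIN");
        return a == Character.UnicodeBlock.BASIC_LATIN && a == b && b == c;
    }

    // of(int) takes a code point; anything above MAX_CODE_POINT is rejected.
    static boolean rejectsOutOfRange() {
        try {
            Character.UnicodeBlock.of(Character.MAX_CODE_POINT + 1);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('A'));    // BASIC_LATIN
        System.out.println(Character.UnicodeBlock.of(0x0416)); // CYRILLIC
        System.out.println(allNameFormsAgree());
        System.out.println(rejectsOutOfRange());
    }
}
```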

2006-01-13  Tom Tromey  <tromey@redhat.com>

	* gnu/regexp/MessagesBundle_fr.properties: Removed.
	* gnu/regexp/MessagesBundle.properties: Removed.

Index: classpath/ChangeLog.gcj
from  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #26112
	* gnu/regexp/RE.java(REG_REPLACE_USE_BACKSLASHESCAPE): New execution
	flag which enables backslash escape in a replacement.
	(getReplacement): New public static method. 
	(substituteImpl),(substituteAllImpl): Use getReplacement.
	* gnu/regexp/REMatch.java(substituteInto): Replace $n even if n>=10.
	* java/util/regex/Matcher.java(appendReplacement)
	Use RE#getReplacement.
	(replaceFirst),(replaceAll): Use RE.REG_REPLACE_USE_BACKSLASHESCAPE.
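
This entry changes how backslashes and $n are handled in replacement strings. The sketch below illustrates that behaviour through the public Matcher.replaceAll API, as documented for java.util.regex. It does not call the internal gnu.regexp.RE.getReplacement method. The class ReplacementDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class ReplacementDemo {
    // "$2" and "$1" refer to capturing groups 2 and 1.
    static String swap(String input) {
        return Pattern.compile("(\\w+)=(\\w+)").matcher(input).replaceAll("$2=$1");
    }

    // A backslash escapes the next replacement character. The Java literal
    // "\\$$1" is the text \$$1: a literal '$' followed by group 1.
    static String priceTag(String input) {
        return Pattern.compile("(\\d+)").matcher(input).replaceAll("\\$$1");
    }

    // With ten groups, "$10" refers to group 10 ...
    static String tenGroups(String input) {
        return Pattern.compile("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)")
                      .matcher(input).replaceAll("$10");
    }

    // ... but with a single group it means group 1 followed by a literal '0'.
    static String oneGroup(String input) {
        return Pattern.compile("(x)").matcher(input).replaceAll("$10");
    }

    public static void main(String[] args) {
        System.out.println(swap("a=b c=d"));         // b=a d=c
        System.out.println(priceTag("cost 42"));     // cost $42
        System.out.println(tenGroups("abcdefghij")); // j
        System.out.println(oneGroup("x"));           // x0
    }
}
```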

2006-02-06  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	* java/util/regex/Matcher.java(matches):
	set RE.REG_TRY_ENTIRE_MATCH as an execution flag of getMatch.
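
The point of REG_TRY_ENTIRE_MATCH is that Matcher.matches must succeed whenever any alternative covers the whole input, and not only when the first alternative happens to. The sketch below shows the difference between matches() and lookingAt(), assuming standard java.util.regex semantics. EntireMatchDemo is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntireMatchDemo {
    // lookingAt() only needs a prefix of the input, so the first
    // alternative that succeeds ("a") is accepted.
    static String prefixMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    // matches() must consume the entire input, so the engine keeps trying
    // alternatives until one reaches the end of the input or all fail.
    static boolean entireMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("a|ab", "ab"));  // a
        System.out.println(entireMatch("a|ab", "ab"));  // true
        System.out.println(entireMatch("a|ab", "abc")); // false
    }
}
```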

2006-02-06  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #25812
	* gnu/regexp/CharIndexed.java(lookBehind),(length): New method.
	* gnu/regexp/CharIndexedCharArray.java
	(lookBehind),(length): Implemented.
	* gnu/regexp/CharIndexedInputStream.java: Likewise.
	* gnu/regexp/CharIndexedString.java: Likewise.
	* gnu/regexp/CharIndexedStringBuffer.java: Likewise.
	* gnu/regexp/REToken.java(getMaximumLength): New method.
	* gnu/regexp/RE.java(internal constructor RE): Added new argument
	maxLength.
	(initialize): Parse (?<=X), (?<!X), (?>X).
	(getMaximumLength): Implemented.
	* gnu/regexp/RETokenAny.java(getMaximumLength): Implemented.
	* gnu/regexp/RETokenChar.java: Likewise.
	* gnu/regexp/RETokenEnd.java: Likewise.
	* gnu/regexp/RETokenEndSub.java: Likewise.
	* gnu/regexp/RETokenLookAhead.java: Likewise.
	* gnu/regexp/RETokenNamedProperty.java: Likewise.
	* gnu/regexp/RETokenOneOf.java: Likewise.
	* gnu/regexp/RETokenPOSIX.java: Likewise.
	* gnu/regexp/RETokenRange.java: Likewise.
	* gnu/regexp/RETokenRepeated.java: Likewise.
	* gnu/regexp/RETokenStart.java: Likewise.
	* gnu/regexp/RETokenWordBoundary.java: Likewise.
	* gnu/regexp/RETokenIndependent.java: New file.
	* gnu/regexp/RETokenLookBehind.java: New file.
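
The constructs parsed by this entry are lookbehind (?<=X), negative lookbehind (?<!X) and the independent group (?>X). The sketch below shows their observable effect under the documented java.util.regex semantics. The new getMaximumLength methods appear to support the usual requirement that a lookbehind body have a bounded length; that is an inference from the ChangeLog, not something the sketch tests. LookaroundDemo is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // (?<=X): digits preceded by '$'; the '$' is not part of the match.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $42")); // 42
        // (?<!X): a number not preceded by '-' or by another digit.
        System.out.println(firstMatch("(?<![-\\d])\\d+", "-12 7")); // 7
        // (?>X): the independent group keeps every 'a' that a* took and is
        // never backtracked into, so the trailing 'a' cannot match.
        System.out.println(firstMatch("(?>a*)a", "aaa"));          // null
        // Plain a*a backtracks one character and succeeds.
        System.out.println(firstMatch("a*a", "aaa"));              // aaa
    }
}
```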

2006-02-04  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	* gnu/regexp/RETokenNamedProperty.java(getHandler): Check for
	a Unicode block if the name starts with "In".
	(UnicodeBlockHandler): New inner class.
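
With this change, a property name starting with "In" is looked up as a Unicode block. The sketch below uses \p{InGreek} and the negated form \P{InBasicLatin}, assuming the block semantics documented for java.util.regex. BlockPropertyDemo is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockPropertyDemo {
    // Returns the first run matched by regex, or null if there is none.
    static String firstRun(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "abc \u03b1\u03b2\u03b3 def"; // Latin, then three Greek letters
        // \p{InGreek}: characters in the Greek block (U+0370 to U+03FF).
        System.out.println(firstRun("\\p{InGreek}+", text));
        // \P{InBasicLatin}: characters outside U+0000 to U+007F.
        System.out.println(firstRun("\\P{InBasicLatin}+", text));
    }
}
```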

2006-02-02  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	* gnu/regexp/REMatch.java(REMatchList): New inner utility class
	for making a list of REMatch instances.
	* gnu/regexp/RETokenOneOf.java(match): Rewritten using REMatchList.
	* gnu/regexp/RETokenRepeated.java(findDoables): New method.
	(match): Rewritten using REMatchList.
	(matchRest): Rewritten using REMatchList.

2006-02-01  Mark Wielaard  <mark@klomp.org>

	* gnu/regexp/RE.java (getRETokenNamedProperty): Chain exception.
	* gnu/regexp/RETokenNamedProperty.java (LETTER, MARK, SEPARATOR,
	SYMBOL, NUMBER, PUNCTUATION, OTHER): New final byte[] fields.
	(getHandler): Check for grouped properties L, M, Z, S, N, P or C.
	(UnicodeCategoriesHandler): New private static class.
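
The grouped properties L, M, Z, S, N, P and C each match every general category that starts with that letter. The sketch below counts matches for three of them on a short mixed string, assuming standard java.util.regex behaviour. CategoryGroupDemo is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CategoryGroupDemo {
    // Counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "a1\u00e9!"; // 'a', '1', e with acute accent, '!'
        System.out.println(count("\\p{L}", text)); // 2 (a and e-acute)
        System.out.println(count("\\p{N}", text)); // 1 (the digit)
        System.out.println(count("\\p{P}", text)); // 1 (the '!')
    }
}
```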

2006-01-31  Mark Wielaard  <mark@klomp.org>

	* java/net/URI.java (getURIGroup): Check for null to see whether
	group actually exists.

2006-01-31  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #22873
	* gnu/regexp/REMatch.java(toString(int)): Throw IndexOutOfBoundsException
	for an invalid index and return null for a skipped group.
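
At the java.util.regex level, this fix corresponds to Matcher.group(int): it returns null for a group that did not take part in the match and throws IndexOutOfBoundsException for an index that does not exist. The sketch below shows both cases, assuming the documented Matcher contract. GroupIndexDemo is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // "(a)|(b)" against "b": group 1 does not take part in the match.
    static Matcher matchB() {
        Matcher m = Pattern.compile("(a)|(b)").matcher("b");
        if (!m.matches()) {
            throw new IllegalStateException("expected a match");
        }
        return m;
    }

    // True if asking for the given group index throws IndexOutOfBoundsException.
    static boolean invalidIndexThrows(Matcher m, int group) {
        try {
            m.group(group);
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher m = matchB();
        System.out.println(m.group(1));               // null (skipped group)
        System.out.println(m.group(2));               // b
        System.out.println(invalidIndexThrows(m, 3)); // true (only groups 0..2 exist)
    }
}
```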

2006-01-31  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #26002
	* gnu/regexp/RE.java(initialize): Parse /\p{prop}/.
	(NamedProperty): New inner class.
	(getNamedProperty): New method.
	(getRETokenNamedProperty): New method.
	* gnu/regexp/RESyntax.java(RE_NAMED_PROPERTY): New syntax flag.
	* gnu/regexp/RETokenNamedProperty.java: New file.
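
The \p{prop} syntax also covers single two-letter general categories such as Lu (uppercase letter) and Nd (decimal digit), and \P{prop} is its negation. The sketch below keeps only the characters that have a given property, assuming java.util.regex semantics. The class PropertyFilterDemo and its keepOnly helper are hypothetical.

```java
import java.util.regex.Pattern;

class PropertyFilterDemo {
    // Removes every character that does NOT have the given property,
    // by replacing matches of the negated form \P{prop} with nothing.
    static String keepOnly(String property, String input) {
        return Pattern.compile("\\P{" + property + "}").matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepOnly("Lu", "Hello World")); // HW
        System.out.println(keepOnly("Nd", "a1b22c"));      // 122
    }
}
```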

2006-01-30  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #24876
	* gnu/regexp/RE.java(REG_TRY_ENTIRE_MATCH):
	New execution flag.
	(getMatchImpl): If REG_TRY_ENTIRE_MATCH is set, add an
	implicit RETokenEnd at the end of the regexp chain.
	Select the first match rather than the longest match.
	(match): No longer handle REMatch.empty.
	* gnu/regexp/REMatch.java(empty): To be used only in RETokenRepeated.
	* gnu/regexp/RETokenOneOf.java: Corrected a typo in a comment.
	* gnu/regexp/RETokenBackRef.java: No longer handle REMatch.empty.
	* gnu/regexp/RETokenRepeated.java (match): Rewrote stingy matching.
	No longer handle REMatch.empty here; instead, set and check
	REMatch.empty when trying to match the single token.
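
"Stingy" matching refers to the reluctant quantifiers *?, +? and ??. These match as little as possible, whereas the greedy forms match as much as possible. The sketch below contrasts the two under standard java.util.regex semantics. ReluctantDemo is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("<.+>", "<a><b>"));  // <a><b> (greedy)
        System.out.println(firstMatch("<.+?>", "<a><b>")); // <a>    (reluctant)
        System.out.println(firstMatch("a+?", "aaa"));      // a
    }
}
```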

2006-01-24  Tom Tromey  <tromey@redhat.com>

	* java/util/regex/PatternSyntaxException.java: Added @since.
	* java/util/regex/Matcher.java (Matcher): Implements MatchResult.
	* java/util/regex/MatchResult.java: New file.
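
Because Matcher now implements MatchResult, code written against the MatchResult interface can be given a Matcher directly. The sketch below illustrates this, assuming the J2SE 5.0 MatchResult methods group(), start() and end(). MatchResultDemo and its helpers are hypothetical names.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultDemo {
    // Works with any MatchResult, not only with Matcher.
    static String describe(MatchResult r) {
        return r.group() + "@" + r.start() + "-" + r.end();
    }

    // A Matcher positioned on a match is passed straight to describe().
    static String describeFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? describe(m) : null;
    }

    public static void main(String[] args) {
        System.out.println(describeFirst("\\d+", "ab 123 cd")); // 123@3-6
    }
}
```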

2006-01-23  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	* gnu/regexp/REToken.java(empty): Made Cloneable.
	* gnu/regexp/RETokenOneOf.java(match): RE.java(match):
	Use separate methods matchN and matchP depending on the
	boolean negative.
	(matchN): New method used when negative. Done as before.
	(matchP): New method used when not negative. Each token is
	tried not by itself but by a clone of it.

2006-01-22  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #25837
	* gnu/regexp/REMatch.java(empty): New boolean indicating
	an empty string matched.
	* gnu/regexp/RE.java(match): Sets empty flag when an empty
	string matched.
	(initialize): Support back reference \10, \11, and so on.
	(parseInt): Renamed from getEscapedChar; now returns int.
	* gnu/regexp/RETokenRepeated.java(match): Sets empty flag
	when an empty string matched. Fixed a bug of the case where
	an empty string matched. Added special handling of {0}.
	* gnu/regexp/RETokenBackRef.java(match): Sets empty flag
	when an empty string matched. Fixed the case insensitive matching.
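
This entry covers three behaviours: multi-digit back references such as \10, case-insensitive back references, and the {0} quantifier. The sketch below exercises all three, assuming the java.util.regex rule that a multi-digit reference is accepted only when that many groups already exist. BackRefDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class BackRefDemo {
    // True if the whole input matches regex compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // \10 refers to group 10 because ten groups precede it.
        System.out.println(matches("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\\10", 0, "abcdefghijj")); // true
        // With CASE_INSENSITIVE, the back reference also ignores case.
        System.out.println(matches("(a)\\1", Pattern.CASE_INSENSITIVE, "aA")); // true
        System.out.println(matches("(a)\\1", 0, "aA"));                        // false
        // {0} repeats the token zero times.
        System.out.println(matches("xa{0}y", 0, "xy"));                        // true
    }
}
```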

2006-01-19  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #23212
	* gnu/regexp/RE.java(initialize): Support escaped characters such as
	\0123, \x1B, \u1234.
	(getEscapedChar): New method.
	(CharExpression): New inner class.
	(getCharExpression): New method.
	* gnu/regexp/RESyntax.java(RE_OCTAL_CHAR, RE_HEX_CHAR,
	RE_UNICODE_CHAR): New syntax bits.
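
The three escape forms named here (octal, hexadecimal and Unicode) can all denote the same character. The sketch below writes 'A' (65) each way, assuming the java.util.regex forms \0 followed by octal digits, \x followed by two hex digits, and backslash-u followed by four hex digits. EscapeDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if regex matches exactly the single character "A".
    static boolean matchesA(String regex) {
        return Pattern.matches(regex, "A");
    }

    public static void main(String[] args) {
        System.out.println(matchesA("\\0101"));  // true: octal 101 is 65
        System.out.println(matchesA("\\x41"));   // true: hex 41 is 65
        System.out.println(matchesA("\\u0041")); // true: code unit 0x0041
    }
}
```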

2006-01-17  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #25817
	* gnu/regexp/RETokenRange.java(constructor):
	Keep lo and hi as they are.
	(match): Changed the case insensitive comparison.
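
Bug #25817 concerns character ranges under case-insensitive matching. The sketch below shows the expected results with Pattern.CASE_INSENSITIVE for ASCII ranges, assuming standard java.util.regex semantics. CaseInsensitiveRangeDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class CaseInsensitiveRangeDemo {
    // True if the whole input matches regex, ignoring case.
    static boolean ci(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(ci("[a-z]+", "HeLLo")); // true
        System.out.println(ci("[A-F]", "c"));      // true
        System.out.println(ci("[A-F]", "g"));      // false
    }
}
```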

2006-01-17  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	* gnu/regexp/RETokenChar.java(chain):
	Do not concatenate tokens whose insens flags are different.

2006-01-16  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #22884
	* gnu/regexp/RE.java(initialize): Parse embedded flags.
	* gnu/regexp/RESyntax.java(RE_EMBEDDED_FLAGS): New syntax bit.
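
Embedded flags let a pattern switch options inline. (?i) applies from that point onward, (?i:X) applies only inside the group, and (?s) lets '.' match line terminators. The sketch below shows each case, assuming java.util.regex semantics. EmbeddedFlagsDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class EmbeddedFlagsDemo {
    // True if the whole input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("(?i)hello", "HeLLo")); // true
        System.out.println(matches("a(?i)b", "aB"));       // true: flag applies to b only
        System.out.println(matches("a(?i)b", "AB"));       // false: a is still case-sensitive
        System.out.println(matches("(?i:b)c", "Bc"));      // true
        System.out.println(matches("(?i:b)c", "BC"));      // false
        System.out.println(matches("(?s)a.b", "a\nb"));    // true
        System.out.println(matches("a.b", "a\nb"));        // false
    }
}
```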

2006-01-13  Mark Wielaard  <mark@klomp.org>

	* java/util/regex/Pattern.java (Pattern): Chain REException.
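
From the caller's side, a malformed pattern makes Pattern.compile throw PatternSyntaxException. The sketch below catches it and reads the standard accessors. The description text and error index depend on the implementation, so only getPattern() is relied on. The entry suggests getCause() may expose the chained gnu.regexp REException in this implementation; that is an assumption, and on other runtimes the cause may be null. SyntaxErrorDemo is a hypothetical name.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns the exception thrown for an invalid pattern, or null if it compiles.
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = compileError("(abc");
        System.out.println(e.getPattern());     // (abc
        System.out.println(e.getDescription()); // wording is implementation-specific
        System.out.println(e.getIndex());       // position is implementation-specific
        System.out.println(e.getCause());       // may be null, depending on the runtime
    }
}
```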

2006-01-12  Ito Kazumitsu  <kaz@maczuka.gcd.org>

	Fixes bug #22802
	* gnu/regexp/RE.java(initialize): Fixed the parsing of
	character classes within a subexpression.

2006-01-08  Ito Kazumitsu  <kaz@maczuka.gcd.org>  

	Fixes bug #25679
	* gnu/regexp/RETokenRepeated.java(match): Optimized the case
	when an empty string matched an empty token.

2006-01-06  Ito Kazumitsu  <kaz@maczuka.gcd.org>  

	Fixes bug #25616
	* gnu/regexp/RE.java(initialize): Allow repeat.empty.token.
	* gnu/regexp/RETokenRepeated.java(match): Break the loop
	when an empty string matched an empty token.
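
The last two entries allow a repeated token that can itself match the empty string, such as (a*)*. They also make the repeat loop stop once such a token matches empty, instead of iterating forever. The sketch below shows patterns of that shape matching as expected under java.util.regex semantics. EmptyRepeatDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class EmptyRepeatDemo {
    // True if the whole input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("(a*)*b", "b"));    // true: inner a* matches empty
        System.out.println(matches("(a|)*c", "aac"));  // true: empty alternative ends the loop
        System.out.println(matches("(a*)+", "aaa"));   // true
    }
}
```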

Index: gnu/regexp/MessagesBundle_fr.properties
===================================================================
--- gnu/regexp/MessagesBundle_fr.properties	(revision 110832)
+++ gnu/regexp/MessagesBundle_fr.properties	(working copy)
@@ -1,22 +0,0 @@
-# Localized error messages for gnu.regexp
-
-# Prefix for REException messages
-error.prefix=A l''index {0} dans le modèle d''expression régulière:
-
-# REException (parse error) messages
-repeat.assertion=l'élément répété est de largeur zéro
-repeat.chained=tentative de répétition d'un élément déjà répété
-repeat.no.token=quantifieur (?*+{}) sans élément précédent
-repeat.empty.token=l'élément répété peut être vide
-unmatched.brace=accolade inégalée
-unmatched.bracket=crochet inégalé
-unmatched.paren=parenthèse inégalée
-interval.no.end=fin d'interval attendue
-class.no.end=fin de classe de caractères attendue
-subexpr.no.end=fin de sous-expression attendue
-interval.order=l'interval minimum est supérieur à l'interval maximum
-interval.error=l'interval est vide ou contient des caractères illégaux
-ends.with.backslash=antislash à la fin du modèle
-
-# RESyntax message
-syntax.final=La syntaxe a été déclarée finale et ne peut pas être modifiée
Index: gnu/regexp/MessagesBundle.properties
===================================================================
--- gnu/regexp/MessagesBundle.properties	(revision 110832)
+++ gnu/regexp/MessagesBundle.properties	(working copy)
@@ -1,22 +0,0 @@
-# Localized error messages for gnu.regexp
-
-# Prefix for REException messages
-error.prefix=At position {0} in regular expression pattern:
-
-# REException (parse error) messages
-repeat.assertion=repeated token is zero-width assertion
-repeat.chained=attempted to repeat a token that is already repeated
-repeat.no.token=quantifier (?*+{}) without preceding token
-repeat.empty.token=repeated token may be empty
-unmatched.brace=unmatched brace
-unmatched.bracket=unmatched bracket
-unmatched.paren=unmatched parenthesis
-interval.no.end=expected end of interval
-class.no.end=expected end of character class
-subexpr.no.end=expected end of subexpression
-interval.order=interval minimum is greater than maximum
-interval.error=interval is empty or contains illegal characters
-ends.with.backslash=backslash at end of pattern
-
-# RESyntax message
-syntax.final=Syntax has been declared final and cannot be modified
Index: java/lang/Character.java
===================================================================
--- java/lang/Character.java	(revision 110832)
+++ java/lang/Character.java	(working copy)
@@ -48,6 +48,8 @@
 package java.lang;
 
 import java.io.Serializable;
+import java.text.Collator;
+import java.util.Locale;
 
 /**
  * Wrapper class for the primitive char data type.  In addition, this class
@@ -150,11 +152,19 @@
   public static final class UnicodeBlock extends Subset
   {
     /** The start of the subset. */
-    private final char start;
+    private final int start;
 
     /** The end of the subset. */
-    private final char end;
+    private final int end;
 
+    /** The canonical name of the block according to the Unicode standard. */
+    private final String canonicalName;
+
+    /** Constants for the <code>forName()</code> method */
+    private static final int CANONICAL_NAME = 0;
+    private static final int NO_SPACES_NAME = 1;
+    private static final int CONSTANT_NAME = 2;
+
     /**
      * Constructor for strictly defined blocks.
      *
@@ -162,24 +172,43 @@
      * @param end the end character of the range
      * @param name the block name
      */
-    private UnicodeBlock(char start, char end, String name)
+    private UnicodeBlock(int start, int end, String name,
+             String canonicalName)
     {
       super(name);
       this.start = start;
       this.end = end;
+      this.canonicalName = canonicalName;
     }
 
     /**
      * Returns the Unicode character block which a character belongs to.
+     * <strong>Note</strong>: This method does not support the use of
+     * supplementary characters.  For such support, <code>of(int)</code>
+     * should be used instead.
      *
      * @param ch the character to look up
      * @return the set it belongs to, or null if it is not in one
      */
     public static UnicodeBlock of(char ch)
     {
-      // Special case, since SPECIALS contains two ranges.
-      if (ch == '\uFEFF')
-        return SPECIALS;
+      return of((int) ch);
+    }
+
+    /**
+     * Returns the Unicode character block which a code point belongs to.
+     *
+     * @param codePoint the character to look up
+     * @return the set it belongs to, or null if it is not in one.
+     * @throws IllegalArgumentException if the specified code point is
+     *         invalid.
+     * @since 1.5
+     */
+    public static UnicodeBlock of(int codePoint)
+    {
+      if (codePoint > MAX_CODE_POINT)
+    throw new IllegalArgumentException("The supplied integer value is " +
+                       "too large to be a codepoint.");
       // Simple binary search for the correct block.
       int low = 0;
       int hi = sets.length - 1;
@@ -187,9 +216,9 @@
         {
           int mid = (low + hi) >> 1;
           UnicodeBlock b = sets[mid];
-          if (ch < b.start)
+          if (codePoint < b.start)
             hi = mid - 1;
-          else if (ch > b.end)
+          else if (codePoint > b.end)
             low = mid + 1;
           else
             return b;
@@ -198,705 +227,1302 @@
     }
 
     /**
+     * <p>
+     * Returns the <code>UnicodeBlock</code> with the given name, as defined
+     * by the Unicode standard.  The version of Unicode in use is defined by
+     * the <code>Character</code> class, and the names are given in the
+     * <code>Blocks-<version>.txt</code> file corresponding to that version.
+     * The name may be specified in one of three ways:
+     * </p>
+     * <ol>
+     * <li>The canonical, human-readable name used by the Unicode standard.
+     * This is the name with all spaces and hyphens retained.  For example,
+     * `Basic Latin' retrieves the block, UnicodeBlock.BASIC_LATIN.</li>
+     * <li>The canonical name with all spaces removed e.g. `BasicLatin'.</li>
+     * <li>The name used for the constants specified by this class, which
+     * is the canonical name with all spaces and hyphens replaced with
+     * underscores e.g. `BASIC_LATIN'</li>
+     * </ol>
+     * <p>
+     * The names are compared case-insensitively using the case comparison
+     * associated with the U.S. English locale.  The method recognises the
+     * previous names used for blocks as well as the current ones.  At
+     * present, this simply means that the deprecated `SURROGATES_AREA'
+     * will be recognised by this method (the <code>of()</code> methods
+     * only return one of the three new surrogate blocks).
+     * </p>
+     *
+     * @param blockName the name of the block to look up.
+     * @return the specified block.
+     * @throws NullPointerException if the <code>blockName</code> is
+     *         <code>null</code>.
+     * @throws IllegalArgumentException if the name does not match any Unicode
+     *         block.
+     * @since 1.5
+     */
+    public static final UnicodeBlock forName(String blockName)
+    {
+      int type;
+      if (blockName.indexOf(' ') != -1)
+        type = CANONICAL_NAME;
+      else if (blockName.indexOf('_') != -1)
+        type = CONSTANT_NAME;
+      else
+        type = NO_SPACES_NAME;
+      Collator usCollator = Collator.getInstance(Locale.US);
+      usCollator.setStrength(Collator.PRIMARY);
+      /* Special case for deprecated blocks not in sets */
+      switch (type)
+      {
+        case CANONICAL_NAME:
+          if (usCollator.compare(blockName, "Surrogates Area") == 0)
+            return SURROGATES_AREA;
+          break;
+        case NO_SPACES_NAME:
+          if (usCollator.compare(blockName, "SurrogatesArea") == 0)
+            return SURROGATES_AREA;
+          break;
+        case CONSTANT_NAME:
+          if (usCollator.compare(blockName, "SURROGATES_AREA") == 0) 
+            return SURROGATES_AREA;
+          break;
+      }
+      /* Other cases */
+      int setLength = sets.length;
+      switch (type)
+      {
+        case CANONICAL_NAME:
+          for (int i = 0; i < setLength; i++)
+            {
+              UnicodeBlock block = sets[i];
+              if (usCollator.compare(blockName, block.canonicalName) == 0)
+                return block;
+            }
+          break;
+        case NO_SPACES_NAME:
+          for (int i = 0; i < setLength; i++)
+            {
+              UnicodeBlock block = sets[i];
+              String nsName = block.canonicalName.replaceAll(" ","");
+              if (usCollator.compare(blockName, nsName) == 0)
+                return block;
+            }        
+          break;
+        case CONSTANT_NAME:
+          for (int i = 0; i < setLength; i++)
+            {
+              UnicodeBlock block = sets[i];
+              if (usCollator.compare(blockName, block.toString()) == 0)
+                return block;
+            }
+          break;
+      }
+      throw new IllegalArgumentException("No Unicode block found for " +
+                                         blockName + ".");
+    }
+
+    /**
      * Basic Latin.
-     * '\u0000' - '\u007F'.
+     * 0x0000 - 0x007F.
      */
     public static final UnicodeBlock BASIC_LATIN
-      = new UnicodeBlock('\u0000', '\u007F',
-                         "BASIC_LATIN");
+      = new UnicodeBlock(0x0000, 0x007F,
+                         "BASIC_LATIN", 
+                         "Basic Latin");
 
     /**
      * Latin-1 Supplement.
-     * '\u0080' - '\u00FF'.
+     * 0x0080 - 0x00FF.
      */
     public static final UnicodeBlock LATIN_1_SUPPLEMENT
-      = new UnicodeBlock('\u0080', '\u00FF',
-                         "LATIN_1_SUPPLEMENT");
+      = new UnicodeBlock(0x0080, 0x00FF,
+                         "LATIN_1_SUPPLEMENT", 
+                         "Latin-1 Supplement");
 
     /**
      * Latin Extended-A.
-     * '\u0100' - '\u017F'.
+     * 0x0100 - 0x017F.
      */
     public static final UnicodeBlock LATIN_EXTENDED_A
-      = new UnicodeBlock('\u0100', '\u017F',
-                         "LATIN_EXTENDED_A");
+      = new UnicodeBlock(0x0100, 0x017F,
+                         "LATIN_EXTENDED_A", 
+                         "Latin Extended-A");
 
     /**
      * Latin Extended-B.
-     * '\u0180' - '\u024F'.
+     * 0x0180 - 0x024F.
      */
     public static final UnicodeBlock LATIN_EXTENDED_B
-      = new UnicodeBlock('\u0180', '\u024F',
-                         "LATIN_EXTENDED_B");
+      = new UnicodeBlock(0x0180, 0x024F,
+                         "LATIN_EXTENDED_B", 
+                         "Latin Extended-B");
 
     /**
      * IPA Extensions.
-     * '\u0250' - '\u02AF'.
+     * 0x0250 - 0x02AF.
      */
     public static final UnicodeBlock IPA_EXTENSIONS
-      = new UnicodeBlock('\u0250', '\u02AF',
-                         "IPA_EXTENSIONS");
+      = new UnicodeBlock(0x0250, 0x02AF,
+                         "IPA_EXTENSIONS", 
+                         "IPA Extensions");
 
     /**
      * Spacing Modifier Letters.
-     * '\u02B0' - '\u02FF'.
+     * 0x02B0 - 0x02FF.
      */
     public static final UnicodeBlock SPACING_MODIFIER_LETTERS
-      = new UnicodeBlock('\u02B0', '\u02FF',
-                         "SPACING_MODIFIER_LETTERS");
+      = new UnicodeBlock(0x02B0, 0x02FF,
+                         "SPACING_MODIFIER_LETTERS", 
+                         "Spacing Modifier Letters");
 
     /**
      * Combining Diacritical Marks.
-     * '\u0300' - '\u036F'.
+     * 0x0300 - 0x036F.
      */
     public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS
-      = new UnicodeBlock('\u0300', '\u036F',
-                         "COMBINING_DIACRITICAL_MARKS");
+      = new UnicodeBlock(0x0300, 0x036F,
+                         "COMBINING_DIACRITICAL_MARKS", 
+                         "Combining Diacritical Marks");
 
     /**
      * Greek.
-     * '\u0370' - '\u03FF'.
+     * 0x0370 - 0x03FF.
      */
     public static final UnicodeBlock GREEK
-      = new UnicodeBlock('\u0370', '\u03FF',
-                         "GREEK");
+      = new UnicodeBlock(0x0370, 0x03FF,
+                         "GREEK", 
+                         "Greek");
 
     /**
      * Cyrillic.
-     * '\u0400' - '\u04FF'.
+     * 0x0400 - 0x04FF.
      */
     public static final UnicodeBlock CYRILLIC
-      = new UnicodeBlock('\u0400', '\u04FF',
-                         "CYRILLIC");
+      = new UnicodeBlock(0x0400, 0x04FF,
+                         "CYRILLIC", 
+                         "Cyrillic");
 
     /**
+     * Cyrillic Supplementary.
+     * 0x0500 - 0x052F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock CYRILLIC_SUPPLEMENTARY
+      = new UnicodeBlock(0x0500, 0x052F,
+                         "CYRILLIC_SUPPLEMENTARY", 
+                         "Cyrillic Supplementary");
+
+    /**
      * Armenian.
-     * '\u0530' - '\u058F'.
+     * 0x0530 - 0x058F.
      */
     public static final UnicodeBlock ARMENIAN
-      = new UnicodeBlock('\u0530', '\u058F',
-                         "ARMENIAN");
+      = new UnicodeBlock(0x0530, 0x058F,
+                         "ARMENIAN", 
+                         "Armenian");
 
     /**
      * Hebrew.
-     * '\u0590' - '\u05FF'.
+     * 0x0590 - 0x05FF.
      */
     public static final UnicodeBlock HEBREW
-      = new UnicodeBlock('\u0590', '\u05FF',
-                         "HEBREW");
+      = new UnicodeBlock(0x0590, 0x05FF,
+                         "HEBREW", 
+                         "Hebrew");
 
     /**
      * Arabic.
-     * '\u0600' - '\u06FF'.
+     * 0x0600 - 0x06FF.
      */
     public static final UnicodeBlock ARABIC
-      = new UnicodeBlock('\u0600', '\u06FF',
-                         "ARABIC");
+      = new UnicodeBlock(0x0600, 0x06FF,
+                         "ARABIC", 
+                         "Arabic");
 
     /**
      * Syriac.
-     * '\u0700' - '\u074F'.
+     * 0x0700 - 0x074F.
      * @since 1.4
      */
     public static final UnicodeBlock SYRIAC
-      = new UnicodeBlock('\u0700', '\u074F',
-                         "SYRIAC");
+      = new UnicodeBlock(0x0700, 0x074F,
+                         "SYRIAC", 
+                         "Syriac");
 
     /**
      * Thaana.
-     * '\u0780' - '\u07BF'.
+     * 0x0780 - 0x07BF.
      * @since 1.4
      */
     public static final UnicodeBlock THAANA
-      = new UnicodeBlock('\u0780', '\u07BF',
-                         "THAANA");
+      = new UnicodeBlock(0x0780, 0x07BF,
+                         "THAANA", 
+                         "Thaana");
 
     /**
      * Devanagari.
-     * '\u0900' - '\u097F'.
+     * 0x0900 - 0x097F.
      */
     public static final UnicodeBlock DEVANAGARI
-      = new UnicodeBlock('\u0900', '\u097F',
-                         "DEVANAGARI");
+      = new UnicodeBlock(0x0900, 0x097F,
+                         "DEVANAGARI", 
+                         "Devanagari");
 
     /**
      * Bengali.
-     * '\u0980' - '\u09FF'.
+     * 0x0980 - 0x09FF.
      */
     public static final UnicodeBlock BENGALI
-      = new UnicodeBlock('\u0980', '\u09FF',
-                         "BENGALI");
+      = new UnicodeBlock(0x0980, 0x09FF,
+                         "BENGALI", 
+                         "Bengali");
 
     /**
      * Gurmukhi.
-     * '\u0A00' - '\u0A7F'.
+     * 0x0A00 - 0x0A7F.
      */
     public static final UnicodeBlock GURMUKHI
-      = new UnicodeBlock('\u0A00', '\u0A7F',
-                         "GURMUKHI");
+      = new UnicodeBlock(0x0A00, 0x0A7F,
+                         "GURMUKHI", 
+                         "Gurmukhi");
 
     /**
      * Gujarati.
-     * '\u0A80' - '\u0AFF'.
+     * 0x0A80 - 0x0AFF.
      */
     public static final UnicodeBlock GUJARATI
-      = new UnicodeBlock('\u0A80', '\u0AFF',
-                         "GUJARATI");
+      = new UnicodeBlock(0x0A80, 0x0AFF,
+                         "GUJARATI", 
+                         "Gujarati");
 
     /**
      * Oriya.
-     * '\u0B00' - '\u0B7F'.
+     * 0x0B00 - 0x0B7F.
      */
     public static final UnicodeBlock ORIYA
-      = new UnicodeBlock('\u0B00', '\u0B7F',
-                         "ORIYA");
+      = new UnicodeBlock(0x0B00, 0x0B7F,
+                         "ORIYA", 
+                         "Oriya");
 
     /**
      * Tamil.
-     * '\u0B80' - '\u0BFF'.
+     * 0x0B80 - 0x0BFF.
      */
     public static final UnicodeBlock TAMIL
-      = new UnicodeBlock('\u0B80', '\u0BFF',
-                         "TAMIL");
+      = new UnicodeBlock(0x0B80, 0x0BFF,
+                         "TAMIL", 
+                         "Tamil");
 
     /**
      * Telugu.
-     * '\u0C00' - '\u0C7F'.
+     * 0x0C00 - 0x0C7F.
      */
     public static final UnicodeBlock TELUGU
-      = new UnicodeBlock('\u0C00', '\u0C7F',
-                         "TELUGU");
+      = new UnicodeBlock(0x0C00, 0x0C7F,
+                         "TELUGU", 
+                         "Telugu");
 
     /**
      * Kannada.
-     * '\u0C80' - '\u0CFF'.
+     * 0x0C80 - 0x0CFF.
      */
     public static final UnicodeBlock KANNADA
-      = new UnicodeBlock('\u0C80', '\u0CFF',
-                         "KANNADA");
+      = new UnicodeBlock(0x0C80, 0x0CFF,
+                         "KANNADA", 
+                         "Kannada");
 
     /**
      * Malayalam.
-     * '\u0D00' - '\u0D7F'.
+     * 0x0D00 - 0x0D7F.
      */
     public static final UnicodeBlock MALAYALAM
-      = new UnicodeBlock('\u0D00', '\u0D7F',
-                         "MALAYALAM");
+      = new UnicodeBlock(0x0D00, 0x0D7F,
+                         "MALAYALAM", 
+                         "Malayalam");
 
     /**
      * Sinhala.
-     * '\u0D80' - '\u0DFF'.
+     * 0x0D80 - 0x0DFF.
      * @since 1.4
      */
     public static final UnicodeBlock SINHALA
-      = new UnicodeBlock('\u0D80', '\u0DFF',
-                         "SINHALA");
+      = new UnicodeBlock(0x0D80, 0x0DFF,
+                         "SINHALA", 
+                         "Sinhala");
 
     /**
      * Thai.
-     * '\u0E00' - '\u0E7F'.
+     * 0x0E00 - 0x0E7F.
      */
     public static final UnicodeBlock THAI
-      = new UnicodeBlock('\u0E00', '\u0E7F',
-                         "THAI");
+      = new UnicodeBlock(0x0E00, 0x0E7F,
+                         "THAI", 
+                         "Thai");
 
     /**
      * Lao.
-     * '\u0E80' - '\u0EFF'.
+     * 0x0E80 - 0x0EFF.
      */
     public static final UnicodeBlock LAO
-      = new UnicodeBlock('\u0E80', '\u0EFF',
-                         "LAO");
+      = new UnicodeBlock(0x0E80, 0x0EFF,
+                         "LAO", 
+                         "Lao");
 
     /**
      * Tibetan.
-     * '\u0F00' - '\u0FFF'.
+     * 0x0F00 - 0x0FFF.
      */
     public static final UnicodeBlock TIBETAN
-      = new UnicodeBlock('\u0F00', '\u0FFF',
-                         "TIBETAN");
+      = new UnicodeBlock(0x0F00, 0x0FFF,
+                         "TIBETAN", 
+                         "Tibetan");
 
     /**
      * Myanmar.
-     * '\u1000' - '\u109F'.
+     * 0x1000 - 0x109F.
      * @since 1.4
      */
     public static final UnicodeBlock MYANMAR
-      = new UnicodeBlock('\u1000', '\u109F',
-                         "MYANMAR");
+      = new UnicodeBlock(0x1000, 0x109F,
+                         "MYANMAR", 
+                         "Myanmar");
 
     /**
      * Georgian.
-     * '\u10A0' - '\u10FF'.
+     * 0x10A0 - 0x10FF.
      */
     public static final UnicodeBlock GEORGIAN
-      = new UnicodeBlock('\u10A0', '\u10FF',
-                         "GEORGIAN");
+      = new UnicodeBlock(0x10A0, 0x10FF,
+                         "GEORGIAN", 
+                         "Georgian");
 
     /**
      * Hangul Jamo.
-     * '\u1100' - '\u11FF'.
+     * 0x1100 - 0x11FF.
      */
     public static final UnicodeBlock HANGUL_JAMO
-      = new UnicodeBlock('\u1100', '\u11FF',
-                         "HANGUL_JAMO");
+      = new UnicodeBlock(0x1100, 0x11FF,
+                         "HANGUL_JAMO", 
+                         "Hangul Jamo");
 
     /**
      * Ethiopic.
-     * '\u1200' - '\u137F'.
+     * 0x1200 - 0x137F.
      * @since 1.4
      */
     public static final UnicodeBlock ETHIOPIC
-      = new UnicodeBlock('\u1200', '\u137F',
-                         "ETHIOPIC");
+      = new UnicodeBlock(0x1200, 0x137F,
+                         "ETHIOPIC", 
+                         "Ethiopic");
 
     /**
      * Cherokee.
-     * '\u13A0' - '\u13FF'.
+     * 0x13A0 - 0x13FF.
      * @since 1.4
      */
     public static final UnicodeBlock CHEROKEE
-      = new UnicodeBlock('\u13A0', '\u13FF',
-                         "CHEROKEE");
+      = new UnicodeBlock(0x13A0, 0x13FF,
+                         "CHEROKEE", 
+                         "Cherokee");
 
     /**
      * Unified Canadian Aboriginal Syllabics.
-     * '\u1400' - '\u167F'.
+     * 0x1400 - 0x167F.
      * @since 1.4
      */
     public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS
-      = new UnicodeBlock('\u1400', '\u167F',
-                         "UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS");
+      = new UnicodeBlock(0x1400, 0x167F,
+                         "UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS", 
+                         "Unified Canadian Aboriginal Syllabics");
 
     /**
      * Ogham.
-     * '\u1680' - '\u169F'.
+     * 0x1680 - 0x169F.
      * @since 1.4
      */
     public static final UnicodeBlock OGHAM
-      = new UnicodeBlock('\u1680', '\u169F',
-                         "OGHAM");
+      = new UnicodeBlock(0x1680, 0x169F,
+                         "OGHAM", 
+                         "Ogham");
 
     /**
      * Runic.
-     * '\u16A0' - '\u16FF'.
+     * 0x16A0 - 0x16FF.
      * @since 1.4
      */
     public static final UnicodeBlock RUNIC
-      = new UnicodeBlock('\u16A0', '\u16FF',
-                         "RUNIC");
+      = new UnicodeBlock(0x16A0, 0x16FF,
+                         "RUNIC", 
+                         "Runic");
 
     /**
+     * Tagalog.
+     * 0x1700 - 0x171F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock TAGALOG
+      = new UnicodeBlock(0x1700, 0x171F,
+                         "TAGALOG", 
+                         "Tagalog");
+
+    /**
+     * Hanunoo.
+     * 0x1720 - 0x173F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock HANUNOO
+      = new UnicodeBlock(0x1720, 0x173F,
+                         "HANUNOO", 
+                         "Hanunoo");
+
+    /**
+     * Buhid.
+     * 0x1740 - 0x175F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock BUHID
+      = new UnicodeBlock(0x1740, 0x175F,
+                         "BUHID", 
+                         "Buhid");
+
+    /**
+     * Tagbanwa.
+     * 0x1760 - 0x177F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock TAGBANWA
+      = new UnicodeBlock(0x1760, 0x177F,
+                         "TAGBANWA", 
+                         "Tagbanwa");
+
+    /**
      * Khmer.
-     * '\u1780' - '\u17FF'.
+     * 0x1780 - 0x17FF.
      * @since 1.4
      */
     public static final UnicodeBlock KHMER
-      = new UnicodeBlock('\u1780', '\u17FF',
-                         "KHMER");
+      = new UnicodeBlock(0x1780, 0x17FF,
+                         "KHMER", 
+                         "Khmer");
 
     /**
      * Mongolian.
-     * '\u1800' - '\u18AF'.
+     * 0x1800 - 0x18AF.
      * @since 1.4
      */
     public static final UnicodeBlock MONGOLIAN
-      = new UnicodeBlock('\u1800', '\u18AF',
-                         "MONGOLIAN");
+      = new UnicodeBlock(0x1800, 0x18AF,
+                         "MONGOLIAN", 
+                         "Mongolian");
 
     /**
+     * Limbu.
+     * 0x1900 - 0x194F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock LIMBU
+      = new UnicodeBlock(0x1900, 0x194F,
+                         "LIMBU", 
+                         "Limbu");
+
+    /**
+     * Tai Le.
+     * 0x1950 - 0x197F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock TAI_LE
+      = new UnicodeBlock(0x1950, 0x197F,
+                         "TAI_LE", 
+                         "Tai Le");
+
+    /**
+     * Khmer Symbols.
+     * 0x19E0 - 0x19FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock KHMER_SYMBOLS
+      = new UnicodeBlock(0x19E0, 0x19FF,
+                         "KHMER_SYMBOLS", 
+                         "Khmer Symbols");
+
+    /**
+     * Phonetic Extensions.
+     * 0x1D00 - 0x1D7F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock PHONETIC_EXTENSIONS
+      = new UnicodeBlock(0x1D00, 0x1D7F,
+                         "PHONETIC_EXTENSIONS", 
+                         "Phonetic Extensions");
+
+    /**
      * Latin Extended Additional.
-     * '\u1E00' - '\u1EFF'.
+     * 0x1E00 - 0x1EFF.
      */
     public static final UnicodeBlock LATIN_EXTENDED_ADDITIONAL
-      = new UnicodeBlock('\u1E00', '\u1EFF',
-                         "LATIN_EXTENDED_ADDITIONAL");
+      = new UnicodeBlock(0x1E00, 0x1EFF,
+                         "LATIN_EXTENDED_ADDITIONAL", 
+                         "Latin Extended Additional");
 
     /**
      * Greek Extended.
-     * '\u1F00' - '\u1FFF'.
+     * 0x1F00 - 0x1FFF.
      */
     public static final UnicodeBlock GREEK_EXTENDED
-      = new UnicodeBlock('\u1F00', '\u1FFF',
-                         "GREEK_EXTENDED");
+      = new UnicodeBlock(0x1F00, 0x1FFF,
+                         "GREEK_EXTENDED", 
+                         "Greek Extended");
 
     /**
      * General Punctuation.
-     * '\u2000' - '\u206F'.
+     * 0x2000 - 0x206F.
      */
     public static final UnicodeBlock GENERAL_PUNCTUATION
-      = new UnicodeBlock('\u2000', '\u206F',
-                         "GENERAL_PUNCTUATION");
+      = new UnicodeBlock(0x2000, 0x206F,
+                         "GENERAL_PUNCTUATION", 
+                         "General Punctuation");
 
     /**
      * Superscripts and Subscripts.
-     * '\u2070' - '\u209F'.
+     * 0x2070 - 0x209F.
      */
     public static final UnicodeBlock SUPERSCRIPTS_AND_SUBSCRIPTS
-      = new UnicodeBlock('\u2070', '\u209F',
-                         "SUPERSCRIPTS_AND_SUBSCRIPTS");
+      = new UnicodeBlock(0x2070, 0x209F,
+                         "SUPERSCRIPTS_AND_SUBSCRIPTS", 
+                         "Superscripts and Subscripts");
 
     /**
      * Currency Symbols.
-     * '\u20A0' - '\u20CF'.
+     * 0x20A0 - 0x20CF.
      */
     public static final UnicodeBlock CURRENCY_SYMBOLS
-      = new UnicodeBlock('\u20A0', '\u20CF',
-                         "CURRENCY_SYMBOLS");
+      = new UnicodeBlock(0x20A0, 0x20CF,
+                         "CURRENCY_SYMBOLS", 
+                         "Currency Symbols");
 
     /**
      * Combining Marks for Symbols.
-     * '\u20D0' - '\u20FF'.
+     * 0x20D0 - 0x20FF.
      */
     public static final UnicodeBlock COMBINING_MARKS_FOR_SYMBOLS
-      = new UnicodeBlock('\u20D0', '\u20FF',
-                         "COMBINING_MARKS_FOR_SYMBOLS");
+      = new UnicodeBlock(0x20D0, 0x20FF,
+                         "COMBINING_MARKS_FOR_SYMBOLS", 
+                         "Combining Marks for Symbols");
 
     /**
      * Letterlike Symbols.
-     * '\u2100' - '\u214F'.
+     * 0x2100 - 0x214F.
      */
     public static final UnicodeBlock LETTERLIKE_SYMBOLS
-      = new UnicodeBlock('\u2100', '\u214F',
-                         "LETTERLIKE_SYMBOLS");
+      = new UnicodeBlock(0x2100, 0x214F,
+                         "LETTERLIKE_SYMBOLS", 
+                         "Letterlike Symbols");
 
     /**
      * Number Forms.
-     * '\u2150' - '\u218F'.
+     * 0x2150 - 0x218F.
      */
     public static final UnicodeBlock NUMBER_FORMS
-      = new UnicodeBlock('\u2150', '\u218F',
-                         "NUMBER_FORMS");
+      = new UnicodeBlock(0x2150, 0x218F,
+                         "NUMBER_FORMS", 
+                         "Number Forms");
 
     /**
      * Arrows.
-     * '\u2190' - '\u21FF'.
+     * 0x2190 - 0x21FF.
      */
     public static final UnicodeBlock ARROWS
-      = new UnicodeBlock('\u2190', '\u21FF',
-                         "ARROWS");
+      = new UnicodeBlock(0x2190, 0x21FF,
+                         "ARROWS", 
+                         "Arrows");
 
     /**
      * Mathematical Operators.
-     * '\u2200' - '\u22FF'.
+     * 0x2200 - 0x22FF.
      */
     public static final UnicodeBlock MATHEMATICAL_OPERATORS
-      = new UnicodeBlock('\u2200', '\u22FF',
-                         "MATHEMATICAL_OPERATORS");
+      = new UnicodeBlock(0x2200, 0x22FF,
+                         "MATHEMATICAL_OPERATORS", 
+                         "Mathematical Operators");
 
     /**
      * Miscellaneous Technical.
-     * '\u2300' - '\u23FF'.
+     * 0x2300 - 0x23FF.
      */
     public static final UnicodeBlock MISCELLANEOUS_TECHNICAL
-      = new UnicodeBlock('\u2300', '\u23FF',
-                         "MISCELLANEOUS_TECHNICAL");
+      = new UnicodeBlock(0x2300, 0x23FF,
+                         "MISCELLANEOUS_TECHNICAL", 
+                         "Miscellaneous Technical");
 
     /**
      * Control Pictures.
-     * '\u2400' - '\u243F'.
+     * 0x2400 - 0x243F.
      */
     public static final UnicodeBlock CONTROL_PICTURES
-      = new UnicodeBlock('\u2400', '\u243F',
-                         "CONTROL_PICTURES");
+      = new UnicodeBlock(0x2400, 0x243F,
+                         "CONTROL_PICTURES", 
+                         "Control Pictures");
 
     /**
      * Optical Character Recognition.
-     * '\u2440' - '\u245F'.
+     * 0x2440 - 0x245F.
      */
     public static final UnicodeBlock OPTICAL_CHARACTER_RECOGNITION
-      = new UnicodeBlock('\u2440', '\u245F',
-                         "OPTICAL_CHARACTER_RECOGNITION");
+      = new UnicodeBlock(0x2440, 0x245F,
+                         "OPTICAL_CHARACTER_RECOGNITION", 
+                         "Optical Character Recognition");
 
     /**
      * Enclosed Alphanumerics.
-     * '\u2460' - '\u24FF'.
+     * 0x2460 - 0x24FF.
      */
     public static final UnicodeBlock ENCLOSED_ALPHANUMERICS
-      = new UnicodeBlock('\u2460', '\u24FF',
-                         "ENCLOSED_ALPHANUMERICS");
+      = new UnicodeBlock(0x2460, 0x24FF,
+                         "ENCLOSED_ALPHANUMERICS", 
+                         "Enclosed Alphanumerics");
 
     /**
      * Box Drawing.
-     * '\u2500' - '\u257F'.
+     * 0x2500 - 0x257F.
      */
     public static final UnicodeBlock BOX_DRAWING
-      = new UnicodeBlock('\u2500', '\u257F',
-                         "BOX_DRAWING");
+      = new UnicodeBlock(0x2500, 0x257F,
+                         "BOX_DRAWING", 
+                         "Box Drawing");
 
     /**
      * Block Elements.
-     * '\u2580' - '\u259F'.
+     * 0x2580 - 0x259F.
      */
     public static final UnicodeBlock BLOCK_ELEMENTS
-      = new UnicodeBlock('\u2580', '\u259F',
-                         "BLOCK_ELEMENTS");
+      = new UnicodeBlock(0x2580, 0x259F,
+                         "BLOCK_ELEMENTS", 
+                         "Block Elements");
 
     /**
      * Geometric Shapes.
-     * '\u25A0' - '\u25FF'.
+     * 0x25A0 - 0x25FF.
      */
     public static final UnicodeBlock GEOMETRIC_SHAPES
-      = new UnicodeBlock('\u25A0', '\u25FF',
-                         "GEOMETRIC_SHAPES");
+      = new UnicodeBlock(0x25A0, 0x25FF,
+                         "GEOMETRIC_SHAPES", 
+                         "Geometric Shapes");
 
     /**
      * Miscellaneous Symbols.
-     * '\u2600' - '\u26FF'.
+     * 0x2600 - 0x26FF.
      */
     public static final UnicodeBlock MISCELLANEOUS_SYMBOLS
-      = new UnicodeBlock('\u2600', '\u26FF',
-                         "MISCELLANEOUS_SYMBOLS");
+      = new UnicodeBlock(0x2600, 0x26FF,
+                         "MISCELLANEOUS_SYMBOLS", 
+                         "Miscellaneous Symbols");
 
     /**
      * Dingbats.
-     * '\u2700' - '\u27BF'.
+     * 0x2700 - 0x27BF.
      */
     public static final UnicodeBlock DINGBATS
-      = new UnicodeBlock('\u2700', '\u27BF',
-                         "DINGBATS");
+      = new UnicodeBlock(0x2700, 0x27BF,
+                         "DINGBATS", 
+                         "Dingbats");
 
     /**
+     * Miscellaneous Mathematical Symbols-A.
+     * 0x27C0 - 0x27EF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A
+      = new UnicodeBlock(0x27C0, 0x27EF,
+                         "MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A", 
+                         "Miscellaneous Mathematical Symbols-A");
+
+    /**
+     * Supplemental Arrows-A.
+     * 0x27F0 - 0x27FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock SUPPLEMENTAL_ARROWS_A
+      = new UnicodeBlock(0x27F0, 0x27FF,
+                         "SUPPLEMENTAL_ARROWS_A", 
+                         "Supplemental Arrows-A");
+
+    /**
      * Braille Patterns.
-     * '\u2800' - '\u28FF'.
+     * 0x2800 - 0x28FF.
      * @since 1.4
      */
     public static final UnicodeBlock BRAILLE_PATTERNS
-      = new UnicodeBlock('\u2800', '\u28FF',
-                         "BRAILLE_PATTERNS");
+      = new UnicodeBlock(0x2800, 0x28FF,
+                         "BRAILLE_PATTERNS", 
+                         "Braille Patterns");
 
     /**
+     * Supplemental Arrows-B.
+     * 0x2900 - 0x297F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock SUPPLEMENTAL_ARROWS_B
+      = new UnicodeBlock(0x2900, 0x297F,
+                         "SUPPLEMENTAL_ARROWS_B", 
+                         "Supplemental Arrows-B");
+
+    /**
+     * Miscellaneous Mathematical Symbols-B.
+     * 0x2980 - 0x29FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B
+      = new UnicodeBlock(0x2980, 0x29FF,
+                         "MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B", 
+                         "Miscellaneous Mathematical Symbols-B");
+
+    /**
+     * Supplemental Mathematical Operators.
+     * 0x2A00 - 0x2AFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock SUPPLEMENTAL_MATHEMATICAL_OPERATORS
+      = new UnicodeBlock(0x2A00, 0x2AFF,
+                         "SUPPLEMENTAL_MATHEMATICAL_OPERATORS", 
+                         "Supplemental Mathematical Operators");
+
+    /**
+     * Miscellaneous Symbols and Arrows.
+     * 0x2B00 - 0x2BFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_ARROWS
+      = new UnicodeBlock(0x2B00, 0x2BFF,
+                         "MISCELLANEOUS_SYMBOLS_AND_ARROWS", 
+                         "Miscellaneous Symbols and Arrows");
+
+    /**
      * CJK Radicals Supplement.
-     * '\u2E80' - '\u2EFF'.
+     * 0x2E80 - 0x2EFF.
      * @since 1.4
      */
     public static final UnicodeBlock CJK_RADICALS_SUPPLEMENT
-      = new UnicodeBlock('\u2E80', '\u2EFF',
-                         "CJK_RADICALS_SUPPLEMENT");
+      = new UnicodeBlock(0x2E80, 0x2EFF,
+                         "CJK_RADICALS_SUPPLEMENT", 
+                         "CJK Radicals Supplement");
 
     /**
      * Kangxi Radicals.
-     * '\u2F00' - '\u2FDF'.
+     * 0x2F00 - 0x2FDF.
      * @since 1.4
      */
     public static final UnicodeBlock KANGXI_RADICALS
-      = new UnicodeBlock('\u2F00', '\u2FDF',
-                         "KANGXI_RADICALS");
+      = new UnicodeBlock(0x2F00, 0x2FDF,
+                         "KANGXI_RADICALS", 
+                         "Kangxi Radicals");
 
     /**
      * Ideographic Description Characters.
-     * '\u2FF0' - '\u2FFF'.
+     * 0x2FF0 - 0x2FFF.
      * @since 1.4
      */
     public static final UnicodeBlock IDEOGRAPHIC_DESCRIPTION_CHARACTERS
-      = new UnicodeBlock('\u2FF0', '\u2FFF',
-                         "IDEOGRAPHIC_DESCRIPTION_CHARACTERS");
+      = new UnicodeBlock(0x2FF0, 0x2FFF,
+                         "IDEOGRAPHIC_DESCRIPTION_CHARACTERS", 
+                         "Ideographic Description Characters");
 
     /**
      * CJK Symbols and Punctuation.
-     * '\u3000' - '\u303F'.
+     * 0x3000 - 0x303F.
      */
     public static final UnicodeBlock CJK_SYMBOLS_AND_PUNCTUATION
-      = new UnicodeBlock('\u3000', '\u303F',
-                         "CJK_SYMBOLS_AND_PUNCTUATION");
+      = new UnicodeBlock(0x3000, 0x303F,
+                         "CJK_SYMBOLS_AND_PUNCTUATION", 
+                         "CJK Symbols and Punctuation");
 
     /**
      * Hiragana.
-     * '\u3040' - '\u309F'.
+     * 0x3040 - 0x309F.
      */
     public static final UnicodeBlock HIRAGANA
-      = new UnicodeBlock('\u3040', '\u309F',
-                         "HIRAGANA");
+      = new UnicodeBlock(0x3040, 0x309F,
+                         "HIRAGANA", 
+                         "Hiragana");
 
     /**
      * Katakana.
-     * '\u30A0' - '\u30FF'.
+     * 0x30A0 - 0x30FF.
      */
     public static final UnicodeBlock KATAKANA
-      = new UnicodeBlock('\u30A0', '\u30FF',
-                         "KATAKANA");
+      = new UnicodeBlock(0x30A0, 0x30FF,
+                         "KATAKANA", 
+                         "Katakana");
 
     /**
      * Bopomofo.
-     * '\u3100' - '\u312F'.
+     * 0x3100 - 0x312F.
      */
     public static final UnicodeBlock BOPOMOFO
-      = new UnicodeBlock('\u3100', '\u312F',
-                         "BOPOMOFO");
+      = new UnicodeBlock(0x3100, 0x312F,
+                         "BOPOMOFO", 
+                         "Bopomofo");
 
     /**
      * Hangul Compatibility Jamo.
-     * '\u3130' - '\u318F'.
+     * 0x3130 - 0x318F.
      */
     public static final UnicodeBlock HANGUL_COMPATIBILITY_JAMO
-      = new UnicodeBlock('\u3130', '\u318F',
-                         "HANGUL_COMPATIBILITY_JAMO");
+      = new UnicodeBlock(0x3130, 0x318F,
+                         "HANGUL_COMPATIBILITY_JAMO", 
+                         "Hangul Compatibility Jamo");
 
     /**
      * Kanbun.
-     * '\u3190' - '\u319F'.
+     * 0x3190 - 0x319F.
      */
     public static final UnicodeBlock KANBUN
-      = new UnicodeBlock('\u3190', '\u319F',
-                         "KANBUN");
+      = new UnicodeBlock(0x3190, 0x319F,
+                         "KANBUN", 
+                         "Kanbun");
 
     /**
      * Bopomofo Extended.
-     * '\u31A0' - '\u31BF'.
+     * 0x31A0 - 0x31BF.
      * @since 1.4
      */
     public static final UnicodeBlock BOPOMOFO_EXTENDED
-      = new UnicodeBlock('\u31A0', '\u31BF',
-                         "BOPOMOFO_EXTENDED");
+      = new UnicodeBlock(0x31A0, 0x31BF,
+                         "BOPOMOFO_EXTENDED", 
+                         "Bopomofo Extended");
 
     /**
+     * Katakana Phonetic Extensions.
+     * 0x31F0 - 0x31FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock KATAKANA_PHONETIC_EXTENSIONS
+      = new UnicodeBlock(0x31F0, 0x31FF,
+                         "KATAKANA_PHONETIC_EXTENSIONS", 
+                         "Katakana Phonetic Extensions");
+
+    /**
      * Enclosed CJK Letters and Months.
-     * '\u3200' - '\u32FF'.
+     * 0x3200 - 0x32FF.
      */
     public static final UnicodeBlock ENCLOSED_CJK_LETTERS_AND_MONTHS
-      = new UnicodeBlock('\u3200', '\u32FF',
-                         "ENCLOSED_CJK_LETTERS_AND_MONTHS");
+      = new UnicodeBlock(0x3200, 0x32FF,
+                         "ENCLOSED_CJK_LETTERS_AND_MONTHS", 
+                         "Enclosed CJK Letters and Months");
 
     /**
      * CJK Compatibility.
-     * '\u3300' - '\u33FF'.
+     * 0x3300 - 0x33FF.
      */
     public static final UnicodeBlock CJK_COMPATIBILITY
-      = new UnicodeBlock('\u3300', '\u33FF',
-                         "CJK_COMPATIBILITY");
+      = new UnicodeBlock(0x3300, 0x33FF,
+                         "CJK_COMPATIBILITY", 
+                         "CJK Compatibility");
 
     /**
      * CJK Unified Ideographs Extension A.
-     * '\u3400' - '\u4DB5'.
+     * 0x3400 - 0x4DBF.
      * @since 1.4
      */
     public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
-      = new UnicodeBlock('\u3400', '\u4DB5',
-                         "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A");
+      = new UnicodeBlock(0x3400, 0x4DBF,
+                         "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A", 
+                         "CJK Unified Ideographs Extension A");
 
     /**
+     * Yijing Hexagram Symbols.
+     * 0x4DC0 - 0x4DFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock YIJING_HEXAGRAM_SYMBOLS
+      = new UnicodeBlock(0x4DC0, 0x4DFF,
+                         "YIJING_HEXAGRAM_SYMBOLS", 
+                         "Yijing Hexagram Symbols");
+
+    /**
      * CJK Unified Ideographs.
-     * '\u4E00' - '\u9FFF'.
+     * 0x4E00 - 0x9FFF.
      */
     public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS
-      = new UnicodeBlock('\u4E00', '\u9FFF',
-                         "CJK_UNIFIED_IDEOGRAPHS");
+      = new UnicodeBlock(0x4E00, 0x9FFF,
+                         "CJK_UNIFIED_IDEOGRAPHS", 
+                         "CJK Unified Ideographs");
 
     /**
      * Yi Syllables.
-     * '\uA000' - '\uA48F'.
+     * 0xA000 - 0xA48F.
      * @since 1.4
      */
     public static final UnicodeBlock YI_SYLLABLES
-      = new UnicodeBlock('\uA000', '\uA48F',
-                         "YI_SYLLABLES");
+      = new UnicodeBlock(0xA000, 0xA48F,
+                         "YI_SYLLABLES", 
+                         "Yi Syllables");
 
     /**
      * Yi Radicals.
-     * '\uA490' - '\uA4CF'.
+     * 0xA490 - 0xA4CF.
      * @since 1.4
      */
     public static final UnicodeBlock YI_RADICALS
-      = new UnicodeBlock('\uA490', '\uA4CF',
-                         "YI_RADICALS");
+      = new UnicodeBlock(0xA490, 0xA4CF,
+                         "YI_RADICALS", 
+                         "Yi Radicals");
 
     /**
      * Hangul Syllables.
-     * '\uAC00' - '\uD7A3'.
+     * 0xAC00 - 0xD7AF.
      */
     public static final UnicodeBlock HANGUL_SYLLABLES
-      = new UnicodeBlock('\uAC00', '\uD7A3',
-                         "HANGUL_SYLLABLES");
+      = new UnicodeBlock(0xAC00, 0xD7AF,
+                         "HANGUL_SYLLABLES", 
+                         "Hangul Syllables");
 
     /**
-     * Surrogates Area.
-     * '\uD800' - '\uDFFF'.
+     * High Surrogates.
+     * 0xD800 - 0xDB7F.
+     * @since 1.5
      */
-    public static final UnicodeBlock SURROGATES_AREA
-      = new UnicodeBlock('\uD800', '\uDFFF',
-                         "SURROGATES_AREA");
+    public static final UnicodeBlock HIGH_SURROGATES
+      = new UnicodeBlock(0xD800, 0xDB7F,
+                         "HIGH_SURROGATES", 
+                         "High Surrogates");
 
     /**
+     * High Private Use Surrogates.
+     * 0xDB80 - 0xDBFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock HIGH_PRIVATE_USE_SURROGATES
+      = new UnicodeBlock(0xDB80, 0xDBFF,
+                         "HIGH_PRIVATE_USE_SURROGATES", 
+                         "High Private Use Surrogates");
+
+    /**
+     * Low Surrogates.
+     * 0xDC00 - 0xDFFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock LOW_SURROGATES
+      = new UnicodeBlock(0xDC00, 0xDFFF,
+                         "LOW_SURROGATES", 
+                         "Low Surrogates");
+
+    /**
      * Private Use Area.
-     * '\uE000' - '\uF8FF'.
+     * 0xE000 - 0xF8FF.
      */
     public static final UnicodeBlock PRIVATE_USE_AREA
-      = new UnicodeBlock('\uE000', '\uF8FF',
-                         "PRIVATE_USE_AREA");
+      = new UnicodeBlock(0xE000, 0xF8FF,
+                         "PRIVATE_USE_AREA", 
+                         "Private Use Area");
 
     /**
      * CJK Compatibility Ideographs.
-     * '\uF900' - '\uFAFF'.
+     * 0xF900 - 0xFAFF.
      */
     public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS
-      = new UnicodeBlock('\uF900', '\uFAFF',
-                         "CJK_COMPATIBILITY_IDEOGRAPHS");
+      = new UnicodeBlock(0xF900, 0xFAFF,
+                         "CJK_COMPATIBILITY_IDEOGRAPHS", 
+                         "CJK Compatibility Ideographs");
 
     /**
      * Alphabetic Presentation Forms.
-     * '\uFB00' - '\uFB4F'.
+     * 0xFB00 - 0xFB4F.
      */
     public static final UnicodeBlock ALPHABETIC_PRESENTATION_FORMS
-      = new UnicodeBlock('\uFB00', '\uFB4F',
-                         "ALPHABETIC_PRESENTATION_FORMS");
+      = new UnicodeBlock(0xFB00, 0xFB4F,
+                         "ALPHABETIC_PRESENTATION_FORMS", 
+                         "Alphabetic Presentation Forms");
 
     /**
      * Arabic Presentation Forms-A.
-     * '\uFB50' - '\uFDFF'.
+     * 0xFB50 - 0xFDFF.
      */
     public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_A
-      = new UnicodeBlock('\uFB50', '\uFDFF',
-                         "ARABIC_PRESENTATION_FORMS_A");
+      = new UnicodeBlock(0xFB50, 0xFDFF,
+                         "ARABIC_PRESENTATION_FORMS_A", 
+                         "Arabic Presentation Forms-A");
 
     /**
+     * Variation Selectors.
+     * 0xFE00 - 0xFE0F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock VARIATION_SELECTORS
+      = new UnicodeBlock(0xFE00, 0xFE0F,
+                         "VARIATION_SELECTORS", 
+                         "Variation Selectors");
+
+    /**
      * Combining Half Marks.
-     * '\uFE20' - '\uFE2F'.
+     * 0xFE20 - 0xFE2F.
      */
     public static final UnicodeBlock COMBINING_HALF_MARKS
-      = new UnicodeBlock('\uFE20', '\uFE2F',
-                         "COMBINING_HALF_MARKS");
+      = new UnicodeBlock(0xFE20, 0xFE2F,
+                         "COMBINING_HALF_MARKS", 
+                         "Combining Half Marks");
 
     /**
      * CJK Compatibility Forms.
-     * '\uFE30' - '\uFE4F'.
+     * 0xFE30 - 0xFE4F.
      */
     public static final UnicodeBlock CJK_COMPATIBILITY_FORMS
-      = new UnicodeBlock('\uFE30', '\uFE4F',
-                         "CJK_COMPATIBILITY_FORMS");
+      = new UnicodeBlock(0xFE30, 0xFE4F,
+                         "CJK_COMPATIBILITY_FORMS", 
+                         "CJK Compatibility Forms");
 
     /**
      * Small Form Variants.
-     * '\uFE50' - '\uFE6F'.
+     * 0xFE50 - 0xFE6F.
      */
     public static final UnicodeBlock SMALL_FORM_VARIANTS
-      = new UnicodeBlock('\uFE50', '\uFE6F',
-                         "SMALL_FORM_VARIANTS");
+      = new UnicodeBlock(0xFE50, 0xFE6F,
+                         "SMALL_FORM_VARIANTS", 
+                         "Small Form Variants");
 
     /**
      * Arabic Presentation Forms-B.
-     * '\uFE70' - '\uFEFE'.
+     * 0xFE70 - 0xFEFF.
      */
     public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_B
-      = new UnicodeBlock('\uFE70', '\uFEFE',
-                         "ARABIC_PRESENTATION_FORMS_B");
+      = new UnicodeBlock(0xFE70, 0xFEFF,
+                         "ARABIC_PRESENTATION_FORMS_B", 
+                         "Arabic Presentation Forms-B");
 
     /**
      * Halfwidth and Fullwidth Forms.
-     * '\uFF00' - '\uFFEF'.
+     * 0xFF00 - 0xFFEF.
      */
     public static final UnicodeBlock HALFWIDTH_AND_FULLWIDTH_FORMS
-      = new UnicodeBlock('\uFF00', '\uFFEF',
-                         "HALFWIDTH_AND_FULLWIDTH_FORMS");
+      = new UnicodeBlock(0xFF00, 0xFFEF,
+                         "HALFWIDTH_AND_FULLWIDTH_FORMS", 
+                         "Halfwidth and Fullwidth Forms");
 
     /**
      * Specials.
-     * '\uFEFF', '\uFFF0' - '\uFFFD'.
+     * 0xFFF0 - 0xFFFF.
      */
     public static final UnicodeBlock SPECIALS
-      = new UnicodeBlock('\uFFF0', '\uFFFD',
-                         "SPECIALS");
+      = new UnicodeBlock(0xFFF0, 0xFFFF,
+                         "SPECIALS", 
+                         "Specials");
 
     /**
+     * Linear B Syllabary.
+     * 0x10000 - 0x1007F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock LINEAR_B_SYLLABARY
+      = new UnicodeBlock(0x10000, 0x1007F,
+                         "LINEAR_B_SYLLABARY", 
+                         "Linear B Syllabary");
+
+    /**
+     * Linear B Ideograms.
+     * 0x10080 - 0x100FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock LINEAR_B_IDEOGRAMS
+      = new UnicodeBlock(0x10080, 0x100FF,
+                         "LINEAR_B_IDEOGRAMS", 
+                         "Linear B Ideograms");
+
+    /**
+     * Aegean Numbers.
+     * 0x10100 - 0x1013F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock AEGEAN_NUMBERS
+      = new UnicodeBlock(0x10100, 0x1013F,
+                         "AEGEAN_NUMBERS", 
+                         "Aegean Numbers");
+
+    /**
+     * Old Italic.
+     * 0x10300 - 0x1032F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock OLD_ITALIC
+      = new UnicodeBlock(0x10300, 0x1032F,
+                         "OLD_ITALIC", 
+                         "Old Italic");
+
+    /**
+     * Gothic.
+     * 0x10330 - 0x1034F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock GOTHIC
+      = new UnicodeBlock(0x10330, 0x1034F,
+                         "GOTHIC", 
+                         "Gothic");
+
+    /**
+     * Ugaritic.
+     * 0x10380 - 0x1039F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock UGARITIC
+      = new UnicodeBlock(0x10380, 0x1039F,
+                         "UGARITIC", 
+                         "Ugaritic");
+
+    /**
+     * Deseret.
+     * 0x10400 - 0x1044F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock DESERET
+      = new UnicodeBlock(0x10400, 0x1044F,
+                         "DESERET", 
+                         "Deseret");
+
+    /**
+     * Shavian.
+     * 0x10450 - 0x1047F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock SHAVIAN
+      = new UnicodeBlock(0x10450, 0x1047F,
+                         "SHAVIAN", 
+                         "Shavian");
+
+    /**
+     * Osmanya.
+     * 0x10480 - 0x104AF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock OSMANYA
+      = new UnicodeBlock(0x10480, 0x104AF,
+                         "OSMANYA", 
+                         "Osmanya");
+
+    /**
+     * Cypriot Syllabary.
+     * 0x10800 - 0x1083F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock CYPRIOT_SYLLABARY
+      = new UnicodeBlock(0x10800, 0x1083F,
+                         "CYPRIOT_SYLLABARY", 
+                         "Cypriot Syllabary");
+
+    /**
+     * Byzantine Musical Symbols.
+     * 0x1D000 - 0x1D0FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock BYZANTINE_MUSICAL_SYMBOLS
+      = new UnicodeBlock(0x1D000, 0x1D0FF,
+                         "BYZANTINE_MUSICAL_SYMBOLS", 
+                         "Byzantine Musical Symbols");
+
+    /**
+     * Musical Symbols.
+     * 0x1D100 - 0x1D1FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock MUSICAL_SYMBOLS
+      = new UnicodeBlock(0x1D100, 0x1D1FF,
+                         "MUSICAL_SYMBOLS", 
+                         "Musical Symbols");
+
+    /**
+     * Tai Xuan Jing Symbols.
+     * 0x1D300 - 0x1D35F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock TAI_XUAN_JING_SYMBOLS
+      = new UnicodeBlock(0x1D300, 0x1D35F,
+                         "TAI_XUAN_JING_SYMBOLS", 
+                         "Tai Xuan Jing Symbols");
+
+    /**
+     * Mathematical Alphanumeric Symbols.
+     * 0x1D400 - 0x1D7FF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock MATHEMATICAL_ALPHANUMERIC_SYMBOLS
+      = new UnicodeBlock(0x1D400, 0x1D7FF,
+                         "MATHEMATICAL_ALPHANUMERIC_SYMBOLS", 
+                         "Mathematical Alphanumeric Symbols");
+
+    /**
+     * CJK Unified Ideographs Extension B.
+     * 0x20000 - 0x2A6DF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
+      = new UnicodeBlock(0x20000, 0x2A6DF,
+                         "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B", 
+                         "CJK Unified Ideographs Extension B");
+
+    /**
+     * CJK Compatibility Ideographs Supplement.
+     * 0x2F800 - 0x2FA1F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
+      = new UnicodeBlock(0x2F800, 0x2FA1F,
+                         "CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT", 
+                         "CJK Compatibility Ideographs Supplement");
+
+    /**
+     * Tags.
+     * 0xE0000 - 0xE007F.
+     * @since 1.5
+     */
+    public static final UnicodeBlock TAGS
+      = new UnicodeBlock(0xE0000, 0xE007F,
+                         "TAGS", 
+                         "Tags");
+
+    /**
+     * Variation Selectors Supplement.
+     * 0xE0100 - 0xE01EF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock VARIATION_SELECTORS_SUPPLEMENT
+      = new UnicodeBlock(0xE0100, 0xE01EF,
+                         "VARIATION_SELECTORS_SUPPLEMENT", 
+                         "Variation Selectors Supplement");
+
+    /**
+     * Supplementary Private Use Area-A.
+     * 0xF0000 - 0xFFFFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A
+      = new UnicodeBlock(0xF0000, 0xFFFFF,
+                         "SUPPLEMENTARY_PRIVATE_USE_AREA_A", 
+                         "Supplementary Private Use Area-A");
+
+    /**
+     * Supplementary Private Use Area-B.
+     * 0x100000 - 0x10FFFF.
+     * @since 1.5
+     */
+    public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_B
+      = new UnicodeBlock(0x100000, 0x10FFFF,
+                         "SUPPLEMENTARY_PRIVATE_USE_AREA_B", 
+                         "Supplementary Private Use Area-B");
+
+    /**
+     * Surrogates Area.
+     * 0xD800 - 0xDFFF.
+     * @deprecated As of 1.5, the three areas, 
+     * <a href="#HIGH_SURROGATES">HIGH_SURROGATES</a>,
+     * <a href="#HIGH_PRIVATE_USE_SURROGATES">HIGH_PRIVATE_USE_SURROGATES</a>
+     * and <a href="#LOW_SURROGATES">LOW_SURROGATES</a>, as defined
+     * by the Unicode standard, should be used in preference to
+     * this.  These are also returned from calls to <code>of(int)</code>
+     * and <code>of(char)</code>.
+     */
+    public static final UnicodeBlock SURROGATES_AREA
+      = new UnicodeBlock(0xD800, 0xDFFF,
+                         "SURROGATES_AREA",
+                         "Surrogates Area");
+
+    /**
      * The defined subsets.
      */
     private static final UnicodeBlock sets[] = {
@@ -909,6 +1535,7 @@
       COMBINING_DIACRITICAL_MARKS,
       GREEK,
       CYRILLIC,
+      CYRILLIC_SUPPLEMENTARY,
       ARMENIAN,
       HEBREW,
       ARABIC,
@@ -935,8 +1562,16 @@
       UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS,
       OGHAM,
       RUNIC,
+      TAGALOG,
+      HANUNOO,
+      BUHID,
+      TAGBANWA,
       KHMER,
       MONGOLIAN,
+      LIMBU,
+      TAI_LE,
+      KHMER_SYMBOLS,
+      PHONETIC_EXTENSIONS,
       LATIN_EXTENDED_ADDITIONAL,
       GREEK_EXTENDED,
       GENERAL_PUNCTUATION,
@@ -956,7 +1591,13 @@
       GEOMETRIC_SHAPES,
       MISCELLANEOUS_SYMBOLS,
       DINGBATS,
+      MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A,
+      SUPPLEMENTAL_ARROWS_A,
       BRAILLE_PATTERNS,
+      SUPPLEMENTAL_ARROWS_B,
+      MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B,
+      SUPPLEMENTAL_MATHEMATICAL_OPERATORS,
+      MISCELLANEOUS_SYMBOLS_AND_ARROWS,
       CJK_RADICALS_SUPPLEMENT,
       KANGXI_RADICALS,
       IDEOGRAPHIC_DESCRIPTION_CHARACTERS,
@@ -967,24 +1608,49 @@
       HANGUL_COMPATIBILITY_JAMO,
       KANBUN,
       BOPOMOFO_EXTENDED,
+      KATAKANA_PHONETIC_EXTENSIONS,
       ENCLOSED_CJK_LETTERS_AND_MONTHS,
       CJK_COMPATIBILITY,
       CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
+      YIJING_HEXAGRAM_SYMBOLS,
       CJK_UNIFIED_IDEOGRAPHS,
       YI_SYLLABLES,
       YI_RADICALS,
       HANGUL_SYLLABLES,
-      SURROGATES_AREA,
+      HIGH_SURROGATES,
+      HIGH_PRIVATE_USE_SURROGATES,
+      LOW_SURROGATES,
       PRIVATE_USE_AREA,
       CJK_COMPATIBILITY_IDEOGRAPHS,
       ALPHABETIC_PRESENTATION_FORMS,
       ARABIC_PRESENTATION_FORMS_A,
+      VARIATION_SELECTORS,
       COMBINING_HALF_MARKS,
       CJK_COMPATIBILITY_FORMS,
       SMALL_FORM_VARIANTS,
       ARABIC_PRESENTATION_FORMS_B,
       HALFWIDTH_AND_FULLWIDTH_FORMS,
       SPECIALS,
+      LINEAR_B_SYLLABARY,
+      LINEAR_B_IDEOGRAMS,
+      AEGEAN_NUMBERS,
+      OLD_ITALIC,
+      GOTHIC,
+      UGARITIC,
+      DESERET,
+      SHAVIAN,
+      OSMANYA,
+      CYPRIOT_SYLLABARY,
+      BYZANTINE_MUSICAL_SYMBOLS,
+      MUSICAL_SYMBOLS,
+      TAI_XUAN_JING_SYMBOLS,
+      MATHEMATICAL_ALPHANUMERIC_SYMBOLS,
+      CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
+      CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT,
+      TAGS,
+      VARIATION_SELECTORS_SUPPLEMENT,
+      SUPPLEMENTARY_PRIVATE_USE_AREA_A,
+      SUPPLEMENTARY_PRIVATE_USE_AREA_B,
     };
   } // class UnicodeBlock
 
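The reorganized block table is what `Character.UnicodeBlock.of(int)` and the new `forName(String)` consult, and `forName` accepts the three name forms behind the new `CANONICAL_NAME`, `NO_SPACES_NAME` and `CONSTANT_NAME` constants. The sketch below exercises the equivalent standard `java.lang.Character.UnicodeBlock` API (present since Java 1.5) on a current JDK, not the patched libgcj internals. The wrapper class `UnicodeBlockDemo` and its helpers are hypothetical.

```java
// Hypothetical demo class: uses only the standard Character.UnicodeBlock API.
class UnicodeBlockDemo {

    /** Block containing a code point; the int overload handles supplementary characters. */
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    /** Looks up a block by canonical name, name without spaces, or constant name (case is ignored). */
    static Character.UnicodeBlock byName(String name) {
        return Character.UnicodeBlock.forName(name);
    }

    public static void main(String[] args) {
        // U+10000 is outside the BMP, so of(char) could not express it.
        System.out.println(blockOf(0x10000) == Character.UnicodeBlock.LINEAR_B_SYLLABARY);
        // A high surrogate maps to HIGH_SURROGATES, not the deprecated SURROGATES_AREA.
        System.out.println(blockOf(0xD800) == Character.UnicodeBlock.HIGH_SURROGATES);
        // All three accepted spellings resolve to the same constant.
        System.out.println(byName("Linear B Syllabary") == byName("LinearBSyllabary"));
        System.out.println(byName("LINEAR_B_SYLLABARY") == Character.UnicodeBlock.LINEAR_B_SYLLABARY);
    }
}
```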
Index: classpath/gnu/regexp/CharIndexedStringBuffer.java
===================================================================
--- classpath/gnu/regexp/CharIndexedStringBuffer.java	(revision 110832)
+++ classpath/gnu/regexp/CharIndexedStringBuffer.java	(working copy)
@@ -1,5 +1,5 @@
 /* gnu/regexp/CharIndexedStringBuffer.java
-   Copyright (C) 1998-2001, 2004 Free Software Foundation, Inc.
+   Copyright (C) 1998-2001, 2004, 2006 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -59,4 +59,13 @@
   public boolean move(int index) {
     return ((anchor += index) < s.length());
   }
+
+  public CharIndexed lookBehind(int index, int length) {
+    if (length > (anchor + index)) length = anchor + index;
+    return new CharIndexedStringBuffer(s, anchor + index - length);
+  }
+
+  public int length() {
+    return s.length() - anchor;
+  }
 }
Index: classpath/gnu/regexp/RETokenChar.java
===================================================================
--- classpath/gnu/regexp/RETokenChar.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenChar.java	(working copy)
@@ -52,6 +52,10 @@
     return ch.length;
   }
   
+  int getMaximumLength() {
+    return ch.length;
+  }
+  
     boolean match(CharIndexed input, REMatch mymatch) {
 	int z = ch.length;
 	char c;
@@ -68,7 +72,7 @@
 
   // Overrides REToken.chain() to optimize for strings
   boolean chain(REToken next) {
-    if (next instanceof RETokenChar) {
+    if (next instanceof RETokenChar && ((RETokenChar)next).insens == insens) {
       RETokenChar cnext = (RETokenChar) next;
       // assume for now that next can only be one character
       int newsize = ch.length + cnext.ch.length;
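The extra `insens` comparison in `chain()` matters once case-insensitivity can change partway through a pattern, which the embedded `(?i)` flags added to RE.java in this patch allow. Merging the literals `a` and `b` in `a(?i)b` into one `RETokenChar` would force a single case mode on both. The sketch below shows the intended behaviour with the standard `java.util.regex` API on a JDK; the class `EmbeddedFlagDemo` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: "a" is matched case-sensitively and "b" case-insensitively,
// because (?i) only takes effect from the point where it appears.
class EmbeddedFlagDemo {
    static final Pattern P = Pattern.compile("a(?i)b");

    static boolean matches(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("ab")); // true
        System.out.println(matches("aB")); // true
        System.out.println(matches("AB")); // false: 'a' is still case-sensitive
    }
}
```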
Index: classpath/gnu/regexp/CharIndexedString.java
===================================================================
--- classpath/gnu/regexp/CharIndexedString.java	(revision 110832)
+++ classpath/gnu/regexp/CharIndexedString.java	(working copy)
@@ -1,5 +1,5 @@
 /* gnu/regexp/CharIndexedString.java
-   Copyright (C) 1998-2001, 2004 Free Software Foundation, Inc.
+   Copyright (C) 1998-2001, 2004, 2006 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -61,4 +61,13 @@
     public boolean move(int index) {
 	return ((anchor += index) < len);
     }
+
+    public CharIndexed lookBehind(int index, int length) {
+	if (length > (anchor + index)) length = anchor + index;
+	return new CharIndexedString(s, anchor + index - length);
+    }
+
+    public int length() {
+	return len - anchor;
+    }
 }
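Both `CharIndexedString` and `CharIndexedStringBuffer` gain the same two methods: `length()` counts the characters left after the anchor, and `lookBehind(index, length)` returns a view starting up to `length` characters before `anchor + index`, clamped so it never starts before offset 0. The class below is a hypothetical, simplified model of that arithmetic, not the gnu.regexp class itself.

```java
// Hypothetical model of a CharIndexed cursor: same clamping arithmetic as the
// lookBehind()/length() methods added above.
class WindowedCursor {
    final String s;
    final int anchor;

    WindowedCursor(String s, int anchor) {
        this.s = s;
        this.anchor = anchor;
    }

    /** Characters remaining from the anchor to the end of the input. */
    int length() {
        return s.length() - anchor;
    }

    /** View starting at most `length` chars before anchor + index, never before 0. */
    WindowedCursor lookBehind(int index, int length) {
        if (length > anchor + index) length = anchor + index;
        return new WindowedCursor(s, anchor + index - length);
    }

    public static void main(String[] args) {
        WindowedCursor c = new WindowedCursor("price: $42", 0);
        WindowedCursor w = c.lookBehind(8, 3);           // 3 chars before offset 8
        System.out.println(w.anchor);                    // 5
        System.out.println(w.s.substring(w.anchor, 8));  // ": $"
        System.out.println(w.length());                  // 5
        System.out.println(c.lookBehind(2, 10).anchor);  // 0 (clamped)
    }
}
```

`RETokenLookBehind` then turns the current position into an offset inside the window as `index + behind.length() - input.length()`; for the example above that is 8 + 5 - 10 = 3, exactly where `": $"` ends.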
Index: classpath/gnu/regexp/RETokenLookBehind.java
===================================================================
--- classpath/gnu/regexp/RETokenLookBehind.java	(revision 0)
+++ classpath/gnu/regexp/RETokenLookBehind.java	(revision 0)
@@ -0,0 +1,116 @@
+/* gnu/regexp/RETokenLookBehind.java
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Classpath.
+
+GNU Classpath is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU Classpath is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Classpath; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA.
+
+Linking this library statically or dynamically with other modules is
+making a combined work based on this library.  Thus, the terms and
+conditions of the GNU General Public License cover the whole
+combination.
+
+As a special exception, the copyright holders of this library give you
+permission to link this library with independent modules to produce an
+executable, regardless of the license terms of these independent
+modules, and to copy and distribute the resulting executable under
+terms of your choice, provided that you also meet, for each linked
+independent module, the terms and conditions of the license of that
+module.  An independent module is a module which is not derived from
+or based on this library.  If you modify this library, you may extend
+this exception to your version of the library, but you are not
+obligated to do so.  If you do not wish to do so, delete this
+exception statement from your version. */
+
+package gnu.regexp;
+
+/**
+ * @author Ito Kazumitsu
+ */
+final class RETokenLookBehind extends REToken
+{
+  REToken re;
+  boolean negative;
+
+  RETokenLookBehind(REToken re, boolean negative) throws REException {
+    super(0);
+    this.re = re;
+    this.negative = negative;
+  }
+
+  int getMaximumLength() {
+    return 0;
+  }
+
+  boolean match(CharIndexed input, REMatch mymatch)
+  {
+    int max = re.getMaximumLength();
+    CharIndexed behind = input.lookBehind(mymatch.index, max);
+    REMatch trymatch = (REMatch)mymatch.clone();
+    REMatch trymatch1 = (REMatch)mymatch.clone();
+    REMatch newMatch = null;
+    int curIndex = trymatch.index + behind.length() - input.length();
+    trymatch.index = 0;
+    RETokenMatchHereOnly stopper = new RETokenMatchHereOnly(curIndex);
+    REToken re1 = (REToken) re.clone();
+    re1.chain(stopper);
+    if (re1.match(behind, trymatch)) {
+      if (negative) return false;
+      if (next(input, trymatch1))
+        newMatch = trymatch1;
+    }
+
+    if (newMatch != null) {
+      if (negative) return false;
+      //else
+      mymatch.assignFrom(newMatch);
+      return true;
+    }
+    else { // no match
+      if (negative)
+        return next(input, mymatch);
+      //else
+      return false;
+    }
+  }
+
+    void dump(StringBuffer os) {
+	os.append("(?<");
+	os.append(negative ? '!' : '=');
+	re.dumpAll(os);
+	os.append(')');
+    }
+
+    private static class RETokenMatchHereOnly extends REToken {
+
+        int getMaximumLength() { return 0; }
+
+	private int index;
+
+	RETokenMatchHereOnly(int index) {
+	    super(0);
+	    this.index = index;
+	}
+
+	boolean match(CharIndexed input, REMatch mymatch) {
+	    return index == mymatch.index;
+	}
+
+        void dump(StringBuffer os) {}
+
+    }
+}
+
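`RETokenLookBehind` bounds the search by `re.getMaximumLength()`, takes a window of at most that many characters before the current position, and chains a `RETokenMatchHereOnly` stopper so that only a sub-match ending exactly at the current position counts; a negative look-behind inverts the result. The sketch below illustrates that idea with `java.util.regex.Matcher.region`. The class `LookBehindSketch` and its method are hypothetical, and rather than chaining a stopper it simply tries every start offset inside the window.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a bounded look-behind test, not the patch's code.
class LookBehindSketch {

    /** True if some substring of input ending at pos, at most maxLen long, fully matches sub. */
    static boolean lookBehind(Pattern sub, CharSequence input, int pos, int maxLen) {
        int start = Math.max(0, pos - maxLen);   // same clamp as CharIndexed.lookBehind
        Matcher m = sub.matcher(input);
        for (int i = start; i <= pos; i++) {
            m.region(i, pos);                    // the sub-match must end exactly at pos
            if (m.matches()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern dollar = Pattern.compile("\\$");
        String s = "cost $42 or 42";
        System.out.println(lookBehind(dollar, s, 6, 1));   // true: '$' precedes offset 6
        System.out.println(lookBehind(dollar, s, 12, 1));  // false
    }
}
```

On a JDK the built-in form is `(?<=\$)\d+`, which finds `42` at offset 6 in the same string; `(?<!\$)` would be the negative variant.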
Index: classpath/gnu/regexp/RE.java
===================================================================
--- classpath/gnu/regexp/RE.java	(revision 110832)
+++ classpath/gnu/regexp/RE.java	(working copy)
@@ -136,12 +136,13 @@
 
     /** Minimum length, in characters, of any possible match. */
     private int minimumLength;
+    private int maximumLength;
 
   /**
    * Compilation flag. Do  not  differentiate  case.   Subsequent
    * searches  using  this  RE will be case insensitive.
    */
-  public static final int REG_ICASE = 2;
+  public static final int REG_ICASE = 0x02;
 
   /**
    * Compilation flag. The match-any-character operator (dot)
@@ -149,14 +150,14 @@
    * bit RE_DOT_NEWLINE (see RESyntax for details).  This is equivalent to
    * the "/s" operator in Perl.
    */
-  public static final int REG_DOT_NEWLINE = 4;
+  public static final int REG_DOT_NEWLINE = 0x04;
 
   /**
    * Compilation flag. Use multiline mode.  In this mode, the ^ and $
    * anchors will match based on newlines within the input. This is
    * equivalent to the "/m" operator in Perl.
    */
-  public static final int REG_MULTILINE = 8;
+  public static final int REG_MULTILINE = 0x08;
 
   /**
    * Execution flag.
@@ -185,14 +186,14 @@
    * //  m4.toString(): "fool"<BR>
    * </CODE>
    */
-  public static final int REG_NOTBOL = 16;
+  public static final int REG_NOTBOL = 0x10;
 
   /**
    * Execution flag.
    * The match-end operator ($) does not match at the end
    * of the input string. Useful for matching on substrings.
    */
-  public static final int REG_NOTEOL = 32;
+  public static final int REG_NOTEOL = 0x20;
 
   /**
    * Execution flag.
@@ -206,7 +207,7 @@
    * the example under REG_NOTBOL.  It also affects the use of the \&lt;
    * and \b operators.
    */
-  public static final int REG_ANCHORINDEX = 64;
+  public static final int REG_ANCHORINDEX = 0x40;
 
   /**
    * Execution flag.
@@ -215,8 +216,25 @@
    * the corresponding subexpressions.  For example, you may want to
    * replace all matches of "one dollar" with "$1".
    */
-  public static final int REG_NO_INTERPOLATE = 128;
+  public static final int REG_NO_INTERPOLATE = 0x80;
 
+  /**
+   * Execution flag.
+   * Try to match the whole input string. An implicit match-end operator
+   * is added to this regexp.
+   */
+  public static final int REG_TRY_ENTIRE_MATCH = 0x0100;
+
+  /**
+   * Execution flag.
+   * The substitute and substituteAll methods will treat the
+   * character '\' in the replacement as an escape to a literal
+   * character. In this case "\n", "\$", "\\", "\x40" and "\012"
+   * will become "n", "$", "\", "x40" and "012" respectively.
+   * This flag has no effect if REG_NO_INTERPOLATE is set.
+   */
+  public static final int REG_REPLACE_USE_BACKSLASHESCAPE = 0x0200;
+
   /** Returns a string representing the version of the gnu.regexp package. */
   public static final String version() {
     return VERSION;
@@ -273,12 +291,13 @@
   }
 
   // internal constructor used for alternation
-  private RE(REToken first, REToken last,int subs, int subIndex, int minLength) {
+  private RE(REToken first, REToken last,int subs, int subIndex, int minLength, int maxLength) {
     super(subIndex);
     firstToken = first;
     lastToken = last;
     numSubs = subs;
     minimumLength = minLength;
+    maximumLength = maxLength;
     addToken(new RETokenEndSub(subIndex));
   }
 
@@ -333,6 +352,11 @@
     char ch;
     boolean quot = false;
 
+    // Saved syntax and flags.
+    RESyntax savedSyntax = null;
+    int savedCflags = 0;
+    boolean flagsSaved = false;
+
     while (index < pLength) {
       // read the next character unit (including backslash escapes)
       index = getCharUnit(pattern,index,unit,quot);
@@ -359,8 +383,9 @@
 	   && !syntax.get(RESyntax.RE_LIMITED_OPS)) {
 	// make everything up to here be a branch. create vector if nec.
 	addToken(currentToken);
-	RE theBranch = new RE(firstToken, lastToken, numSubs, subIndex, minimumLength);
+	RE theBranch = new RE(firstToken, lastToken, numSubs, subIndex, minimumLength, maximumLength);
 	minimumLength = 0;
+	maximumLength = 0;
 	if (branches == null) {
 	    branches = new Vector();
 	}
@@ -374,6 +399,9 @@
       //
       // OPEN QUESTION: 
       //  what is proper interpretation of '{' at start of string?
+      //
+      // This method used to check "repeat.empty.token" to reject regexps
+      // such as "(a*){2,}", but such patterns are now allowed.
 
       else if ((unit.ch == '{') && syntax.get(RESyntax.RE_INTERVALS) && (syntax.get(RESyntax.RE_NO_BK_BRACES) ^ (unit.bk || quot))) {
 	int newIndex = getMinMax(pattern,index,minMax,syntax);
@@ -386,8 +414,6 @@
             throw new REException(getLocalizedMessage("repeat.chained"),REException.REG_BADRPT,newIndex);
           if (currentToken instanceof RETokenWordBoundary || currentToken instanceof RETokenWordBoundary)
             throw new REException(getLocalizedMessage("repeat.assertion"),REException.REG_BADRPT,newIndex);
-          if ((currentToken.getMinimumLength() == 0) && (minMax.second == Integer.MAX_VALUE))
-            throw new REException(getLocalizedMessage("repeat.empty.token"),REException.REG_BADRPT,newIndex);
           index = newIndex;
           currentToken = setRepeated(currentToken,minMax.first,minMax.second,index); 
         }
@@ -403,6 +429,8 @@
       else if ((unit.ch == '[') && !(unit.bk || quot)) {
 	Vector options = new Vector();
 	boolean negative = false;
+	// FIXME: lastChar == 0 means lastChar is not set. But what if
+	// \u0000 is used as a meaningful character?
 	char lastChar = 0;
 	if (index == pLength) throw new REException(getLocalizedMessage("unmatched.bracket"),REException.REG_EBRACK,index);
 	
@@ -426,6 +454,13 @@
 	      options.addElement(new RETokenChar(subIndex,lastChar,insens));
 	      lastChar = '-';
 	    } else {
+	      if ((ch == '\\') && syntax.get(RESyntax.RE_BACKSLASH_ESCAPE_IN_LISTS)) {
+	        CharExpression ce = getCharExpression(pattern, index, pLength, syntax);
+	        if (ce == null)
+		  throw new REException("invalid escape sequence", REException.REG_ESCAPE, index);
+		ch = ce.ch;
+		index = index + ce.len - 1;
+	      }
 	      options.addElement(new RETokenRange(subIndex,lastChar,ch,insens));
 	      lastChar = 0;
 	      index++;
@@ -434,7 +469,10 @@
             if (index == pLength) throw new REException(getLocalizedMessage("class.no.end"),REException.REG_EBRACK,index);
 	    int posixID = -1;
 	    boolean negate = false;
+	    // FIXME: asciiEsc == 0 means asciiEsc is not set. But what if
+	    // \u0000 is used as a meaningful character?
             char asciiEsc = 0;
+	    NamedProperty np = null;
 	    if (("dswDSW".indexOf(pattern[index]) != -1) && syntax.get(RESyntax.RE_CHAR_CLASS_ESC_IN_LISTS)) {
 	      switch (pattern[index]) {
 	      case 'D':
@@ -454,23 +492,25 @@
 		break;
 	      }
 	    }
-            else if ("nrt".indexOf(pattern[index]) != -1) {
-              switch (pattern[index]) {
-                case 'n':
-                  asciiEsc = '\n';
-                  break;
-                case 't':
-                  asciiEsc = '\t';
-                  break;
-                case 'r':
-                  asciiEsc = '\r';
-                  break;
-              }
-            }
+	    if (("pP".indexOf(pattern[index]) != -1) && syntax.get(RESyntax.RE_NAMED_PROPERTY)) {
+	      np = getNamedProperty(pattern, index - 1, pLength);
+	      if (np == null)
+		throw new REException("invalid escape sequence", REException.REG_ESCAPE, index);
+	      index = index - 1 + np.len - 1;
+	    }
+	    else {
+	      CharExpression ce = getCharExpression(pattern, index - 1, pLength, syntax);
+	      if (ce == null)
+		throw new REException("invalid escape sequence", REException.REG_ESCAPE, index);
+	      asciiEsc = ce.ch;
+	      index = index - 1 + ce.len - 1;
+	    }
 	    if (lastChar != 0) options.addElement(new RETokenChar(subIndex,lastChar,insens));
 	    
 	    if (posixID != -1) {
 	      options.addElement(new RETokenPOSIX(subIndex,posixID,insens,negate));
+	    } else if (np != null) {
+	      options.addElement(getRETokenNamedProperty(subIndex,np,insens,index));
 	    } else if (asciiEsc != 0) {
 	      lastChar = asciiEsc;
 	    } else {
@@ -506,7 +546,10 @@
 	boolean pure = false;
 	boolean comment = false;
         boolean lookAhead = false;
+        boolean lookBehind = false;
+        boolean independent = false;
         boolean negativelh = false;
+        boolean negativelb = false;
 	if ((index+1 < pLength) && (pattern[index] == '?')) {
 	  switch (pattern[index+1]) {
           case '!':
@@ -524,6 +567,114 @@
               index += 2;
             }
             break;
+	  case '<':
+	    // We assume that if the syntax supports look-ahead,
+	    // it also supports look-behind.
+	    if (syntax.get(RESyntax.RE_LOOKAHEAD)) {
+		index++;
+		switch (pattern[index +1]) {
+		case '!':
+		  pure = true;
+		  negativelb = true;
+		  lookBehind = true;
+		  index += 2;
+		  break;
+		case '=':
+		  pure = true;
+		  lookBehind = true;
+		  index += 2;
+		}
+	    }
+	    break;
+	  case '>':
+	    // We assume that if the syntax supports look-ahead,
+	    // it also supports independent group.
+            if (syntax.get(RESyntax.RE_LOOKAHEAD)) {
+              pure = true;
+              independent = true;
+              index += 2;
+            }
+            break;
+	  case 'i':
+	  case 'd':
+	  case 'm':
+	  case 's':
+	  // case 'u':  not supported
+	  // case 'x':  not supported
+	  case '-':
+            if (!syntax.get(RESyntax.RE_EMBEDDED_FLAGS)) break;
+	    // Set or reset syntax flags.
+	    int flagIndex = index + 1;
+	    int endFlag = -1;
+	    RESyntax newSyntax = new RESyntax(syntax);
+	    int newCflags = cflags;
+	    boolean negate = false;
+	    while (flagIndex < pLength && endFlag < 0) {
+	        switch(pattern[flagIndex]) {
+	  	case 'i':
+		  if (negate)
+		    newCflags &= ~REG_ICASE;
+		  else
+		    newCflags |= REG_ICASE;
+		  flagIndex++;
+		  break;
+	  	case 'd':
+		  if (negate)
+		    newSyntax.setLineSeparator(RESyntax.DEFAULT_LINE_SEPARATOR);
+		  else
+		    newSyntax.setLineSeparator("\n");
+		  flagIndex++;
+		  break;
+	  	case 'm':
+		  if (negate)
+		    newCflags &= ~REG_MULTILINE;
+		  else
+		    newCflags |= REG_MULTILINE;
+		  flagIndex++;
+		  break;
+	  	case 's':
+		  if (negate)
+		    newCflags &= ~REG_DOT_NEWLINE;
+		  else
+		    newCflags |= REG_DOT_NEWLINE;
+		  flagIndex++;
+		  break;
+	  	// case 'u': not supported
+	  	// case 'x': not supported
+	  	case '-':
+		  negate = true;
+		  flagIndex++;
+		  break;
+		case ':':
+		case ')':
+		  endFlag = pattern[flagIndex];
+		  break;
+		default:
+            	  throw new REException(getLocalizedMessage("repeat.no.token"), REException.REG_BADRPT, index);
+		}
+	    }
+	    if (endFlag == ')') {
+		syntax = newSyntax;
+		cflags = newCflags;
+		insens = ((cflags & REG_ICASE) > 0);
+		// This can be treated as though it were a comment.
+		comment = true;
+		index = flagIndex - 1;
+		break;
+	    }
+	    if (endFlag == ':') {
+		savedSyntax = syntax;
+		savedCflags = cflags;
+		flagsSaved = true;
+		syntax = newSyntax;
+		cflags = newCflags;
+		insens = ((cflags & REG_ICASE) > 0);
+		index = flagIndex -1;
+		// Fall through to the next case.
+	    }
+	    else {
+	        throw new REException(getLocalizedMessage("unmatched.paren"), REException.REG_ESUBREG,index);
+	    }
 	  case ':':
 	    if (syntax.get(RESyntax.RE_PURE_GROUPING)) {
 	      pure = true;
@@ -550,13 +701,50 @@
 	int nested = 0;
 
 	while ( ((nextIndex = getCharUnit(pattern,endIndex,unit,false)) > 0)
-		&& !(nested == 0 && (unit.ch == ')') && (syntax.get(RESyntax.RE_NO_BK_PARENS) ^ (unit.bk || quot))) )
+		&& !(nested == 0 && (unit.ch == ')') && (syntax.get(RESyntax.RE_NO_BK_PARENS) ^ (unit.bk || quot))) ) {
 	  if ((endIndex = nextIndex) >= pLength)
 	    throw new REException(getLocalizedMessage("subexpr.no.end"),REException.REG_ESUBREG,nextIndex);
+	  else if ((unit.ch == '[') && !(unit.bk || quot)) {
+	    // I hate to duplicate the LIST OPERATOR handling from above,
+	    // but a ')' inside a bracket list must not end the group.
+	    int listIndex = nextIndex;
+	    if (listIndex < pLength && pattern[listIndex] == '^') listIndex++;
+	    if (listIndex < pLength && pattern[listIndex] == ']') listIndex++;
+	    int listEndIndex = -1;
+	    int listNest = 0;
+	    while (listIndex < pLength && listEndIndex < 0) {
+	      switch(pattern[listIndex++]) {
+		case '\\':
+		  listIndex++;
+		  break;
+		case '[':
+		  // Sun's API documentation says that a regexp like "[a-d[m-p]]"
+		  // is legal. Even something like "[[[^]]]]" is accepted.
+		  listNest++;
+		  if (listIndex < pLength && pattern[listIndex] == '^') listIndex++;
+		  if (listIndex < pLength && pattern[listIndex] == ']') listIndex++;
+		  break;
+		case ']':
+		  if (listNest == 0)
+		    listEndIndex = listIndex;
+		  listNest--;
+		  break;
+	      }
+	    }
+	    if (listEndIndex >= 0) {
+	      nextIndex = listEndIndex;
+	      if ((endIndex = nextIndex) >= pLength)
+	        throw new REException(getLocalizedMessage("subexpr.no.end"),REException.REG_ESUBREG,nextIndex);
+	      else
+	        continue;
+	    }
+	    throw new REException(getLocalizedMessage("subexpr.no.end"),REException.REG_ESUBREG,nextIndex);
+	  }
 	  else if (unit.ch == '(' && (syntax.get(RESyntax.RE_NO_BK_PARENS) ^ (unit.bk || quot)))
 	    nested++;
 	  else if (unit.ch == ')' && (syntax.get(RESyntax.RE_NO_BK_PARENS) ^ (unit.bk || quot)))
 	    nested--;
+	}
 
 	// endIndex is now position at a ')','\)' 
 	// nextIndex is end of string or position after ')' or '\)'
@@ -569,15 +757,28 @@
 	    numSubs++;
 	  }
 
-	  int useIndex = (pure || lookAhead) ? 0 : nextSub + numSubs;
+	  int useIndex = (pure || lookAhead || lookBehind || independent) ?
+			 0 : nextSub + numSubs;
 	  currentToken = new RE(String.valueOf(pattern,index,endIndex-index).toCharArray(),cflags,syntax,useIndex,nextSub + numSubs);
 	  numSubs += ((RE) currentToken).getNumSubs();
 
           if (lookAhead) {
 	      currentToken = new RETokenLookAhead(currentToken,negativelh);
 	  }
+          else if (lookBehind) {
+	      currentToken = new RETokenLookBehind(currentToken,negativelb);
+	  }
+          else if (independent) {
+	      currentToken = new RETokenIndependent(currentToken);
+	  }
 
 	  index = nextIndex;
+	  if (flagsSaved) {
+	      syntax = savedSyntax;
+	      cflags = savedCflags;
+	      insens = ((cflags & REG_ICASE) > 0);
+	      flagsSaved = false;
+	  }
 	} // not a comment
       } // subexpression
     
@@ -616,6 +817,9 @@
 
       // ZERO-OR-MORE REPEAT OPERATOR
       //  *
+      //
+      // This method used to check "repeat.empty.token" to reject regexps
+      // such as "(a*)*", but such patterns are now allowed.
 
       else if ((unit.ch == '*') && !(unit.bk || quot)) {
 	if (currentToken == null)
@@ -624,14 +828,15 @@
           throw new REException(getLocalizedMessage("repeat.chained"),REException.REG_BADRPT,index);
 	if (currentToken instanceof RETokenWordBoundary || currentToken instanceof RETokenWordBoundary)
 	  throw new REException(getLocalizedMessage("repeat.assertion"),REException.REG_BADRPT,index);
-	if (currentToken.getMinimumLength() == 0)
-	  throw new REException(getLocalizedMessage("repeat.empty.token"),REException.REG_BADRPT,index);
 	currentToken = setRepeated(currentToken,0,Integer.MAX_VALUE,index);
       }
 
       // ONE-OR-MORE REPEAT OPERATOR / POSSESSIVE MATCHING OPERATOR
       //  + | \+ depending on RE_BK_PLUS_QM
       //  not available if RE_LIMITED_OPS is set
+      //
+      // This method used to check "repeat.empty.token" to reject regexps
+      // such as "(a*)+", but such patterns are now allowed.
 
       else if ((unit.ch == '+') && !syntax.get(RESyntax.RE_LIMITED_OPS) && (!syntax.get(RESyntax.RE_BK_PLUS_QM) ^ (unit.bk || quot))) {
 	if (currentToken == null)
@@ -648,8 +853,6 @@
 	}
 	else if (currentToken instanceof RETokenWordBoundary || currentToken instanceof RETokenWordBoundary)
 	  throw new REException(getLocalizedMessage("repeat.assertion"),REException.REG_BADRPT,index);
-	else if (currentToken.getMinimumLength() == 0)
-	  throw new REException(getLocalizedMessage("repeat.empty.token"),REException.REG_BADRPT,index);
 	else
 	  currentToken = setRepeated(currentToken,1,Integer.MAX_VALUE,index);
       }
@@ -675,14 +878,45 @@
 	else
 	  currentToken = setRepeated(currentToken,0,1,index);
       }
+
+      // OCTAL CHARACTER
+      //  \0377
 	
+      else if (unit.bk && (unit.ch == '0') && syntax.get(RESyntax.RE_OCTAL_CHAR)) {
+	CharExpression ce = getCharExpression(pattern, index - 2, pLength, syntax);
+	if (ce == null)
+	  throw new REException("invalid octal character", REException.REG_ESCAPE, index);
+	index = index - 2 + ce.len;
+	addToken(currentToken);
+	currentToken = new RETokenChar(subIndex,ce.ch,insens);
+      }
+
       // BACKREFERENCE OPERATOR
-      //  \1 \2 ... \9
+      //  \1 \2 ... \9 and \10 \11 \12 ...
       // not available if RE_NO_BK_REFS is set
+      // Perl recognizes \10, \11, and so on only if enough
+      // parentheses have been opened before it; otherwise they are treated
+      // as aliases of \010, \011, ... (octal characters).  In Sun's JDK,
+      // an octal character expression must always begin with \0.
+      // We do as the JDK does.  FIXME: consider "(a)(b)\29".
+      // The JDK treats \2 as a back reference to the 2nd group, followed
+      // by a literal '9', because there are only two groups.  Our simpler
+      // implementation has to treat \29 as a back reference to group 29.
 
       else if (unit.bk && Character.isDigit(unit.ch) && !syntax.get(RESyntax.RE_NO_BK_REFS)) {
 	addToken(currentToken);
-	currentToken = new RETokenBackRef(subIndex,Character.digit(unit.ch,10),insens);
+	int numBegin = index - 1;
+	int numEnd = pLength;
+	for (int i = index; i < pLength; i++) {
+	    if (! Character.isDigit(pattern[i])) {
+		numEnd = i;
+		break;
+	    }
+	}
+	int num = parseInt(pattern, numBegin, numEnd-numBegin, 10);
+
+	currentToken = new RETokenBackRef(subIndex,num,insens);
+	index = numEnd;
       }
 
       // START OF STRING OPERATOR
@@ -804,6 +1038,32 @@
 	  currentToken = new RETokenEnd(subIndex,null);
 	}
 
+        // HEX CHARACTER, UNICODE CHARACTER
+        //  \x1B, \u1234
+	
+	else if ((unit.bk && (unit.ch == 'x') && syntax.get(RESyntax.RE_HEX_CHAR)) ||
+		 (unit.bk && (unit.ch == 'u') && syntax.get(RESyntax.RE_UNICODE_CHAR))) {
+	  CharExpression ce = getCharExpression(pattern, index - 2, pLength, syntax);
+	  if (ce == null)
+	    throw new REException("invalid hex character", REException.REG_ESCAPE, index);
+	  index = index - 2 + ce.len;
+	  addToken(currentToken);
+	  currentToken = new RETokenChar(subIndex,ce.ch,insens);
+	}
+
+	// NAMED PROPERTY
+	// \p{prop}, \P{prop}
+
+	else if ((unit.bk && (unit.ch == 'p') && syntax.get(RESyntax.RE_NAMED_PROPERTY)) ||
+	         (unit.bk && (unit.ch == 'P') && syntax.get(RESyntax.RE_NAMED_PROPERTY))) {
+	  NamedProperty np = getNamedProperty(pattern, index - 2, pLength);
+	  if (np == null)
+	      throw new REException("invalid escape sequence", REException.REG_ESCAPE, index);
+	  index = index - 2 + np.len;
+	  addToken(currentToken);
+	  currentToken = getRETokenNamedProperty(subIndex,np,insens,index);
+	}
+
 	// NON-SPECIAL CHARACTER (or escape to make literal)
         //  c | \* for example
 
@@ -817,9 +1077,10 @@
     addToken(currentToken);
       
     if (branches != null) {
-	branches.addElement(new RE(firstToken,lastToken,numSubs,subIndex,minimumLength));
+	branches.addElement(new RE(firstToken,lastToken,numSubs,subIndex,minimumLength, maximumLength));
 	branches.trimToSize(); // compact the Vector
 	minimumLength = 0;
+	maximumLength = 0;
 	firstToken = lastToken = null;
 	addToken(new RETokenOneOf(subIndex,branches,false));
     } 
@@ -838,7 +1099,177 @@
     return index;
   }
 
+  private static int parseInt(char[] input, int pos, int len, int radix) {
+    int ret = 0;
+    for (int i = pos; i < pos + len; i++) {
+	ret = ret * radix + Character.digit(input[i], radix);
+    }
+    return ret;
+  }
+
   /**
+   * This class represents various expressions for a character.
+   * "a"      : 'a' itself.
+   * "\0123"  : Octal char 0123
+   * "\x1b"   : Hex char 0x1b
+   * "\u1234" : Unicode char \u1234
+   */
+  private static class CharExpression {
+    /** character represented by this expression */
+    char ch;
+    /** String expression */
+    String expr;
+    /** length of this expression */
+    int len;
+    public String toString() { return expr; }
+  }
+
+  private CharExpression getCharExpression(char[] input, int pos, int lim,
+        RESyntax syntax) {
+    CharExpression ce = new CharExpression();
+    char c = input[pos];
+    if (c == '\\') {
+      if (pos + 1 >= lim) return null;
+      c = input[pos + 1];
+      switch(c) {
+      case 't':
+        ce.ch = '\t';
+        ce.len = 2;
+        break;
+      case 'n':
+        ce.ch = '\n';
+        ce.len = 2;
+        break;
+      case 'r':
+        ce.ch = '\r';
+        ce.len = 2;
+        break;
+      case 'x':
+      case 'u':
+        if ((c == 'x' && syntax.get(RESyntax.RE_HEX_CHAR)) ||
+            (c == 'u' && syntax.get(RESyntax.RE_UNICODE_CHAR))) {
+          int l = 0;
+          int expectedLength = (c == 'x' ? 2 : 4);
+          for (int i = pos + 2; i < pos + 2 + expectedLength; i++) {
+            if (i >= lim) break;
+            if (!((input[i] >= '0' && input[i] <= '9') ||
+                  (input[i] >= 'A' && input[i] <= 'F') ||
+                  (input[i] >= 'a' && input[i] <= 'f')))
+                break;
+	    l++;
+          }
+          if (l != expectedLength) return null;
+          ce.ch = (char)(parseInt(input, pos + 2, l, 16));
+	  ce.len = l + 2;
+        }
+        else {
+          ce.ch = c;
+          ce.len = 2;
+        }
+        break;
+      case '0':
+        if (syntax.get(RESyntax.RE_OCTAL_CHAR)) {
+          int l = 0;
+          for (int i = pos + 2; i < pos + 2 + 3; i++) {
+            if (i >= lim) break;
+	    if (input[i] < '0' || input[i] > '7') break;
+            l++;
+          }
+          if (l == 3 && input[pos + 2] > '3') l--;
+          if (l <= 0) return null;
+          ce.ch = (char)(parseInt(input, pos + 2, l, 8));
+          ce.len = l + 2;
+        }
+        else {
+          ce.ch = c;
+          ce.len = 2;
+        }
+        break;
+      default:
+        ce.ch = c;
+        ce.len = 2;
+        break;
+      }
+    }
+    else {
+      ce.ch = input[pos];
+      ce.len = 1;
+    }
+    ce.expr = new String(input, pos, ce.len);
+    return ce;
+  }
+
+  /**
+   * This class represents a substring in a pattern string expressing
+   * a named property.
+   * "\pA"      : Property named "A"
+   * "\p{prop}" : Property named "prop"
+   * "\PA"      : Property named "A" (Negated)
+   * "\P{prop}" : Property named "prop" (Negated)
+   */
+  private static class NamedProperty {
+    /** Property name */
+    String name;
+    /** Negated or not */
+    boolean negate;
+    /** length of this expression */
+    int len;
+  }
+
+  private NamedProperty getNamedProperty(char[] input, int pos, int lim) {
+    NamedProperty np = new NamedProperty();
+    char c = input[pos];
+    if (c == '\\') {
+      if (++pos >= lim) return null;
+      c = input[pos++];
+      switch(c) {
+      case 'p':
+        np.negate = false;
+        break;
+      case 'P':
+        np.negate = true;
+        break;
+      default:
+	return null;
+      }
+      c = input[pos++];
+      if (c == '{') {
+          int p = -1;
+	  for (int i = pos; i < lim; i++) {
+	      if (input[i] == '}') {
+		  p = i;
+		  break;
+	      }
+	  }
+	  if (p < 0) return null;
+	  int len = p - pos;
+          np.name = new String(input, pos, len);
+	  np.len = len + 4;
+      }
+      else {
+          np.name = new String(input, pos - 1, 1);
+	  np.len = 3;
+      }
+      return np;
+    }
+    else return null;
+  }
+
+  private static RETokenNamedProperty getRETokenNamedProperty(
+      int subIndex, NamedProperty np, boolean insens, int index)
+      throws REException {
+    try {
+	return new RETokenNamedProperty(subIndex, np.name, insens, np.negate);
+    }
+    catch (REException e) {
+	REException ree;
+	ree = new REException(e.getMessage(), REException.REG_ESCAPE, index);
+	ree.initCause(e);
+	throw ree;
+    }
+  }
+
+  /**
    * Checks if the regular expression matches the input in its entirety.
    *
    * @param input The input text.
@@ -918,6 +1349,10 @@
       return minimumLength;
   }
 
+  public int getMaximumLength() {
+      return maximumLength;
+  }
+
   /**
    * Returns an array of all matches found in the input.
    *
@@ -985,7 +1420,9 @@
   
     /* Implements abstract method REToken.match() */
     boolean match(CharIndexed input, REMatch mymatch) { 
-	if (firstToken == null) return next(input, mymatch);
+	if (firstToken == null) {
+	    return next(input, mymatch);
+	}
 
 	// Note the start of this subexpression
 	mymatch.start[subIndex] = mymatch.index;
@@ -1049,23 +1486,34 @@
   }
 
   REMatch getMatchImpl(CharIndexed input, int anchor, int eflags, StringBuffer buffer) {
+      boolean tryEntireMatch = ((eflags & REG_TRY_ENTIRE_MATCH) != 0);
+      RE re = (tryEntireMatch ? (RE) this.clone() : this);
+      if (tryEntireMatch) {
+	  re.chain(new RETokenEnd(0, null));
+      }
       // Create a new REMatch to hold results
       REMatch mymatch = new REMatch(numSubs, anchor, eflags);
       do {
 	  // Optimization: check if anchor + minimumLength > length
 	  if (minimumLength == 0 || input.charAt(minimumLength-1) != CharIndexed.OUT_OF_BOUNDS) {
-	      if (match(input, mymatch)) {
-		  // Find longest match of them all to observe leftmost longest
-		  REMatch longest = mymatch;
+	      if (re.match(input, mymatch)) {
+		  REMatch best = mymatch;
+		  // We assume that the match that comes first is the best.
+		  // And the following "The longer, the better" rule has
+		  // been commented out. The longest is not necessarily
+		  // the best. For example, "a" out of "aaa" is the best
+		  // match for /a+?/.
+		  /*
+		  // Find best match of them all to observe leftmost longest
 		  while ((mymatch = mymatch.next) != null) {
-		      if (mymatch.index > longest.index) {
-			  longest = mymatch;
+		      if (mymatch.index > best.index) {
+		   	best = mymatch;
 		      }
 		  }
-		  
-		  longest.end[0] = longest.index;
-		  longest.finish(input);
-		  return longest;
+		  */
+		  best.end[0] = best.index;
+		  best.finish(input);
+		  return best;
 	      }
 	  }
 	  mymatch.clear(++anchor);
@@ -1176,8 +1624,7 @@
     StringBuffer buffer = new StringBuffer();
     REMatch m = getMatchImpl(input,index,eflags,buffer);
     if (m==null) return buffer.toString();
-    buffer.append( ((eflags & REG_NO_INTERPOLATE) > 0) ?
-		   replace : m.substituteInto(replace) );
+    buffer.append(getReplacement(replace, m, eflags));
     if (input.move(m.end[0])) {
       do {
 	buffer.append(input.charAt(0));
@@ -1238,8 +1685,7 @@
     StringBuffer buffer = new StringBuffer();
     REMatch m;
     while ((m = getMatchImpl(input,index,eflags,buffer)) != null) {
-	buffer.append( ((eflags & REG_NO_INTERPOLATE) > 0) ?
-		       replace : m.substituteInto(replace) );
+      buffer.append(getReplacement(replace, m, eflags));
       index = m.getEndIndex();
       if (m.end[0] == 0) {
 	char ch = input.charAt(0);
@@ -1254,11 +1700,50 @@
     }
     return buffer.toString();
   }
+
+  public static String getReplacement(String replace, REMatch m, int eflags) {
+    if ((eflags & REG_NO_INTERPOLATE) > 0)
+      return replace;
+    else {
+      if ((eflags & REG_REPLACE_USE_BACKSLASHESCAPE) > 0) {
+        StringBuffer sb = new StringBuffer();
+        int l = replace.length();
+        for (int i = 0; i < l; i++) {
+	    char c = replace.charAt(i);
+            switch(c) {
+            case '\\':
+              i++;
+              // Let StringIndexOutOfBoundsException be thrown.
+              sb.append(replace.charAt(i));
+              break;
+            case '$':
+	      int i1 = i + 1;
+	      while (i1 < replace.length() &&
+		Character.isDigit(replace.charAt(i1))) i1++;
+              sb.append(m.substituteInto(replace.substring(i, i1)));
+              i = i1 - 1;
+              break;
+            default:
+              sb.append(c);
+            }
+        }
+        return sb.toString();
+      }
+      else
+        return m.substituteInto(replace);
+    }
+  }	
   
   /* Helper function for constructor */
   private void addToken(REToken next) {
     if (next == null) return;
     minimumLength += next.getMinimumLength();
+    int nmax = next.getMaximumLength();
+    if (nmax < Integer.MAX_VALUE && maximumLength < Integer.MAX_VALUE)
+	maximumLength += nmax;
+    else 
+	maximumLength = Integer.MAX_VALUE;
+
     if (firstToken == null) {
 	lastToken = firstToken = next;
     } else {
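A note on the new RE.getReplacement() above. When REG_NO_INTERPOLATE is clear and REG_REPLACE_USE_BACKSLASHESCAPE is set (per the ChangeLog, Matcher.replaceFirst/replaceAll now pass this flag), the method walks the replacement string one character at a time:

* a backslash quotes the next character,
* '$' plus every following digit is passed to REMatch.substituteInto(),
* anything else is copied unchanged.

The sketch below restates that loop without the gnu.regexp classes so it can run on its own. The class name ReplacementSketch, the String[] groups argument (groups[0] is the whole match, null means an unmatched group), the literal treatment of a bare '$', and the "No group n" exception for an out-of-range index are all assumptions for illustration. They are not taken from REMatch.

```java
/**
 * Hypothetical, self-contained restatement of the loop in RE.getReplacement()
 * for the REG_REPLACE_USE_BACKSLASHESCAPE case. groups[0] is the whole match,
 * groups[n] is group n, and null marks a group that did not participate.
 */
final class ReplacementSketch {
    static String expand(String replace, String[] groups) {
        StringBuilder sb = new StringBuilder();
        int len = replace.length();
        for (int i = 0; i < len; i++) {
            char c = replace.charAt(i);
            if (c == '\\') {
                // Backslash quotes the next character. A trailing backslash
                // throws StringIndexOutOfBoundsException, as in the patch.
                i++;
                sb.append(replace.charAt(i));
            } else if (c == '$') {
                // Consume every following digit, so "$12" means group 12.
                int j = i + 1;
                while (j < len && Character.isDigit(replace.charAt(j))) j++;
                if (j == i + 1) {
                    sb.append('$'); // assumption: a bare '$' is kept literally
                } else {
                    int n = Integer.parseInt(replace.substring(i + 1, j));
                    if (n >= groups.length)
                        throw new IndexOutOfBoundsException("No group " + n);
                    if (groups[n] != null) sb.append(groups[n]);
                    i = j - 1; // resume after the last digit
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, with groups {"ab", "a", "b"}, the replacement "$2$1" expands to "ba", and "\$1-$0" expands to "$1-ab". Consuming all digits greedily matches the ChangeLog note that substituteInto now replaces $n "even if n>=10".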
Index: classpath/gnu/regexp/REToken.java
===================================================================
--- classpath/gnu/regexp/REToken.java	(revision 110832)
+++ classpath/gnu/regexp/REToken.java	(working copy)
@@ -38,12 +38,21 @@
 package gnu.regexp;
 import java.io.Serializable;
 
-abstract class REToken implements Serializable {
+abstract class REToken implements Serializable, Cloneable {
 
   protected REToken next = null;
   protected REToken uncle = null;
   protected int subIndex;
 
+  public Object clone() {
+    try {
+      REToken copy = (REToken) super.clone();
+      return copy;
+    } catch (CloneNotSupportedException e) {
+      throw new Error(); // doesn't happen
+    }
+  }
+
   protected REToken(int subIndex) {
       this.subIndex = subIndex;
   }
@@ -52,6 +61,10 @@
     return 0;
   }
 
+  int getMaximumLength() {
+    return Integer.MAX_VALUE;
+  }
+
   void setUncle(REToken anUncle) {
     uncle = anUncle;
   }
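REToken.getMaximumLength() defaults to Integer.MAX_VALUE, which means "unbounded". RE.addToken() keeps the running total pinned at that value once either operand is unbounded. However, it adds two bounded values with plain int arithmetic. Likewise, RETokenRepeated.getMaximumLength() returns max * tmax. So a very large explicit bound, such as (abc){1000000000}, could in principle wrap around to a negative total.

The sketch below shows one way to saturate both operations instead. The class and method names are hypothetical, and it is not part of this patch.

```java
/**
 * Hypothetical overflow-safe helpers for maximum-length bookkeeping.
 * Integer.MAX_VALUE means "unbounded", as in REToken.getMaximumLength().
 */
final class MaxLength {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    /** Maximum length of a sequence: a + b, saturating at UNBOUNDED. */
    static int add(int a, int b) {
        if (a == UNBOUNDED || b == UNBOUNDED) return UNBOUNDED;
        long sum = (long) a + b; // widen so the sum cannot wrap
        return sum >= UNBOUNDED ? UNBOUNDED : (int) sum;
    }

    /** Maximum length of token{min,max}: max * tokenMax, saturating. */
    static int repeat(int max, int tokenMax) {
        if (max == UNBOUNDED || tokenMax == UNBOUNDED) return UNBOUNDED;
        long product = (long) max * tokenMax; // widen so the product cannot wrap
        return product >= UNBOUNDED ? UNBOUNDED : (int) product;
    }
}
```

Like the patch, repeat() returns UNBOUNDED for {0} applied to an unbounded token, even though such a token can only match the empty string. That is conservative but harmless for an upper bound.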
Index: classpath/gnu/regexp/RETokenWordBoundary.java
===================================================================
--- classpath/gnu/regexp/RETokenWordBoundary.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenWordBoundary.java	(working copy)
@@ -52,6 +52,11 @@
 	this.where = where;
 	this.negated = negated;
     }
+
+    int getMaximumLength() {
+        return 0;
+    }
+
     
     boolean match(CharIndexed input, REMatch mymatch) {
 	// Word boundary means input[index-1] was a word character
Index: classpath/gnu/regexp/RETokenEndSub.java
===================================================================
--- classpath/gnu/regexp/RETokenEndSub.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenEndSub.java	(working copy)
@@ -41,6 +41,10 @@
     RETokenEndSub(int subIndex) {
 	super(subIndex);
     }
+
+    int getMaximumLength() {
+      return 0;
+    }
     
     boolean match(CharIndexed input, REMatch mymatch) {
 	mymatch.end[subIndex] = mymatch.index;
Index: classpath/gnu/regexp/CharIndexedInputStream.java
===================================================================
--- classpath/gnu/regexp/CharIndexedInputStream.java	(revision 110832)
+++ classpath/gnu/regexp/CharIndexedInputStream.java	(working copy)
@@ -1,5 +1,5 @@
 /* gnu/regexp/CharIndexedInputStream.java
-   Copyright (C) 1998-2001, 2004 Free Software Foundation, Inc.
+   Copyright (C) 1998-2001, 2004, 2006 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -145,5 +145,15 @@
     public boolean isValid() {
 	return (cached != OUT_OF_BOUNDS);
     }
+
+    public CharIndexed lookBehind(int index, int length) {
+	throw new UnsupportedOperationException(
+	    "difficult to look behind for an input stream");
+    }
+
+    public int length() {
+	throw new UnsupportedOperationException(
+	    "difficult to tell the length for an input stream");
+    }
 }
 
Index: classpath/gnu/regexp/CharIndexedCharArray.java
===================================================================
--- classpath/gnu/regexp/CharIndexedCharArray.java	(revision 110832)
+++ classpath/gnu/regexp/CharIndexedCharArray.java	(working copy)
@@ -1,5 +1,5 @@
 /* gnu/regexp/CharIndexedCharArray.java
-   Copyright (C) 1998-2001, 2004 Free Software Foundation, Inc.
+   Copyright (C) 1998-2001, 2004, 2006 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -59,4 +59,13 @@
     public boolean move(int index) {
 	return ((anchor += index) < s.length);
     }
+    
+    public CharIndexed lookBehind(int index, int length) {
+	if (length > (anchor + index)) length = anchor + index;
+	return new CharIndexedCharArray(s, anchor + index - length);
+    }
+
+    public int length() {
+	return s.length - anchor;
+    }
 }
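CharIndexedCharArray.lookBehind(index, length) clamps length to the number of characters before anchor + index. It then returns a new view whose position 0 is that many characters to the left. The returned view is not truncated on the right, so the caller is expected to read only the characters it asked for.

The sketch below uses a hypothetical stand-in class (ArrayView) that mirrors only that logic and length(). The '\uFFFF' out-of-bounds sentinel is an assumption for illustration.

```java
/**
 * Hypothetical minimal stand-in for CharIndexedCharArray that mirrors only
 * the lookBehind() and length() logic added by the patch.
 */
final class ArrayView {
    static final char OUT_OF_BOUNDS = '\uFFFF'; // assumption: sentinel value
    private final char[] s;
    private final int anchor;

    ArrayView(char[] s, int anchor) { this.s = s; this.anchor = anchor; }

    /** Character at index relative to the anchor, or the sentinel. */
    char charAt(int index) {
        int i = anchor + index;
        return (i >= 0 && i < s.length) ? s[i] : OUT_OF_BOUNDS;
    }

    /** View starting up to 'length' chars before 'index', clamped at the start. */
    ArrayView lookBehind(int index, int length) {
        if (length > anchor + index) length = anchor + index;
        return new ArrayView(s, anchor + index - length);
    }

    /** Characters remaining from the anchor to the end of the array. */
    int length() { return s.length - anchor; }
}
```

For example, take a view of "abcdef" anchored at 1 and look behind from index 2 (the 'd'):

* asking for 5 characters yields a view starting at 'a', because only 3 exist;
* asking for 2 yields a view starting at 'b'.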
Index: classpath/gnu/regexp/RESyntax.java
===================================================================
--- classpath/gnu/regexp/RESyntax.java	(revision 110832)
+++ classpath/gnu/regexp/RESyntax.java	(working copy)
@@ -202,9 +202,34 @@
    */
   public static final int RE_POSSESSIVE_OPS            = 25;
 
-  private static final int BIT_TOTAL                   = 26;
+  /**
+   * Syntax bit.  Allow embedded flags, (?is-x), as in Perl5.
+   */
+  public static final int RE_EMBEDDED_FLAGS            = 26;
 
   /**
+   * Syntax bit.  Allow octal char (\0377), as in Perl5.
+   */
+  public static final int RE_OCTAL_CHAR                = 27;
+
+  /**
+   * Syntax bit.  Allow hex char (\x1b), as in Perl5.
+   */
+  public static final int RE_HEX_CHAR                  = 28;
+
+  /**
+   * Syntax bit.  Allow Unicode char (\u1234), as in Java 1.4.
+   */
+  public static final int RE_UNICODE_CHAR              = 29;
+
+  /**
+   * Syntax bit.  Allow named property (\p{prop}, \P{prop}), as in Perl5.
+   */
+  public static final int RE_NAMED_PROPERTY            = 30;
+
+  private static final int BIT_TOTAL                   = 31;
+
+  /**
    * Predefined syntax.
    * Emulates regular expression support in the awk utility.
    */
@@ -422,6 +447,10 @@
 	  .set(RE_STRING_ANCHORS)         // \A,\Z
 	  .set(RE_CHAR_CLASS_ESC_IN_LISTS)// \d,\D,\w,\W,\s,\S within []
 	  .set(RE_COMMENTS)              // (?#)
+	  .set(RE_EMBEDDED_FLAGS)         // (?imsx-imsx)
+	  .set(RE_OCTAL_CHAR)             // \0377
+	  .set(RE_HEX_CHAR)               // \x1b
+	  .set(RE_NAMED_PROPERTY)         // \p{prop}, \P{prop}
 	  .makeFinal();
       
       RE_SYNTAX_PERL5_S = new RESyntax(RE_SYNTAX_PERL5)
@@ -431,6 +460,7 @@
       RE_SYNTAX_JAVA_1_4 = new RESyntax(RE_SYNTAX_PERL5)
 	  // XXX
 	  .set(RE_POSSESSIVE_OPS)         // *+,?+,++,{}+
+	  .set(RE_UNICODE_CHAR)           // \u1234
 	  .makeFinal();
   }
 
Index: classpath/gnu/regexp/CharIndexed.java
===================================================================
--- classpath/gnu/regexp/CharIndexed.java	(revision 110832)
+++ classpath/gnu/regexp/CharIndexed.java	(working copy)
@@ -1,5 +1,5 @@
 /* gnu/regexp/CharIndexed.java
-   Copyright (C) 1998-2001, 2004 Free Software Foundation, Inc.
+   Copyright (C) 1998-2001, 2004, 2006 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -81,4 +81,16 @@
      * position at a valid position in the input.
      */
     boolean isValid();
+
+    /**
+     * Returns another CharIndexed containing length characters to the left
+     * of the given index. The given length is an expected maximum and
+     * the returned CharIndexed may not necessarily contain so many characters.
+     */
+    CharIndexed lookBehind(int index, int length);
+
+    /**
+     * Returns the effective length of this CharIndexed
+     */
+    int length();
 }
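The RESyntax bits added above enable the following constructs:

* RE_EMBEDDED_FLAGS: (?i)
* RE_OCTAL_CHAR: \0101
* RE_HEX_CHAR: \x41
* RE_UNICODE_CHAR: \u0041
* RE_NAMED_PROPERTY: \p{...} and \P{...}

These are the same constructs java.util.regex.Pattern documents. In Classpath, Pattern is built on gnu.regexp with the RE_SYNTAX_JAVA_1_4 syntax. The small demo below records what a standard JDK's Pattern does with each construct, which is presumably the behavior the Mauve tests check. The wrapper class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: does regex match the whole input? */
final class SyntaxBitsDemo {
    static boolean m(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }
}
```

For example, m("(?i)abc", "ABC"), m("\\0101", "A"), m("\\x41", "A"), and m("\\p{Lu}", "A") are all true, while m("\\P{Lu}", "A") is false.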
Index: classpath/gnu/regexp/RETokenAny.java
===================================================================
--- classpath/gnu/regexp/RETokenAny.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenAny.java	(working copy)
@@ -55,6 +55,10 @@
     return 1;
   }
 
+  int getMaximumLength() {
+    return 1;
+  }
+
     boolean match(CharIndexed input, REMatch mymatch) {
     char ch = input.charAt(mymatch.index);
     if ((ch == CharIndexed.OUT_OF_BOUNDS)
Index: classpath/gnu/regexp/RETokenLookAhead.java
===================================================================
--- classpath/gnu/regexp/RETokenLookAhead.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenLookAhead.java	(working copy)
@@ -52,6 +52,10 @@
     this.negative = negative;
   }
 
+  int getMaximumLength() {
+    return 0;
+  }
+
   boolean match(CharIndexed input, REMatch mymatch)
   {
     REMatch trymatch = (REMatch)mymatch.clone();
Index: classpath/gnu/regexp/RETokenRepeated.java
===================================================================
--- classpath/gnu/regexp/RETokenRepeated.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenRepeated.java	(working copy)
@@ -45,12 +45,14 @@
     private int min,max;
     private boolean stingy;
     private boolean possessive;
+    private boolean alwaysEmpty; // Special case of {0}
     
     RETokenRepeated(int subIndex, REToken token, int min, int max) {
 	super(subIndex);
 	this.token = token;
 	this.min = min;
 	this.max = max;
+	alwaysEmpty = (min == 0 && max == 0);
     }
 
     /** Sets the minimal matching mode to true. */
@@ -82,6 +84,36 @@
 	return (min * token.getMinimumLength());
     }
 
+    int getMaximumLength() {
+        if (max == Integer.MAX_VALUE) return Integer.MAX_VALUE;
+	int tmax = token.getMaximumLength();
+	if (tmax == Integer.MAX_VALUE) return tmax;
+	return (max * tmax);
+    }
+
+    boolean stopMatchingIfSatisfied = true;
+
+    private static REMatch findDoables(REToken tk,
+			CharIndexed input, REMatch mymatch) {
+
+	    REMatch.REMatchList doables = new REMatch.REMatchList();
+
+	    // try next repeat at all possible positions
+	    for (REMatch current = mymatch;
+		 current != null; current = current.next) {
+		REMatch recurrent = (REMatch) current.clone();
+		int origin = recurrent.index;
+		tk = (REToken) tk.clone();
+		tk.next = tk.uncle = null;
+		if (tk.match(input, recurrent)) {
+		    if (recurrent.index == origin) recurrent.empty = true;
+		    // add all items in current to doables array
+		    doables.addTail(recurrent);
+		}
+	    }
+	    return doables.head;
+    }
+
     // We do need to save every possible point, but the number of clone()
     // invocations here is really a killer for performance on non-stingy
     // repeat operators.  I'm open to suggestions...
@@ -91,58 +123,34 @@
     // the subexpression back-reference operator allow that?
 
     boolean match(CharIndexed input, REMatch mymatch) {
-	// number of times we've matched so far
-	int numRepeats = 0; 
-	
 	// Possible positions for the next repeat to match at
 	REMatch newMatch = mymatch;
-	REMatch last = null;
-	REMatch current;
 
-	// Add the '0-repeats' index
-	// positions.elementAt(z) == position [] in input after <<z>> matches
-	Vector positions = new Vector();
-	positions.addElement(newMatch);
+	// {0} needs some special treatment.
+	if (alwaysEmpty) {
+	    REMatch result = matchRest(input, newMatch);
+	    if (result != null) {
+	        mymatch.assignFrom(result);
+	        return true;
+	    }
+	    else {
+	        return false;
+	    }
+	}
+
+	// number of times we've matched so far
+	int numRepeats = 0; 
 	
-	// Declare variables used in loop
 	REMatch doables;
-	REMatch doablesLast;
-	REMatch recurrent;
+	int lastIndex = mymatch.index;
+	boolean emptyMatchFound = false;
 
-	do {
-	    // Check for stingy match for each possibility.
-	    if (stingy && (numRepeats >= min)) {
-		REMatch result = matchRest(input, newMatch);
-		if (result != null) {
-		    mymatch.assignFrom(result);
-		    return true;
-		}
-	    }
+	while (numRepeats < min) {
+	    doables = findDoables(token, input, newMatch);
 
-	    doables = null;
-	    doablesLast = null;
-
-	    // try next repeat at all possible positions
-	    for (current = newMatch; current != null; current = current.next) {
-		recurrent = (REMatch) current.clone();
-		if (token.match(input, recurrent)) {
-		    // add all items in current to doables array
-		    if (doables == null) {
-			doables = recurrent;
-			doablesLast = recurrent;
-		    } else {
-			// Order these from longest to shortest
-			// Start by assuming longest (more repeats)
-			doablesLast.next = recurrent;
-		    }
-		    // Find new doablesLast
-		    while (doablesLast.next != null) {
-			doablesLast = doablesLast.next;
-		    }
-		}
-	    }
-	    // if none of the possibilities worked out, break out of do/while
-	    if (doables == null) break;
+	    // if none of the possibilities worked out, 
+	    // it means that the minimum number of repeats could not be found.
+	    if (doables == null) return false;
 	    
 	    // reassign where the next repeat can match
 	    newMatch = doables;
@@ -150,44 +158,92 @@
 	    // increment how many repeats we've successfully found
 	    ++numRepeats;
 	    
-	    positions.addElement(newMatch);
-	} while (numRepeats < max);
-	
-	// If there aren't enough repeats, then fail
-	if (numRepeats < min) return false;
-	
-	// We're greedy, but ease off until a true match is found 
-	int posIndex = positions.size();
-	
-	// At this point we've either got too many or just the right amount.
-	// See if this numRepeats works with the rest of the regexp.
-	REMatch allResults = null;
-	REMatch allResultsLast = null;
+	    if (newMatch.empty) {
+		numRepeats = min;
+		emptyMatchFound = true;
+		break;
+	    }
+	    lastIndex = newMatch.index;
+	}
 
-	REMatch results = null;
-	while (--posIndex >= min) {
-	    newMatch = (REMatch) positions.elementAt(posIndex);
-	    results = matchRest(input, newMatch);
-	    if (results != null) {
-		if (allResults == null) {
-		    allResults = results;
-		    allResultsLast = results;
-		} else {
-		    // Order these from longest to shortest
-		    // Start by assuming longest (more repeats)
-		    allResultsLast.next = results;
+	Vector positions = new Vector();
+
+	while (numRepeats <= max) {
+	    // We want to check something like  
+	    //    if (stingy)
+	    // and skip any further matching.  But experience shows that
+	    // skipping it may cause incomplete matching.
+	    // For example, if we skip the seemingly unnecessary
+	    // matching, /^(b+?|a){1,2}?c/ cannot match "bbc".
+	    // On the other hand, if we do not stop the unnecessary
+	    // matching, /(([a-c])b*?\2)*/ matches "ababbbcbc"
+	    // entirely when we want to find only "ababb".
+	    // To keep the regression tests passing, we keep the old behavior.
+	    if (stopMatchingIfSatisfied && stingy) {
+		REMatch results = matchRest(input, newMatch);
+		if (results != null) {
+		    mymatch.assignFrom(results);
+		    return true;
 		}
-		// Find new doablesLast
-		while (allResultsLast.next != null) {
-		    allResultsLast = allResultsLast.next;
+	    }
+	    positions.add(newMatch);
+	    if (emptyMatchFound) break;
+
+	    doables = findDoables(token, input, newMatch);
+	    if (doables == null) break;
+
+	    // doables.index == lastIndex occurs either
+	    //   (1) when an empty string was the longest
+	    //       that matched this token.
+	    // or
+	    //   (2) when the same string matches this token many times.
+	    //       For example, "acbab" itself matches "a.*b" and
+	    //       its substrings "acb" and "ab" also match.
+	    //       In this case, we do not have to go further until
+	    //       numRepeats == max because the more numRepeats grows,
+	    //       the shorter the substring matching this token becomes.
+	    //       So the previous successful match must have been the best
+	    //       match.  But this is not necessarily the case if stingy.
+	    if (doables.index == lastIndex) {
+	        if (doables.empty) {
+		    emptyMatchFound = true;
+                }
+	        else {
+		    if (!stingy) break;
 		}
 	    }
-	    // else did not match rest of the tokens, try again on smaller sample
-	    // or break out when performing possessive matching
-	    if (possessive) break;
+	    numRepeats++;
+	    newMatch = doables;
+	    lastIndex = newMatch.index;
 	}
-	if (allResults != null) {
-	    mymatch.assignFrom(allResults); // does this get all?
+
+	// We're greedy, but ease off until a true match is found.
+	// At this point we've either got too many or just the right amount.
+	// See if this numRepeats works with the rest of the regexp.
+
+	REMatch.REMatchList allResults = new REMatch.REMatchList();
+
+	int posCount = positions.size();
+	int posIndex = (stingy ? 0 : posCount - 1);
+
+	while (posCount-- > 0) {
+	    REMatch m = (REMatch) positions.elementAt(posIndex);
+            if (stingy) posIndex++; else posIndex--;
+
+	    REMatch results = matchRest(input, m);
+            if (results != null) {
+	    	// Order these from longest to shortest
+		// Start by assuming longest (more repeats)
+		// If stingy the order is shortest to longest.
+		allResults.addTail(results);
+	    }
+	    else {
+		if (possessive) break;
+	    }
+	}
+
+	if (allResults.head != null) {
+	    mymatch.assignFrom(allResults.head); // does this get all?
 	    return true;
 	}
 	// If we fall out, no matches.
@@ -196,27 +252,17 @@
 
     private REMatch matchRest(CharIndexed input, final REMatch newMatch) {
 	REMatch current, single;
-	REMatch doneIndex = null;
-	REMatch doneIndexLast = null;
+	REMatch.REMatchList doneIndex = new REMatch.REMatchList();
 	// Test all possible matches for this number of repeats
 	for (current = newMatch; current != null; current = current.next) {
 	    // clone() separates a single match from the chain
 	    single = (REMatch) current.clone();
 	    if (next(input, single)) {
 		// chain results to doneIndex
-		if (doneIndex == null) {
-		    doneIndex = single;
-		    doneIndexLast = single;
-		} else {
-		    doneIndexLast.next = single;
-		}
-		// Find new doneIndexLast
-		while (doneIndexLast.next != null) {
-		    doneIndexLast = doneIndexLast.next;
-		}
+		doneIndex.addTail(single);
 	    }
 	}
-	return doneIndex;
+	return doneIndex.head;
     }
 
     void dump(StringBuffer os) {
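The comments in the reworked RETokenRepeated.match() cite concrete cases:

* /a+?/ should find "a" in "aaa";
* /^(b+?|a){1,2}?c/ should match "bbc";
* {0} gets special treatment via alwaysEmpty;
* empty iterations (emptyMatchFound) must not loop forever.

The demo below records how a standard JDK's java.util.regex behaves on these patterns, plus a possessive case. That is presumably the behavior the Mauve tests compare against. The helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for checking repetition behavior. */
final class RepetitionDemo {
    /** Text of the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Does regex match the whole input? */
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }
}
```

The results are as follows:

* firstMatch("a+?", "aaa") is "a" (stingy), while firstMatch("a+", "aaa") is "aaa".
* "xa{0}y" fully matches "xy" but not "xay".
* "(a*)*b" terminates and finds "aab".
* The possessive "a++a" finds nothing in "aaa", because a++ never gives characters back.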
Index: classpath/gnu/regexp/RETokenNamedProperty.java
===================================================================
--- classpath/gnu/regexp/RETokenNamedProperty.java	(revision 0)
+++ classpath/gnu/regexp/RETokenNamedProperty.java	(revision 0)
@@ -0,0 +1,301 @@
+/* gnu/regexp/RETokenNamedProperty.java
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Classpath.
+
+GNU Classpath is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU Classpath is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Classpath; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA.
+
+Linking this library statically or dynamically with other modules is
+making a combined work based on this library.  Thus, the terms and
+conditions of the GNU General Public License cover the whole
+combination.
+
+As a special exception, the copyright holders of this library give you
+permission to link this library with independent modules to produce an
+executable, regardless of the license terms of these independent
+modules, and to copy and distribute the resulting executable under
+terms of your choice, provided that you also meet, for each linked
+independent module, the terms and conditions of the license of that
+module.  An independent module is a module which is not derived from
+or based on this library.  If you modify this library, you may extend
+this exception to your version of the library, but you are not
+obligated to do so.  If you do not wish to do so, delete this
+exception statement from your version. */
+
+
+package gnu.regexp;
+
+final class RETokenNamedProperty extends REToken {
+  String name;
+  boolean insens;
+  boolean negate;
+  Handler handler;
+
+  // Grouped properties
+  static final byte[] LETTER = new byte[]
+  { Character.LOWERCASE_LETTER,
+    Character.UPPERCASE_LETTER,
+    Character.TITLECASE_LETTER,
+    Character.MODIFIER_LETTER,
+    Character.OTHER_LETTER };
+  
+  static final byte[] MARK = new byte[]
+  { Character.NON_SPACING_MARK,
+    Character.COMBINING_SPACING_MARK,
+    Character.ENCLOSING_MARK };
+  
+  static final byte[] SEPARATOR = new byte[]
+  { Character.SPACE_SEPARATOR,
+    Character.LINE_SEPARATOR,
+    Character.PARAGRAPH_SEPARATOR };
+  
+  static final byte[] SYMBOL = new byte[]
+  { Character.MATH_SYMBOL,
+    Character.CURRENCY_SYMBOL,
+    Character.MODIFIER_SYMBOL,
+    Character.OTHER_SYMBOL };
+  
+  static final byte[] NUMBER = new byte[]
+  { Character.DECIMAL_DIGIT_NUMBER,
+    Character.LETTER_NUMBER,
+    Character.OTHER_NUMBER };
+  
+  static final byte[] PUNCTUATION = new byte[]
+  { Character.DASH_PUNCTUATION,
+    Character.START_PUNCTUATION,
+    Character.END_PUNCTUATION,
+    Character.CONNECTOR_PUNCTUATION,
+    Character.OTHER_PUNCTUATION,
+    Character.INITIAL_QUOTE_PUNCTUATION,
+    Character.FINAL_QUOTE_PUNCTUATION};
+  
+  static final byte[] OTHER = new byte[]
+  { Character.CONTROL,
+    Character.FORMAT,
+    Character.PRIVATE_USE,
+    Character.SURROGATE,
+    Character.UNASSIGNED };
+
+  RETokenNamedProperty(int subIndex, String name, boolean insens, boolean negate) throws REException {
+    super(subIndex);
+    this.name = name;
+    this.insens = insens;
+    this.negate = negate;
+    handler = getHandler(name); 
+  }
+
+    int getMinimumLength() {
+	return 1;
+    }
+
+    int getMaximumLength() {
+	return 1;
+    }
+
+    boolean match(CharIndexed input, REMatch mymatch) {
+    char ch = input.charAt(mymatch.index);
+    if (ch == CharIndexed.OUT_OF_BOUNDS)
+      return false;
+    
+    boolean retval = handler.includes(ch);
+    if (insens) {
+        retval = retval ||
+                 handler.includes(Character.toUpperCase(ch)) ||
+                 handler.includes(Character.toLowerCase(ch));
+    }
+
+    if (negate) retval = !retval;
+    if (retval) {
+	++mymatch.index;
+	return next(input, mymatch);
+    }
+    else return false;
+  }
+
+  void dump(StringBuffer os) {
+    os.append("\\")
+      .append(negate ? "P" : "p")
+      .append("{" + name + "}");
+  }
+
+  private abstract static class Handler {
+      public abstract boolean includes(char c);
+  }
+
+  private Handler getHandler(String name) throws REException {
+      if (name.equals("Lower") ||
+          name.equals("Upper") ||
+          // name.equals("ASCII") ||
+          name.equals("Alpha") ||
+          name.equals("Digit") ||
+          name.equals("Alnum") ||
+          name.equals("Punct") ||
+          name.equals("Graph") ||
+          name.equals("Print") ||
+          name.equals("Blank") ||
+          name.equals("Cntrl") ||
+          name.equals("XDigit") ||
+          name.equals("Space") ) {
+         return new POSIXHandler(name);
+      }
+      if (name.startsWith("In")) {
+	  try {
+	      name = name.substring(2);
+	      Character.UnicodeBlock block = Character.UnicodeBlock.forName(name);
+	      return new UnicodeBlockHandler(block);
+	  }
+	  catch (IllegalArgumentException e) {
+              throw new REException("Invalid Unicode block name: " + name, REException.REG_ESCAPE, 0);
+	  }
+      }
+      if (name.startsWith("Is")) {
+          name = name.substring(2);
+      }
+
+      // "grouped properties"
+      if (name.equals("L"))
+	  return new UnicodeCategoriesHandler(LETTER);
+      if (name.equals("M"))
+	  return new UnicodeCategoriesHandler(MARK);
+      if (name.equals("Z"))
+	  return new UnicodeCategoriesHandler(SEPARATOR);
+      if (name.equals("S"))
+	  return new UnicodeCategoriesHandler(SYMBOL);
+      if (name.equals("N"))
+	  return new UnicodeCategoriesHandler(NUMBER);
+      if (name.equals("P"))
+	  return new UnicodeCategoriesHandler(PUNCTUATION);
+      if (name.equals("C"))
+	  return new UnicodeCategoriesHandler(OTHER);
+
+      if (name.equals("Mc"))
+          return new UnicodeCategoryHandler(Character.COMBINING_SPACING_MARK);
+      if (name.equals("Pc"))
+          return new UnicodeCategoryHandler(Character.CONNECTOR_PUNCTUATION);
+      if (name.equals("Cc"))
+          return new UnicodeCategoryHandler(Character.CONTROL);
+      if (name.equals("Sc"))
+          return new UnicodeCategoryHandler(Character.CURRENCY_SYMBOL);
+      if (name.equals("Pd"))
+          return new UnicodeCategoryHandler(Character.DASH_PUNCTUATION);
+      if (name.equals("Nd"))
+          return new UnicodeCategoryHandler(Character.DECIMAL_DIGIT_NUMBER);
+      if (name.equals("Me"))
+          return new UnicodeCategoryHandler(Character.ENCLOSING_MARK);
+      if (name.equals("Pe"))
+          return new UnicodeCategoryHandler(Character.END_PUNCTUATION);
+      if (name.equals("Pf"))
+          return new UnicodeCategoryHandler(Character.FINAL_QUOTE_PUNCTUATION);
+      if (name.equals("Cf"))
+          return new UnicodeCategoryHandler(Character.FORMAT);
+      if (name.equals("Pi"))
+          return new UnicodeCategoryHandler(Character.INITIAL_QUOTE_PUNCTUATION);
+      if (name.equals("Nl"))
+          return new UnicodeCategoryHandler(Character.LETTER_NUMBER);
+      if (name.equals("Zl"))
+          return new UnicodeCategoryHandler(Character.LINE_SEPARATOR);
+      if (name.equals("Ll"))
+          return new UnicodeCategoryHandler(Character.LOWERCASE_LETTER);
+      if (name.equals("Sm"))
+          return new UnicodeCategoryHandler(Character.MATH_SYMBOL);
+      if (name.equals("Lm"))
+          return new UnicodeCategoryHandler(Character.MODIFIER_LETTER);
+      if (name.equals("Sk"))
+          return new UnicodeCategoryHandler(Character.MODIFIER_SYMBOL);
+      if (name.equals("Mn"))
+          return new UnicodeCategoryHandler(Character.NON_SPACING_MARK);
+      if (name.equals("Lo"))
+          return new UnicodeCategoryHandler(Character.OTHER_LETTER);
+      if (name.equals("No"))
+          return new UnicodeCategoryHandler(Character.OTHER_NUMBER);
+      if (name.equals("Po"))
+          return new UnicodeCategoryHandler(Character.OTHER_PUNCTUATION);
+      if (name.equals("So"))
+          return new UnicodeCategoryHandler(Character.OTHER_SYMBOL);
+      if (name.equals("Zp"))
+          return new UnicodeCategoryHandler(Character.PARAGRAPH_SEPARATOR);
+      if (name.equals("Co"))
+          return new UnicodeCategoryHandler(Character.PRIVATE_USE);
+      if (name.equals("Zs"))
+          return new UnicodeCategoryHandler(Character.SPACE_SEPARATOR);
+      if (name.equals("Ps"))
+          return new UnicodeCategoryHandler(Character.START_PUNCTUATION);
+      if (name.equals("Cs"))
+          return new UnicodeCategoryHandler(Character.SURROGATE);
+      if (name.equals("Lt"))
+          return new UnicodeCategoryHandler(Character.TITLECASE_LETTER);
+      if (name.equals("Cn"))
+          return new UnicodeCategoryHandler(Character.UNASSIGNED);
+      if (name.equals("Lu"))
+          return new UnicodeCategoryHandler(Character.UPPERCASE_LETTER);
+      throw new REException("unsupported name " + name, REException.REG_ESCAPE, 0);
+  }
+
+  private static class POSIXHandler extends Handler {
+      private RETokenPOSIX retoken;
+      private REMatch mymatch = new REMatch(0,0,0);
+      private char[] chars = new char[1];
+      private CharIndexedCharArray ca = new CharIndexedCharArray(chars, 0);
+      public POSIXHandler(String name) {
+            int posixId = RETokenPOSIX.intValue(name.toLowerCase());
+            if (posixId != -1)
+              retoken = new RETokenPOSIX(0,posixId,false,false);
+	    else
+              throw new RuntimeException("Unknown posix ID: " + name);
+      }
+      public boolean includes(char c) {
+          chars[0] = c;
+          mymatch.index = 0;
+          return retoken.match(ca, mymatch);
+      }
+  }
+
+  private static class UnicodeCategoryHandler extends Handler {
+      public UnicodeCategoryHandler(byte category) {
+          this.category = (int)category;
+      }
+      private int category;
+      public boolean includes(char c) {
+          return Character.getType(c) == category;
+      }
+  }
+
+  private static class UnicodeCategoriesHandler extends Handler {
+      public UnicodeCategoriesHandler(byte[] categories) {
+          this.categories = categories;
+      }
+      private byte[] categories;
+      public boolean includes(char c) {
+	  int category = Character.getType(c);
+          for (int i = 0; i < categories.length; i++)
+              if (category == categories[i])
+	          return true;
+	  return false;
+      }
+  }
+
+  private static class UnicodeBlockHandler extends Handler {
+      public UnicodeBlockHandler(Character.UnicodeBlock block) {
+	  this.block = block;
+      }
+      private Character.UnicodeBlock block;
+      public boolean includes(char c) {
+	  Character.UnicodeBlock cblock = Character.UnicodeBlock.of(c);
+	  return (cblock != null && cblock.equals(block));
+      }
+  }
+
+}
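The new handlers in RE.java reduce a `\p{..}` test to a predicate over a single char. UnicodeCategoryHandler compares `Character.getType(c)` against one category constant, and UnicodeBlockHandler compares `Character.UnicodeBlock.of(c)`, which may return null, against a block. Below is a minimal standalone sketch of those two predicates using only `java.lang.Character`. The class name `CategoryPredicates` and its small name table are hypothetical illustrations, not the gnu.regexp API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch; not part of gnu.regexp.
final class CategoryPredicates {
    // Maps a two-letter Unicode general-category name to the matching
    // java.lang.Character type constant (only a small subset here).
    private static final Map<String, Byte> CATEGORIES = new HashMap<String, Byte>();
    static {
        CATEGORIES.put("Lu", Character.UPPERCASE_LETTER);
        CATEGORIES.put("Lt", Character.TITLECASE_LETTER);
        CATEGORIES.put("Lo", Character.OTHER_LETTER);
        CATEGORIES.put("Nd", Character.DECIMAL_DIGIT_NUMBER);
        CATEGORIES.put("Zs", Character.SPACE_SEPARATOR);
        CATEGORIES.put("Cn", Character.UNASSIGNED);
    }

    // Same test as UnicodeCategoryHandler.includes: compare getType(c).
    static boolean inCategory(String name, char c) {
        Byte category = CATEGORIES.get(name);
        if (category == null)
            throw new IllegalArgumentException("unsupported name " + name);
        return Character.getType(c) == category;
    }

    // Same test as UnicodeBlockHandler.includes: UnicodeBlock.of may return null.
    static boolean inBlock(Character.UnicodeBlock block, char c) {
        Character.UnicodeBlock cblock = Character.UnicodeBlock.of(c);
        return cblock != null && cblock.equals(block);
    }
}
```

Keeping each handler as a tiny `includes(char)` predicate means a character class can mix POSIX names, categories and blocks by asking each handler in turn.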
Index: classpath/gnu/regexp/REMatch.java
===================================================================
--- classpath/gnu/regexp/REMatch.java	(revision 110832)
+++ classpath/gnu/regexp/REMatch.java	(working copy)
@@ -67,6 +67,8 @@
     int[] start; // start positions (relative to offset) for each (sub)exp.
     int[] end;   // end positions for the same
     REMatch next; // other possibility (to avoid having to use arrays)
+    boolean empty; // empty string matched. This flag is used only within
+		   // RETokenRepeated.
 
     public Object clone() {
 	try {
@@ -177,7 +179,9 @@
      * @param sub Index of the subexpression.
      */
     public String toString(int sub) {
-	if ((sub >= start.length) || (start[sub] == -1)) return "";
+	if ((sub >= start.length) || sub < 0)
+	    throw new IndexOutOfBoundsException("No group " + sub);
+	if (start[sub] == -1) return null;
 	return (matchedText.substring(start[sub],end[sub]));
     }
     
@@ -242,6 +246,8 @@
      * <code>$0</code> through <code>$9</code>.  <code>$0</code> matches
      * the full substring matched; <code>$<i>n</i></code> matches
      * subexpression number <i>n</i>.
+     * <code>$10, $11, ...</code> may match the 10th, 11th, ... subexpressions
+     * if such subexpressions exist.
      *
      * @param input A string consisting of literals and <code>$<i>n</i></code> tokens.
      */
@@ -252,6 +258,16 @@
 	for (pos = 0; pos < input.length()-1; pos++) {
 	    if ((input.charAt(pos) == '$') && (Character.isDigit(input.charAt(pos+1)))) {
 		int val = Character.digit(input.charAt(++pos),10);
+		int pos1 = pos + 1;
+		while (pos1 < input.length() &&
+		       Character.isDigit(input.charAt(pos1))) {
+		    int val1 = val*10 + Character.digit(input.charAt(pos1),10);
+		    if (val1 >= start.length) break;
+		    pos1++;
+		    val = val1;
+		}
+		pos = pos1 - 1;
+
 		if (val < start.length) {
 		    output.append(toString(val));
 		} 
@@ -260,4 +276,42 @@
 	if (pos < input.length()) output.append(input.charAt(pos));
 	return output.toString();
     }
+
+    static class REMatchList {
+        REMatch head;
+	REMatch tail;
+        REMatchList() {
+	    head = tail = null;
+	}
+	/* Not used now. But we may need this some day?
+	void addHead(REMatch newone) {
+            if (head == null) {
+                head = newone;
+                tail = newone;
+                while (tail.next != null) {
+                    tail = tail.next;
+                }
+            }
+	    else {
+                REMatch tmp = newone;
+                while (tmp.next != null) tmp = tmp.next;
+                tmp.next = head;
+	        head = newone;
+	    }
+	}
+	*/
+	void addTail(REMatch newone) {
+            if (head == null) {
+                head = newone;
+                tail = newone;
+            }
+            else {
+                tail.next = newone;
+            }
+            while (tail.next != null) {
+                tail = tail.next;
+            }
+	}
+    }
+
 }
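The substituteInto change lets `$n` name groups beyond 9. After `$` and one digit, it keeps consuming digits only while the accumulated number still names an existing group, so with three groups `$12` means group 1 followed by a literal `2`. The sketch below reproduces that rule on a plain `String[]` of group texts. `DollarExpander` is a hypothetical name. Unlike the patched method, this sketch treats a group that did not participate (null) as empty, which is what `java.util.regex.Matcher.appendReplacement` does.

```java
// Hypothetical helper mirroring the $n expansion rule in REMatch.substituteInto.
// groups[i] is the text of group i (groups[0] = whole match), or null if the
// group did not participate in the match.
final class DollarExpander {
    static String expand(String input, String[] groups) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            char ch = input.charAt(pos);
            if (ch == '$' && pos + 1 < input.length()
                    && Character.isDigit(input.charAt(pos + 1))) {
                int val = Character.digit(input.charAt(pos + 1), 10);
                int next = pos + 2;
                // Take more digits only while the number still names a group.
                while (next < input.length() && Character.isDigit(input.charAt(next))) {
                    int val1 = val * 10 + Character.digit(input.charAt(next), 10);
                    if (val1 >= groups.length) break;
                    val = val1;
                    next++;
                }
                // Out-of-range or non-participating groups expand to nothing.
                if (val < groups.length && groups[val] != null)
                    out.append(groups[val]);
                pos = next;
            } else {
                out.append(ch);
                pos++;
            }
        }
        return out.toString();
    }
}
```

For example, with groups `{"abc", "a", "b", "c"}`, `expand("$12", ...)` yields `"a2"`, while with thirteen groups the same template selects group 12.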
Index: classpath/gnu/regexp/RETokenRange.java
===================================================================
--- classpath/gnu/regexp/RETokenRange.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenRange.java	(working copy)
@@ -43,19 +43,32 @@
 
   RETokenRange(int subIndex, char lo, char hi, boolean ins) {
     super(subIndex);
-    this.lo = (insens = ins) ? Character.toLowerCase(lo) : lo;
-    this.hi = ins ? Character.toLowerCase(hi) : hi;
+    insens = ins;
+    this.lo = lo;
+    this.hi = hi;
   }
 
   int getMinimumLength() {
     return 1;
   }
 
+  int getMaximumLength() {
+    return 1;
+  }
+
     boolean match(CharIndexed input, REMatch mymatch) {
 	char c = input.charAt(mymatch.index);
 	if (c == CharIndexed.OUT_OF_BOUNDS) return false;
-	if (insens) c = Character.toLowerCase(c);
-	if ((c >= lo) && (c <= hi)) {
+	boolean matches = (c >= lo) && (c <= hi);
+	if (! matches && insens) {
+	  char c1 = Character.toLowerCase(c);
+	  matches = (c1 >= lo) && (c1 <= hi);
+	  if (!matches) {
+	    c1 = Character.toUpperCase(c);
+	    matches = (c1 >= lo) && (c1 <= hi);
+	  }
+	}
+	if (matches) {
 	    ++mymatch.index;
 	    return next(input, mymatch);
 	}
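The RETokenRange change stops lower-casing the range bounds. The removed code turned a case-insensitive `[@-Z]` into `[@-z]`, which also admits `[`, `\`, `]`, `^`, `_` and the backquote. The new code keeps `lo` and `hi` as written and instead tries the character itself, then its lower-case form, then its upper-case form. The sketch below puts both versions side by side; `RangeMatch` and its method names are hypothetical.

```java
// Hypothetical sketch contrasting the old and new RETokenRange logic.
final class RangeMatch {
    // Post-patch logic: bounds stay as written; if c is outside the range and
    // matching is case-insensitive, also try c's lower- and upper-case forms.
    static boolean inRange(char c, char lo, char hi, boolean insens) {
        boolean matches = c >= lo && c <= hi;
        if (!matches && insens) {
            char lower = Character.toLowerCase(c);
            matches = lower >= lo && lower <= hi;
            if (!matches) {
                char upper = Character.toUpperCase(c);
                matches = upper >= lo && upper <= hi;
            }
        }
        return matches;
    }

    // Pre-patch logic (from the removed lines): lower-case bounds and c.
    static boolean lowercasedBounds(char c, char lo, char hi) {
        char l = Character.toLowerCase(lo);
        char h = Character.toLowerCase(hi);
        char lc = Character.toLowerCase(c);
        return lc >= l && lc <= h;
    }
}
```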
Index: classpath/gnu/regexp/RETokenBackRef.java
===================================================================
--- classpath/gnu/regexp/RETokenBackRef.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenBackRef.java	(working copy)
@@ -51,13 +51,25 @@
   // should implement getMinimumLength() -- any ideas?
 
     boolean match(CharIndexed input, REMatch mymatch) {
+	if (num >= mymatch.start.length) return false;
+	if (num >= mymatch.end.length) return false;
 	int b,e;
 	b = mymatch.start[num];
 	e = mymatch.end[num];
 	if ((b==-1)||(e==-1)) return false; // this shouldn't happen, but...
 	for (int i=b; i<e; i++) {
-	    if (input.charAt(mymatch.index+i-b) != input.charAt(i)) {
-		return false;
+	    char c1 = input.charAt(mymatch.index+i-b);
+	    char c2 = input.charAt(i);
+	    if (c1 != c2) {
+		if (insens) {
+		    if (c1 != Character.toLowerCase(c2) &&
+			c1 != Character.toUpperCase(c2)) {
+			return false;
+		    }
+		}
+		else {
+		    return false;
+		}
 	    }
 	}
 	mymatch.index += e-b;
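With this change a back-reference honors case-insensitive matching: a mismatched character still passes if it equals the lower- or upper-case form of the character captured by the group. The sketch below applies the same comparison to a plain `CharSequence`. Because `CharSequence.charAt` throws past the end instead of returning `CharIndexed.OUT_OF_BOUNDS`, the sketch checks the remaining length first. `BackRefCompare` and its parameters are hypothetical.

```java
// Hypothetical sketch of the RETokenBackRef comparison on a CharSequence.
// [b, e) is the span captured by the group; 'at' is where the repeat should start.
final class BackRefCompare {
    static boolean matchesGroupAt(CharSequence text, int b, int e, int at, boolean insens) {
        if (b < 0 || e < b) return false;                 // group did not participate
        if (at + (e - b) > text.length()) return false;   // not enough input left
        for (int i = b; i < e; i++) {
            char c1 = text.charAt(at + i - b);
            char c2 = text.charAt(i);
            if (c1 == c2) continue;
            if (!insens) return false;
            // Same relaxation as the patch: accept either case variant of c2.
            if (c1 != Character.toLowerCase(c2) && c1 != Character.toUpperCase(c2))
                return false;
        }
        return true;
    }
}
```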
Index: classpath/gnu/regexp/RETokenStart.java
===================================================================
--- classpath/gnu/regexp/RETokenStart.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenStart.java	(working copy)
@@ -44,6 +44,10 @@
 	super(subIndex);
 	this.newline = newline;
     }
+
+    int getMaximumLength() {
+        return 0;
+    }
     
     boolean match(CharIndexed input, REMatch mymatch) {
 	// charAt(index-n) may be unknown on a Reader/InputStream. FIXME
Index: classpath/gnu/regexp/RETokenIndependent.java
===================================================================
--- classpath/gnu/regexp/RETokenIndependent.java	(revision 0)
+++ classpath/gnu/regexp/RETokenIndependent.java	(revision 0)
@@ -0,0 +1,76 @@
+/* gnu/regexp/RETokenIndependent.java
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Classpath.
+
+GNU Classpath is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU Classpath is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Classpath; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA.
+
+Linking this library statically or dynamically with other modules is
+making a combined work based on this library.  Thus, the terms and
+conditions of the GNU General Public License cover the whole
+combination.
+
+As a special exception, the copyright holders of this library give you
+permission to link this library with independent modules to produce an
+executable, regardless of the license terms of these independent
+modules, and to copy and distribute the resulting executable under
+terms of your choice, provided that you also meet, for each linked
+independent module, the terms and conditions of the license of that
+module.  An independent module is a module which is not derived from
+or based on this library.  If you modify this library, you may extend
+this exception to your version of the library, but you are not
+obligated to do so.  If you do not wish to do so, delete this
+exception statement from your version. */
+
+package gnu.regexp;
+
+/**
+ * @author Ito Kazumitsu
+ */
+final class RETokenIndependent extends REToken
+{
+  REToken re;
+
+  RETokenIndependent(REToken re) throws REException {
+    super(0);
+    this.re = re;
+  }
+
+  int getMinimumLength() {
+    return re.getMinimumLength();
+  }
+
+  int getMaximumLength() {
+    return re.getMaximumLength();
+  }
+
+  boolean match(CharIndexed input, REMatch mymatch)
+  {
+    if (re.match(input, mymatch)) {
+      // Once a match is found, discard the other possible matches.
+      mymatch.next = null;
+      return next(input, mymatch);
+    }
+    return false;
+  }
+
+    void dump(StringBuffer os) {
+	os.append("(?>");
+	re.dumpAll(os);
+	os.append(')');
+    }
+}
+
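RETokenIndependent implements the atomic group `(?>...)`: after the inner expression matches, it sets `mymatch.next = null`, discarding the alternative matches, so the engine cannot backtrack into the group. The sketch below shows the observable effect through the standard `java.util.regex` API; the class name is hypothetical.

```java
import java.util.regex.Pattern;

final class AtomicGroupDemo {
    // Returns whether the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Ordinary group: a+ can give back one 'a' so the trailing 'a' matches.
        System.out.println(fullMatch("(a+)a", "aaa"));   // true
        // Atomic group: a+ keeps all three 'a's and never gives one back.
        System.out.println(fullMatch("(?>a+)a", "aaa")); // false
    }
}
```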
Index: classpath/gnu/regexp/RETokenPOSIX.java
===================================================================
--- classpath/gnu/regexp/RETokenPOSIX.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenPOSIX.java	(working copy)
@@ -81,6 +81,10 @@
 	return 1;
     }
 
+    int getMaximumLength() {
+	return 1;
+    }
+
     boolean match(CharIndexed input, REMatch mymatch) {
     char ch = input.charAt(mymatch.index);
     if (ch == CharIndexed.OUT_OF_BOUNDS)
Index: classpath/gnu/regexp/RETokenEnd.java
===================================================================
--- classpath/gnu/regexp/RETokenEnd.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenEnd.java	(working copy)
@@ -49,6 +49,10 @@
     this.newline = newline;
   }
 
+  int getMaximumLength() {
+    return 0;
+  }
+
     boolean match(CharIndexed input, REMatch mymatch) {
 	char ch = input.charAt(mymatch.index);
 	if (ch == CharIndexed.OUT_OF_BOUNDS)
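Several tokens in this patch gain a `getMaximumLength()` alongside the existing `getMinimumLength()`. Anchors (RETokenStart, RETokenEnd) return 0, single-character tokens (RETokenRange, RETokenPOSIX) return 1, and RETokenOneOf returns the largest value over its options. The sketch below shows how such bounds compose on a hypothetical mini token model. The `SeqTok` rule, where bounds of consecutive tokens add up, is an assumption for illustration and is not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration; these are not gnu.regexp classes.
final class LengthBounds {
    interface Tok {
        int minLength();
        int maxLength();
    }

    // Like RETokenRange or RETokenPOSIX: always consumes exactly one char.
    static final class CharTok implements Tok {
        public int minLength() { return 1; }
        public int maxLength() { return 1; }
    }

    // Like RETokenStart or RETokenEnd: zero-width, consumes nothing.
    static final class AnchorTok implements Tok {
        public int minLength() { return 0; }
        public int maxLength() { return 0; }
    }

    // Like RETokenOneOf: smallest minimum and largest maximum over the options.
    static final class OneOfTok implements Tok {
        private final List<Tok> options;
        OneOfTok(Tok... options) { this.options = Arrays.asList(options); }
        public int minLength() {
            if (options.isEmpty()) return 0;
            int min = Integer.MAX_VALUE;
            for (Tok t : options) min = Math.min(min, t.minLength());
            return min;
        }
        public int maxLength() {
            int max = 0;
            for (Tok t : options) max = Math.max(max, t.maxLength());
            return max;
        }
    }

    // Assumed sequence rule: bounds of consecutive tokens add up.
    static final class SeqTok implements Tok {
        private final List<Tok> parts;
        SeqTok(Tok... parts) { this.parts = Arrays.asList(parts); }
        public int minLength() {
            int sum = 0;
            for (Tok t : parts) sum += t.minLength();
            return sum;
        }
        public int maxLength() {
            int sum = 0;
            for (Tok t : parts) sum += t.maxLength();
            return sum;
        }
    }
}
```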
Index: classpath/gnu/regexp/RETokenOneOf.java
===================================================================
--- classpath/gnu/regexp/RETokenOneOf.java	(revision 110832)
+++ classpath/gnu/regexp/RETokenOneOf.java	(working copy)
@@ -70,53 +70,67 @@
     return min;
   }
 
+
+  int getMaximumLength() {
+    int max = 0;
+    int x;
+    for (int i=0; i < options.size(); i++) {
+      if ((x = ((REToken) options.elementAt(i)).getMaximumLength()) > max)
+	max = x;
+    }
+    return max;
+  }
+
     boolean match(CharIndexed input, REMatch mymatch) {
-    if (negative && (input.charAt(mymatch.index) == CharIndexed.OUT_OF_BOUNDS)) 
+      return negative ? matchN(input, mymatch) : matchP(input, mymatch);
+    }
+
+    private boolean matchN(CharIndexed input, REMatch mymatch) {
+    if (input.charAt(mymatch.index) == CharIndexed.OUT_OF_BOUNDS) 
       return false;
 
     REMatch newMatch = null;
     REMatch last = null;
     REToken tk;
-    boolean isMatch;
     for (int i=0; i < options.size(); i++) {
 	tk = (REToken) options.elementAt(i);
 	REMatch tryMatch = (REMatch) mymatch.clone();
 	if (tk.match(input, tryMatch)) { // match was successful
-	    if (negative) return false;
+	    return false;
+	} // is a match
+    } // try next option
 
-	    if (next(input, tryMatch)) {
-		// Add tryMatch to list of possibilities.
-		if (last == null) {
-		    newMatch = tryMatch;
-		    last = tryMatch;
-		} else {
-		    last.next = tryMatch;
-		    last = tryMatch;
-		}
-	    } // next succeeds
+    ++mymatch.index;
+    return next(input, mymatch);
+  }
+
+    private boolean matchP(CharIndexed input, REMatch mymatch) {
+    REMatch.REMatchList newMatch = new REMatch.REMatchList();
+    REToken tk;
+    for (int i=0; i < options.size(); i++) {
+	// For backtracking to work, each option
+	// must be chained to the token that follows.
+	// But chain() has side effects on the token,
+	// so we chain clones instead.
+	tk = (REToken)((REToken) options.elementAt(i)).clone();
+	tk.chain(this.next);
+	tk.setUncle(this.uncle);
+	tk.subIndex = this.subIndex;
+	REMatch tryMatch = (REMatch) mymatch.clone();
+	if (tk.match(input, tryMatch)) { // match was successful
+	    newMatch.addTail(tryMatch);
 	} // is a match
     } // try next option
 
-    if (newMatch != null) {
-	if (negative) {
-	    return false;
-	} else {
-	    // set contents of mymatch equal to newMatch
+    if (newMatch.head != null) {
+	// set contents of mymatch equal to newMatch
 
-	    // try each one that matched
-	    mymatch.assignFrom(newMatch);
-	    return true;
-	}
+	// try each one that matched
+	mymatch.assignFrom(newMatch.head);
+	return true;
     } else {
-	if (negative) {
-	    ++mymatch.index;
-	    return next(input, mymatch);
-	} else {
-	    return false;
-	}
+	return false;
     }
-
-    // index+1 works for [^abc] lists, not for generic lookahead (--> index)
   }
 
   void dump(StringBuffer os) {
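In the new `matchP`, a clone of each option is chained to the token after the alternation. The continuation is therefore tried from inside each option, and a failure there can backtrack into the option's own choices. The sketch below uses the standard `java.util.regex` API to show two patterns that depend on this behavior. In the first, the option `a` succeeds but leaves `bc`, where `c` fails, so `ab` must be tried. In the second, `x*` must give back a character. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationBacktracking {
    // Returns group 1 if the whole input matches, otherwise null.
    static String group1IfMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Option "a" leaves "bc" where "c" fails, so the engine retries with "ab".
        System.out.println(group1IfMatches("(a|ab)c", "abc"));    // ab
        // x* first takes "xxx"; "xz" then fails, so x* must give one 'x' back.
        System.out.println(group1IfMatches("(x*|y)xz", "xxxz"));  // xx
    }
}
```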
Index: classpath/java/net/URI.java
===================================================================
--- classpath/java/net/URI.java	(revision 110832)
+++ classpath/java/net/URI.java	(working copy)
@@ -1,5 +1,5 @@
 /* URI.java -- An URI class
-   Copyright (C) 2002, 2004, 2005  Free Software Foundation, Inc.
+   Copyright (C) 2002, 2004, 2005, 2006  Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -346,8 +346,15 @@
   private static String getURIGroup(Matcher match, int group)
   {
     String matched = match.group(group);
-    return matched.length() == 0 
-      ? ((match.group(group - 1).length() == 0) ? null : "") : matched;
+    if (matched == null || matched.length() == 0)
+      {
+	String prevMatched = match.group(group -1);
+	if (prevMatched == null || prevMatched.length() == 0)
+	  return null;
+	else
+	  return "";
+      }
+    return matched;
   }
 
   /**
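After the REMatch.toString change earlier in this patch, `Matcher.group(n)` returns null for a group that did not take part in the match, as the API specification requires. The old one-line `getURIGroup` would then call `length()` on null. The sketch below applies the patched decision to the generic URI regular expression from RFC 2396, Appendix B. That regex is used here only for illustration; the sketch does not claim it is the exact expression in `java.net.URI`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNullDemo {
    // Generic URI regex from RFC 2396, Appendix B (assumed for illustration).
    private static final Pattern URI_RE = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    static Matcher parse(String uri) {
        Matcher m = URI_RE.matcher(uri);
        if (!m.matches()) throw new IllegalArgumentException(uri);
        return m;
    }

    // Same decision as the patched getURIGroup: null if neither the part nor
    // its delimiter group matched, "" if the delimiter matched but the part
    // is empty. group() may return null, so test for it before length().
    static String uriGroup(Matcher match, int group) {
        String matched = match.group(group);
        if (matched == null || matched.length() == 0) {
            String prev = match.group(group - 1);
            return (prev == null || prev.length() == 0) ? null : "";
        }
        return matched;
    }
}
```

For `file:///etc` the authority (group 4) comes back as `""` because `//` was present, while for `mailto:x` it comes back as null.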
Index: classpath/java/util/regex/Matcher.java
===================================================================
--- classpath/java/util/regex/Matcher.java	(revision 110832)
+++ classpath/java/util/regex/Matcher.java	(working copy)
@@ -1,5 +1,5 @@
 /* Matcher.java -- Instance of a regular expression applied to a char sequence.
-   Copyright (C) 2002, 2004 Free Software Foundation, Inc.
+   Copyright (C) 2002, 2004, 2006 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -38,6 +38,7 @@
 
 package java.util.regex;
 
+import gnu.regexp.RE;
 import gnu.regexp.REMatch;
 
 /**
@@ -45,7 +46,7 @@
  *
  * @since 1.4
  */
-public final class Matcher
+public final class Matcher implements MatchResult
 {
   private Pattern pattern;
   private CharSequence input;
@@ -233,10 +234,15 @@
    */
   public boolean matches ()
   {
-    if (lookingAt())
+    match = pattern.getRE().getMatch(input, 0, RE.REG_TRY_ENTIRE_MATCH);
+    if (match != null)
       {
-	if (position == input.length())
-	  return true;
+	if (match.getStartIndex() == 0)
+	  {
+	    position = match.getEndIndex();
+	    if (position == input.length())
+	        return true;
+	  }
 	match = null;
       }
     return false;
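The removed `matches()` body called `lookingAt()` and then checked whether that first-found match ended at the end of the input. That is not equivalent: for `a|ab` on `"ab"`, `lookingAt()` settles on the first alternative, `"a"`, so the check fails even though a full match exists. The replacement asks the engine for an entire-input match (`RE.REG_TRY_ENTIRE_MATCH`). The sketch below reconstructs the old logic on the standard API for comparison; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchesVsLookingAt {
    // Reconstruction of the removed logic: accept only if the first match
    // found at position 0 happens to end at the end of the input.
    static boolean naiveMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() && m.end() == input.length();
    }

    // Library matches(): succeeds if any way of matching covers the whole input.
    static boolean realMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }
}
```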
Index: classpath/java/util/regex/PatternSyntaxException.java
===================================================================
--- classpath/java/util/regex/PatternSyntaxException.java	(revision 110832)
+++ classpath/java/util/regex/PatternSyntaxException.java	(working copy)
@@ -41,6 +41,7 @@
  * Indicates illegal pattern for regular expression.
  * Includes state to inspect the pattern and what and where the expression
  * was not valid regular expression.
+ * @since 1.4
  */
 public class PatternSyntaxException extends IllegalArgumentException
 {
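The Pattern.java hunk in this patch no longer throws a bare PatternSyntaxException. It builds the exception, attaches the underlying REException with `initCause`, and then throws it, so the internal diagnostic stays reachable through `getCause()`. The sketch below shows the same translation pattern. `LowLevelSyntaxError` is a hypothetical stand-in for `gnu.regexp.REException`.

```java
import java.util.regex.PatternSyntaxException;

final class CauseChaining {
    // Hypothetical stand-in for gnu.regexp.REException.
    static final class LowLevelSyntaxError extends Exception {
        final int position;
        LowLevelSyntaxError(String msg, int position) {
            super(msg);
            this.position = position;
        }
    }

    // Translate the internal error into the public exception type while
    // keeping the original as the cause.
    static PatternSyntaxException translate(LowLevelSyntaxError e, String regex) {
        PatternSyntaxException pse =
            new PatternSyntaxException(e.getMessage(), regex, e.position);
        pse.initCause(e);
        return pse;
    }
}
```

`initCause` works here because the three-argument PatternSyntaxException constructor does not set a cause itself; calling it on an exception whose cause is already set would throw IllegalStateException.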
Index: classpath/java/util/regex/MatchResult.java
===================================================================
--- classpath/java/util/regex/MatchResult.java	(revision 0)
+++ classpath/java/util/regex/MatchResult.java	(revision 0)
@@ -0,0 +1,81 @@
+/* MatchResult.java -- Result of a regular expression match.
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Classpath.
+
+GNU Classpath is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU Classpath is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Classpath; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA.
+
+Linking this library statically or dynamically with other modules is
+making a combined work based on this library.  Thus, the terms and
+conditions of the GNU General Public License cover the whole
+combination.
+
+As a special exception, the copyright holders of this library give you
+permission to link this library with independent modules to produce an
+executable, regardless of the license terms of these independent
+modules, and to copy and distribute the resulting executable under
+terms of your choice, provided that you also meet, for each linked
+independent module, the terms and conditions of the license of that
+module.  An independent module is a module which is not derived from
+or based on this library.  If you modify this library, you may extend
+this exception to your version of the library, but you are not
+obligated to do so.  If you do not wish to do so, delete this
+exception statement from your version. */
+
+
+package java.util.regex;
+
+/**
+ * This interface represents the result of a regular expression match.
+ * It can be used to query the contents of the match, but not to modify
+ * them.
+ * @since 1.5
+ */
+public interface MatchResult
+{
+  /** Returns the index just after the last matched character.  */
+  int end();
+  
+  /**
+   * Returns the index just after the last matched character of the
+   * given sub-match group.
+   * @param group the sub-match group
+   */ 
+  int end(int group);
+
+  /** Returns the substring of the input which was matched.  */
+  String group();
+  
+  /** 
+   * Returns the substring of the input which was matched by the
+   * given sub-match group.
+   * @param group the sub-match group
+   */
+  String group(int group);
+
+  /** Returns the number of sub-match groups in the matching pattern.  */  
+  int groupCount();
+
+  /** Returns the index of the first character of the match.  */
+  int start();
+
+  /**
+   * Returns the index of the first character of the given sub-match
+   * group.
+   * @param group the sub-match group
+   */
+  int start(int group);
+}
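With this patch `Matcher` implements the new `MatchResult` interface, so code that only needs to read a match can accept the interface and be handed a `Matcher` directly. The sketch below relies only on the methods declared above. The class and method names are hypothetical.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchResultDemo {
    // Reads a match through the read-only MatchResult view.
    static String describe(MatchResult r) {
        StringBuilder sb = new StringBuilder();
        sb.append(r.group()).append('@').append(r.start()).append('-').append(r.end());
        for (int g = 1; g <= r.groupCount(); g++) {
            // group(g) is null for a group that did not participate.
            sb.append(" [").append(g).append('=').append(r.group(g)).append(']');
        }
        return sb.toString();
    }

    // A Matcher can be passed where a MatchResult is expected.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? describe(m) : null;
    }
}
```

For instance, `firstMatch("(\\d+)-(\\d+)", "tel 555-0100")` describes the match `555-0100` spanning indices 4 to 12, with groups `555` and `0100`.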
Index: classpath/java/util/regex/Pattern.java
===================================================================
--- classpath/java/util/regex/Pattern.java	(revision 110832)
+++ classpath/java/util/regex/Pattern.java	(working copy)
@@ -103,8 +103,11 @@
       }
     catch (REException e)
       {
-	throw new PatternSyntaxException(e.getMessage(),
+	PatternSyntaxException pse;
+	pse = new PatternSyntaxException(e.getMessage(),
 					 regex, e.getPosition());
+	pse.initCause(e);
+	throw pse;
       }
   }
  

