This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.



Re: Patch for SentenceBreakIterator


Hi,

On Sat, 2002-05-11 at 20:09, Tom Tromey wrote:
> >>>>> "Mark" == Mark Wielaard <mark@klomp.org> writes:
> Thanks.  I merged this code with Classpath, so make sure the change
> goes there too.

Sure. We have a lot of merging to do, so I will make sure that at least
these files stay in sync.

> When I wrote the java.text break algorithms, I did it by reading the
> Unicode standard (the 2.0 book which is on my shelf) and implementing
> what I read (as much as I understood anyway :-).  It is possible, even
> likely, that these things have changed in the meantime.  If you're
> interested, take a look at the online document and see what it says.

Julian said that there were even some differences between Sun JDK
implementations (gjdoc has some code to work around it).
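For context, here is a minimal sketch of how client code usually obtains a sentence iterator. It assumes only the standard java.text.BreakIterator API. The class name SentenceSplitDemo and the sample text are made up for illustration. Boundary placement can differ between runtimes, so the only property relied on here is that the returned pieces concatenate back to the input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitDemo {
    /** Splits text into sentences using whatever sentence BreakIterator the runtime supplies. */
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) range is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample mixes a newline and a tab after terminators; split points may vary by JDK.
        String text = "Hello there.\nHow are you?\tFine!";
        for (String s : sentences(text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Running this under different JDKs, or against libgcj/Classpath with this patch applied, is one way to compare where each implementation places the boundaries around the newline and tab in the sample.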

> Mark> 2002-05-11  Mark Wielaard  <mark@klomp.org>
> Mark>   * gnu/java/text/SentenceBreakIterator.java (next): Skip all java white
> Mark>   space characters.
> Mark>   (previous_internal): Likewise.
> 
> The patch was trashed.  Could you resend it?

I see. I don't know what happened there.
Here it is again.

Cheers,

Mark  
Index: gnu/java/text/SentenceBreakIterator.java
===================================================================
RCS file: /cvs/gcc/gcc/libjava/gnu/java/text/SentenceBreakIterator.java,v
retrieving revision 1.2
diff -u -r1.2 SentenceBreakIterator.java
--- gnu/java/text/SentenceBreakIterator.java	22 Jan 2002 22:40:02 -0000	1.2
+++ gnu/java/text/SentenceBreakIterator.java	12 May 2002 00:22:36 -0000
@@ -1,5 +1,5 @@
 /* SentenceBreakIterator.java - Default sentence BreakIterator.
-   Copyright (C) 1999, 2001 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2001, 2002 Free Software Foundation, Inc.
 
 This file is part of GNU Classpath.
 
@@ -91,13 +91,8 @@
 	    while (n != CharacterIterator.DONE
 		   && Character.getType(n) == Character.END_PUNCTUATION)
 	      n = iter.next();
-	    // Skip spaces.
-	    while (n != CharacterIterator.DONE
-		   && Character.getType(n) == Character.SPACE_SEPARATOR)
-	      n = iter.next();
-	    // Skip optional paragraph separator.
-	    if (n != CharacterIterator.DONE
-		&& Character.getType(n) == Character.PARAGRAPH_SEPARATOR)
+	    // Skip (java) space, line and paragraph separators.
+	    while (n != CharacterIterator.DONE && Character.isWhitespace(n))
 	      n = iter.next();
 
 	    // There's always a break somewhere after `!' or `?'.
@@ -111,11 +106,11 @@
 	    while (n != CharacterIterator.DONE
 		   && Character.getType(n) == Character.END_PUNCTUATION)
 	      n = iter.next();
-	    // Skip spaces.  We keep count because we need at least
-	    // one for this period to represent a terminator.
+	    // Skip (java) space, line and paragraph separators.
+	    // We keep count because we need at least one for this period to
+	    // represent a terminator.
 	    int spcount = 0;
-	    while (n != CharacterIterator.DONE
-		   && Character.getType(n) == Character.SPACE_SEPARATOR)
+	    while (n != CharacterIterator.DONE && Character.isWhitespace(n))
 	      {
 		n = iter.next();
 		++spcount;
@@ -162,7 +157,7 @@
 
 	if (! Character.isLowerCase(c)
 	    && (nt == Character.START_PUNCTUATION
-		|| nt == Character.SPACE_SEPARATOR))
+		|| Character.isWhitespace(n)))
 	  {
 	    int save = iter.getIndex();
 	    int save_nt = nt;
@@ -173,12 +168,12 @@
 	      n = iter.previous();
 	    if (n == CharacterIterator.DONE)
 	      break;
-	    if (Character.getType(n) == Character.SPACE_SEPARATOR)
+	    if (Character.isWhitespace(n))
 	      {
-		// Must have at least once space after the `.'.
+		// Must have at least one (java) space after the `.'.
 		int save2 = iter.getIndex();
 		while (n != CharacterIterator.DONE
-		       && Character.getType(n) == Character.SPACE_SEPARATOR)
+		       && Character.isWhitespace(n))
 		  n = iter.previous();
 		// Skip close punctuation.
 		while (n != CharacterIterator.DONE
@@ -203,13 +198,13 @@
 	    period = iter.getIndex();
 	    break;
 	  }
-	else if (nt == Character.SPACE_SEPARATOR
+	else if (Character.isWhitespace(n)
 		 || nt == Character.END_PUNCTUATION)
 	  {
 	    int save = iter.getIndex();
-	    // Skip spaces.
+	    // Skip (java) space, line and paragraph separators.
 	    while (n != CharacterIterator.DONE
-		   && Character.getType(n) == Character.SPACE_SEPARATOR)
+		   && Character.isWhitespace(n))
 	      n = iter.previous();
 	    // Skip close punctuation.
 	    while (n != CharacterIterator.DONE

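The core of the patch is replacing `Character.getType(n) == Character.SPACE_SEPARATOR` with `Character.isWhitespace(n)`. The small standalone sketch below shows how the two predicates differ and mimics the patched forward loop that counts skipped characters. WhitespaceSkipDemo and its helper skipWhitespace are hypothetical names, not part of SentenceBreakIterator. Note that isWhitespace accepts tab, newline, carriage return and the separators U+2028/U+2029, which the old test rejected. Conversely, it rejects non-breaking spaces such as U+00A0, which getType classifies as SPACE_SEPARATOR.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

public class WhitespaceSkipDemo {
    /** Pre-patch test: only Unicode space separators (general category Zs). */
    static boolean oldIsSpace(char c) {
        return Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    /** Post-patch test: Java whitespace (tab, newline, Zs except no-break spaces, Zl, Zp, ...). */
    static boolean newIsSpace(char c) {
        return Character.isWhitespace(c);
    }

    /**
     * Hypothetical helper mirroring the patched loop in next(): starting at the
     * iterator's current character, skip Java whitespace and return how many
     * characters were skipped. The iterator is left on the first
     * non-whitespace character, or at DONE.
     */
    static int skipWhitespace(CharacterIterator iter) {
        int count = 0;
        char n = iter.current();
        while (n != CharacterIterator.DONE && Character.isWhitespace(n)) {
            n = iter.next();
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        // Space, tab, newline, no-break space, line separator, paragraph separator.
        char[] samples = {' ', '\t', '\n', '\u00A0', '\u2028', '\u2029'};
        for (char c : samples) {
            System.out.printf("U+%04X old=%b new=%b%n", (int) c, oldIsSpace(c), newIsSpace(c));
        }
        CharacterIterator iter = new StringCharacterIterator("end.\n\nNext");
        iter.setIndex(4); // position just after the '.'
        System.out.println("skipped " + skipWhitespace(iter) + ", now at '" + iter.current() + "'");
    }
}
```

In the period branch of next(), a count of zero means the `.` is not followed by any whitespace and so does not act as a terminator. With the new predicate, a `.` followed directly by a newline now yields a nonzero count (here 2, stopping at 'N').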