This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.
Re: Patch for SentenceBreakIterator
- From: Tom Tromey <tromey at redhat dot com>
- To: Mark Wielaard <mark at klomp dot org>
- Cc: java-patches at gcc dot gnu dot org
- Date: 11 May 2002 12:09:53 -0600
- Subject: Re: Patch for SentenceBreakIterator
- References: <1021119209.20212.9.camel@elsschot>
- Reply-to: tromey at redhat dot com
>>>>> "Mark" == Mark Wielaard <mark@klomp.org> writes:
Mark> While using gjdoc on the Classpath source we discovered that our
Mark> BreakIterator does not determine the end-of-sentence in the same
Mark> way as some other implementations. In particular a '.' followed
Mark> by a '\n' is not recognized as the end of a sentence (since '\n'
Mark> is a control character according to Unicode, not a space
Mark> character). So this patch makes us behave more like the standard
Mark> JDK implementation.
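The distinction Mark describes can be checked directly with the standard java.lang.Character predicates. The following is only a small illustration, not part of the patch; the class name NewlineClassification and the helper describe are made up for this example.

```java
public class NewlineClassification {
    // Summarizes how java.lang.Character classifies one character:
    // Unicode general category "control" (Cc), Unicode space separator
    // (isSpaceChar), and Java whitespace (isWhitespace).
    static String describe(char c) {
        return String.format("U+%04X control=%b spaceChar=%b whitespace=%b",
                (int) c,
                Character.getType(c) == Character.CONTROL,
                Character.isSpaceChar(c),
                Character.isWhitespace(c));
    }

    public static void main(String[] args) {
        // '\n' is a control character, not a Unicode space character,
        // but it does count as Java whitespace.
        System.out.println(describe('\n'));
        // ' ' is both a Unicode space character and Java whitespace.
        System.out.println(describe(' '));
    }
}
```

A test that relies on Character.isSpaceChar alone will therefore not treat a line break after '.' as a separator, whereas Character.isWhitespace will.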
Thanks. I merged this code with Classpath, so make sure the change
goes there too.
When I wrote the java.text break algorithms, I did it by reading the
Unicode standard (the 2.0 book which is on my shelf) and implementing
what I read (as much as I understood anyway :-). It is possible, even
likely, that these things have changed in the meantime. If you're
interested, take a look at the online document and see what it says.
Mark> 2002-05-11 Mark Wielaard <mark@klomp.org>
Mark> * gnu/java/text/SentenceBreakIterator.java (next): Skip all Java
Mark> whitespace characters.
Mark> (previous_internal): Likewise.
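Since the patch text itself did not survive, here is a rough sketch of what "skip all Java whitespace characters" could look like in isolation. This is not the code from SentenceBreakIterator.java: the class SkipWhitespaceSketch and both methods are hypothetical, and the boundary rule is deliberately simplified (a '.' followed by whitespace or end of text), ignoring the rest of the Unicode sentence-break rules.

```java
public class SkipWhitespaceSketch {
    // Returns the index of the first character at or after 'start'
    // that is not Java whitespace (per Character.isWhitespace).
    static int skipWhitespace(CharSequence text, int start) {
        int i = start;
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }

    // Hypothetical, simplified boundary finder: returns the offset just
    // past the first '.' that is followed by whitespace or end of text,
    // with the trailing whitespace skipped. Returns text.length() if no
    // such '.' exists.
    static int firstSentenceEnd(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '.') {
                int after = skipWhitespace(text, i + 1);
                if (after > i + 1 || after == text.length()) {
                    return after;
                }
            }
        }
        return text.length();
    }

    public static void main(String[] args) {
        // A newline after '.' is treated like any other whitespace.
        System.out.println(firstSentenceEnd("One.\nTwo."));
    }
}
```

Because the skip uses Character.isWhitespace rather than Character.isSpaceChar, a '.' followed by '\n' ends the sentence in this sketch just as a '.' followed by a space does.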
The patch was trashed. Could you resend it?
Mark> P.S. This seems like a safe bug fix for 3.1.1. Do we already
Mark> have a policy for which patches may/should/can be backported to
Mark> the branch?
We don't. I expect that conversation to start shortly.
Tom