This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.



Re: Patch for SentenceBreakIterator


>>>>> "Mark" == Mark Wielaard <mark@klomp.org> writes:

Mark> While using gjdoc on the Classpath source we discovered that our
Mark> BreakIterator does not determine the end-of-sentence in the same
Mark> way as some other implementations. In particular a '.' followed
Mark> by a '\n' is not recognized as the end of a sentence (since '\n'
Mark> is a control character according to Unicode, not a space
Mark> character). So this patch makes us behave more like the standard
Mark> JDK implementation.
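
For readers following along, the distinction Mark describes can be checked
directly with java.lang.Character. The snippet below is only a minimal
sketch and is not the patch itself: the class name and the two helper
methods are hypothetical, made up for illustration. It assumes that the
sentence-end test looks at the character right after a '.', and compares
the Unicode space-character test (isSpaceChar) with the broader Java
whitespace test (isWhitespace).

```java
class NewlineClassification {
    // Hypothetical helper: is the '.' at index dot followed by a Unicode
    // space character (a Zs, Zl or Zp category character)?
    static boolean periodFollowedBySpaceChar(String text, int dot) {
        return text.charAt(dot) == '.'
            && dot + 1 < text.length()
            && Character.isSpaceChar(text.charAt(dot + 1));
    }

    // Hypothetical helper: same check, but using Java whitespace, which
    // also covers control characters such as '\n', '\t' and '\r'.
    static boolean periodFollowedByWhitespace(String text, int dot) {
        return text.charAt(dot) == '.'
            && dot + 1 < text.length()
            && Character.isWhitespace(text.charAt(dot + 1));
    }

    public static void main(String[] args) {
        String text = "First sentence.\nSecond sentence.";
        int dot = text.indexOf('.');

        // '\n' is a control character (category Cc), not a space character.
        System.out.println(Character.getType('\n') == Character.CONTROL);
        System.out.println(periodFollowedBySpaceChar(text, dot));
        System.out.println(periodFollowedByWhitespace(text, dot));
    }
}
```

With a test based on isSpaceChar the newline after the period is not seen
as a gap between sentences, while a test based on isWhitespace does see it.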

Thanks.  I merged this code with Classpath, so make sure the change
goes there too.

When I wrote the java.text break algorithms, I did it by reading the
Unicode standard (the 2.0 book which is on my shelf) and implementing
what I read (as much as I understood anyway :-).  It is possible, even
likely, that these things have changed in the meantime.  If you're
interested, take a look at the online document and see what it says.

Mark> 2002-05-11  Mark Wielaard  <mark@klomp.org>
Mark>   * gnu/java/text/SentenceBreakIterator.java (next): Skip all java white
Mark>   space characters.
Mark>   (previous_internal): Likewise.

The patch was trashed.  Could you resend it?

Mark> P.S. This seems like a safe bug fix for 3.1.1. Do we already
Mark> have a policy for which patches may/should/can be backported to
Mark> the branch?

We don't.  I expect that conversation to start shortly.

Tom

