The behavior of java.io.StreamTokenizer differs from Sun's implementation; StreamTokenizer can be told about a comment character, but libgcj's implementation seems to ignore the comment characters sometimes.

Consider the following program:

    // Instances.java
    // pared-down version of weka.core.Instances from weka-3-4

    import java.io.InputStreamReader;
    import java.io.StreamTokenizer;

    class Instances {
        public static void main(String[] args) throws Exception {
            StreamTokenizer tok =
                new StreamTokenizer(new InputStreamReader(System.in));
            initTokenizer(tok);
            for (int t = tok.nextToken() ; t != tok.TT_EOF ; t = tok.nextToken()) {
                System.out.println(tok.toString());
            }
        }

        private static void initTokenizer(StreamTokenizer tokenizer) {
            tokenizer.resetSyntax();
            tokenizer.whitespaceChars(0, ' ');
            tokenizer.wordChars(' '+1, '\u00FF');
            tokenizer.whitespaceChars(',', ',');
            tokenizer.commentChar('%');
            tokenizer.eolIsSignificant(true);
        }
    }
    // eof

When this program is linked against Sun's class libraries (tested with J2SDK 1.4.x) the following happens:

    $ java Instances
    %foo,bar baz
    Token[EOL], line 2
    $

But when compiled with gcj and linked against libgcj4, the supplied commentChar '%' gets ignored:

    $ gcj -g -o instances --main=Instances Instances.java
    $ ./instances
    %foo,bar baz
    Token[%foo], line 1
    Token[bar], line 1
    Token[baz], line 1
    Token[EOL], line 2
    $

This behavior seems wrong.
I can confirm this on the mainline.
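The problem appears to be that the documented behavior of commentChar, "Any other attribute settings for the specified character are cleared.", is not implemented: when commentChar is called, the character keeps the word attribute set earlier by wordChars.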
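For anyone who wants to check a given class library against that clause without typing input by hand, here is a minimal sketch. It reuses the syntax setup from the report's initTokenizer but reads from a StringReader; the class name CommentCharCheck and the tokenize helper are made up for this illustration and are not part of the original test case or of the Mauve test.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original report.
public class CommentCharCheck {

    // Tokenizes the input using the same syntax table as the report's
    // initTokenizer and returns the tokens as strings ("EOL" for TT_EOL).
    static List<String> tokenize(String input) throws IOException {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        tok.resetSyntax();
        tok.whitespaceChars(0, ' ');
        tok.wordChars(' ' + 1, '\u00FF');  // marks '%' as a word character
        tok.whitespaceChars(',', ',');
        tok.commentChar('%');              // should clear the word attribute of '%'
        tok.eolIsSignificant(true);

        List<String> tokens = new ArrayList<>();
        for (int t = tok.nextToken();
             t != StreamTokenizer.TT_EOF;
             t = tok.nextToken()) {
            // With this syntax table only words and end-of-line tokens occur.
            tokens.add(t == StreamTokenizer.TT_EOL ? "EOL" : tok.sval);
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // The whole first line is a comment, so only the EOL should remain.
        System.out.println(tokenize("%foo,bar baz\n"));
        // A comment starting mid-line should hide only the rest of the line.
        System.out.println(tokenize("foo,bar %baz\n"));
    }
}
```

If commentChar clears the earlier word attribute as documented, this should print [EOL] followed by [foo, bar, EOL]. A library with the behavior described in this report would instead return %foo, bar and baz as words for the first input.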
Thanks for the concise report and test case. I've put the test into Mauve and checked in a fix to libgcj.
Subject: Bug 13062

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: tromey@gcc.gnu.org 2003-11-16 21:15:55

Modified files:
    libjava : ChangeLog
    libjava/java/io: StreamTokenizer.java

Log message:
    PR libgcj/13062:
    * java/io/StreamTokenizer.java (commentChar): Clear other attributes for character.
    (quoteChar): Likewise.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/ChangeLog.diff?cvsroot=gcc&r1=1.2338&r2=1.2339
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/java/io/StreamTokenizer.java.diff?cvsroot=gcc&r1=1.13&r2=1.14