Bug 13062 - StreamTokenizer ignores commentChar
Summary: StreamTokenizer ignores commentChar
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj
Version: 3.3.2
Importance: P2 normal
Target Milestone: 3.4.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-11-15 18:28 UTC by Martin Jansche
Modified: 2003-11-16 21:14 UTC

See Also:
Host: i686-pc-linux-gnu (from config.guess)
Target: i486-linux (from gcj -dumpmachine)
Build:
Known to work:
Known to fail:
Last reconfirmed: 2003-11-15 22:18:17


Description Martin Jansche 2003-11-15 18:28:34 UTC
The behavior of java.io.StreamTokenizer differs from Sun's implementation;
StreamTokenizer can be told about a comment character, but libgcj's
implementation seems to ignore the comment characters sometimes.

Consider the following program:

// Instances.java
// pared-down version of weka.core.Instances from weka-3-4

import java.io.InputStreamReader;
import java.io.StreamTokenizer;

class Instances
{
  public static void main(String[] args)
    throws Exception
  {
    StreamTokenizer tok = new
      StreamTokenizer(new InputStreamReader(System.in));
    initTokenizer(tok);

    for (int t = tok.nextToken();
         t != tok.TT_EOF;
         t = tok.nextToken()) {
      System.out.println(tok.toString());
    }
  }

  private static void initTokenizer(StreamTokenizer tokenizer) {
    tokenizer.resetSyntax();         
    tokenizer.whitespaceChars(0, ' ');    
    tokenizer.wordChars(' '+1, '\u00FF');
    tokenizer.whitespaceChars(',', ',');
    tokenizer.commentChar('%');
    tokenizer.eolIsSignificant(true);
  }
}

// eof

When this program is linked against Sun's class libraries (tested with J2SDK
1.4.x) the following happens:

$ java Instances
  %foo,bar baz
Token[EOL], line 2
$ 

But when compiled with gcj and linked against libgcj4, the supplied commentChar
'%' gets ignored:

$ gcj -g -o instances --main=Instances Instances.java
$ ./instances
  %foo,bar baz
Token[%foo], line 1
Token[bar], line 1
Token[baz], line 1
Token[EOL], line 2
$ 

This behavior seems wrong.
Comment 1 Andrew Pinski 2003-11-15 22:18:17 UTC
I can confirm this on the mainline.
Comment 2 Andrew Pinski 2003-11-15 22:28:35 UTC
The problem appears to be that the documented behavior "Any other attribute settings for the
specified character are cleared." is not implemented when commentChar is called.
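To make the quoted sentence concrete, here is a minimal sketch that rebuilds the syntax table from the report and reads from a string instead of System.in. The class name CommentCharDemo and the tokens() helper are made up for illustration; only the StreamTokenizer calls come from the original test case.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original report.
class CommentCharDemo {
  // Tokenizes the input with the same settings as the report's initTokenizer().
  static List<String> tokens(String input) {
    StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
    tok.resetSyntax();
    tok.whitespaceChars(0, ' ');
    tok.wordChars(' ' + 1, '\u00FF');  // this range marks '%' as a word character
    tok.whitespaceChars(',', ',');
    tok.commentChar('%');              // documented to clear the word attribute of '%'
    tok.eolIsSignificant(true);

    List<String> out = new ArrayList<>();
    try {
      for (int t = tok.nextToken();
           t != StreamTokenizer.TT_EOF;
           t = tok.nextToken()) {
        if (t == StreamTokenizer.TT_WORD) {
          out.add(tok.sval);
        } else if (t == StreamTokenizer.TT_EOL) {
          out.add("EOL");
        } else {
          out.add(String.valueOf((char) t));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokens("  %foo,bar baz\nqux\n"));
  }
}
```

If commentChar clears the earlier word attribute as documented, the first line is consumed as a comment and the result should be [EOL, qux, EOL]. An implementation with the bug described here would instead be expected to report %foo, bar and baz as words.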
Comment 3 Tom Tromey 2003-11-16 21:14:48 UTC
Thanks for the concise report and test case.
I've put the test into Mauve and checked in a fix to libgcj.
Comment 4 GCC Commits 2003-11-16 21:15:59 UTC
Subject: Bug 13062

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	tromey@gcc.gnu.org	2003-11-16 21:15:55

Modified files:
	libjava        : ChangeLog 
	libjava/java/io: StreamTokenizer.java 

Log message:
	PR libgcj/13062:
	* java/io/StreamTokenizer.java (commentChar): Clear other
	attributes for character.
	(quoteChar): Likewise.
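
The idea behind the log message can be sketched as follows. This is not the libgcj source: the class SyntaxTableSketch, its per-character boolean tables and the isWord/isComment accessors are assumptions chosen only to illustrate what "clear other attributes for character" means.

```java
// Hypothetical simplified model of a tokenizer syntax table (not the libgcj code).
final class SyntaxTableSketch {
  private final boolean[] whitespace = new boolean[256];
  private final boolean[] alphabetic = new boolean[256];
  private final boolean[] quote = new boolean[256];
  private final boolean[] comment = new boolean[256];

  // Marks every character in [low, hi] as a word character.
  void wordChars(int low, int hi) {
    for (int i = Math.max(low, 0); i <= Math.min(hi, 255); i++) {
      alphabetic[i] = true;
    }
  }

  // Marks ch as a comment starter and clears its other attributes.
  void commentChar(int ch) {
    if (ch >= 0 && ch < 256) {
      clearOthers(ch);
      comment[ch] = true;
    }
  }

  // Marks ch as a quote delimiter and clears its other attributes.
  void quoteChar(int ch) {
    if (ch >= 0 && ch < 256) {
      clearOthers(ch);
      quote[ch] = true;
    }
  }

  // The step the bug report is about: without it, an earlier
  // wordChars() setting would still apply to the character.
  private void clearOthers(int ch) {
    whitespace[ch] = false;
    alphabetic[ch] = false;
    quote[ch] = false;
    comment[ch] = false;
  }

  boolean isWord(int ch) { return ch >= 0 && ch < 256 && alphabetic[ch]; }
  boolean isComment(int ch) { return ch >= 0 && ch < 256 && comment[ch]; }
}
```

In this model, calling wordChars(33, 255) followed by commentChar('%') leaves '%' flagged only as a comment character, which is the behavior the test case relies on.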

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/ChangeLog.diff?cvsroot=gcc&r1=1.2338&r2=1.2339
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libjava/java/io/StreamTokenizer.java.diff?cvsroot=gcc&r1=1.13&r2=1.14