This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.

[gcjx] RFC/RFA: Show Problematic Source Lines with Diagnostics

From: Ranjit Mathew <rmathew at gmail dot com>
To: java-patches at gcc dot gnu dot org
Date: Sun, 02 Oct 2005 21:53:04 +0530
Subject: [gcjx] RFC/RFA: Show Problematic Source Lines with Diagnostics
Openpgp: url=http://ranjitmathew.hostingzero.com/aa_6C114B8F.txt

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

  As noted earlier, GCJX doesn't show the problematic lines from
Java source files with its diagnostics, unlike most compilers. The
attached patch proposes to fix this situation.

The first order of business was to retrieve a line from a source
file given its line number, since the current lexer only works
with a stream of tokens. I have opted to rewind the input and
re-read it up to the desired line instead of storing line-position
information upfront. I reckon that in real life the compiler will
encounter *far more* correct source files than incorrect ones, so
there's no point in penalising the common code path for this
feature. Besides, GCJ mainline does something similar.
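A minimal sketch of the rewind-and-rescan idea, written as a standalone C++ function over an in-memory buffer rather than against the GCJX lexer. The name get_source_line and its signature are hypothetical. It assumes one byte per character and treats "\n", "\r\n" and a lone "\r" as line terminators, which is how Java defines line terminators. The cost is a linear scan per lookup, paid only when a diagnostic is actually printed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not GCJX code): copy line LINE_NUMBER of TEXT,
// counting from 1, into OUT by rescanning from the start of the buffer.
// Returns false if the buffer has fewer lines than requested.
bool
get_source_line (const std::string &text, int line_number, std::string &out)
{
  if (line_number < 1)
    return false;

  std::size_t i = 0;
  int line = 1;
  // Skip whole lines until we reach the requested one.
  while (line < line_number && i < text.size ())
    {
      char c = text[i++];
      if (c == '\r')
        {
          // Treat "\r\n" as a single terminator.
          if (i < text.size () && text[i] == '\n')
            ++i;
          ++line;
        }
      else if (c == '\n')
        ++line;
    }
  if (line < line_number)
    return false;

  // Copy characters up to, but not including, the line terminator.
  std::size_t start = i;
  while (i < text.size () && text[i] != '\n' && text[i] != '\r')
    ++i;
  out.assign (text, start, i - start);
  return true;
}
```

Note that the returned text excludes the terminator of the previous line as well as its own, so the caller can print it directly.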

That aside, I wanted the lexer to be able to retrieve the
desired line and then return to its previous state. This might
not matter now, since we currently lex the whole file at once.
This is the reason for the new positioning methods in ucs2_reader,
though they do hurt the abstraction a bit: the reader becomes a
random-access view of the input instead of a linear stream of
bytes.
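The state save/restore that lexer::get_line performs by hand can also be expressed as an RAII guard, so the old position comes back even if an exception escapes in between. In this hypothetical sketch, byte_reader and posn_saver are illustrative names, not GCJX classes. The position is derived from the current pointer instead of being kept in a separate counter, which rules out the two drifting apart.

```cpp
#include <cstddef>

// Hypothetical byte reader, loosely modelled on ucs2_reader.
class byte_reader
{
  const unsigned char *begin_;
  const unsigned char *curr_;
  const unsigned char *end_;

public:
  byte_reader (const unsigned char *data, std::size_t length)
    : begin_ (data), curr_ (data), end_ (data + length)
  {
  }

  // Return the next byte, or -1 at end of input.
  int get ()
  {
    return curr_ == end_ ? -1 : *curr_++;
  }

  // The position is simply the distance from the start of the input.
  std::size_t get_posn () const
  {
    return static_cast<std::size_t> (curr_ - begin_);
  }

  // Reposition the reader; out-of-range requests are ignored.
  void set_posn (std::size_t posn)
  {
    if (posn <= static_cast<std::size_t> (end_ - begin_))
      curr_ = begin_ + posn;
  }
};

// Remembers the reader's position and restores it when the guard goes out
// of scope, including during stack unwinding.
class posn_saver
{
  byte_reader &reader_;
  std::size_t saved_;

public:
  explicit posn_saver (byte_reader &r) : reader_ (r), saved_ (r.get_posn ()) {}
  ~posn_saver () { reader_.set_posn (saved_); }

  posn_saver (const posn_saver &) = delete;
  posn_saver &operator= (const posn_saver &) = delete;
};
```

A reader that buffers decoded characters (iconv_ucs2_reader appears to, via its next/last indices) would presumably also have to discard that buffer whenever its position is changed.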

Next, the exception formatter needed access to the lexer so that
it could show the desired line with a little caret underneath the
problematic column. This is the reason for the new set_lexer
methods in exception_base and format_repr.
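One way the caret line could be built, sketched as a hypothetical helper named caret_line. Like the loop in format_repr::dump, it treats the column as a 0-based character index. It additionally assumes the lexer counts a tab as a single column and that the line is ASCII. Copying tabs from the source prefix, instead of emitting one space per column, keeps the caret aligned whatever tab width the terminal uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not GCJX code): return the line to print beneath
// SOURCE so that '^' sits under character index COL.  Tabs in the prefix
// are copied verbatim; every other character becomes a space.
std::string
caret_line (const std::string &source, std::size_t col)
{
  std::string result;
  for (std::size_t i = 0; i < col; ++i)
    result += (i < source.size () && source[i] == '\t') ? '\t' : ' ';
  result += '^';
  return result;
}
```

For "    System.out.println" and column 4 this yields four spaces followed by '^', matching the sample output in this message.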

Since lines are now needed after the whole file has been parsed,
it was no longer proper to discard the lexer at the end of the
parsing phase. Hence I moved the ownership of the lexer from the
parse class to the compilation unit (model_unit), which now
deletes it in its destructor.
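The hand-over of the lexer can be illustrated with std::unique_ptr. This is only a sketch: GCJX at the time was C++98 code that passes a raw pointer through set_lexer and deletes it in ~model_unit, and the structs lexer, unit and parser below are hypothetical stand-ins reduced to the ownership question. With unique_ptr the transfer is explicit and the parser can no longer delete the lexer by mistake.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the GCJX lexer.
struct lexer
{
  std::string file;
  explicit lexer (std::string f) : file (std::move (f)) {}
};

// The compilation unit owns its lexer, so source lines remain retrievable
// for diagnostics issued long after parsing has finished.
struct unit
{
  std::unique_ptr<lexer> src_lexer;
};

struct parser
{
  std::unique_ptr<lexer> lex;

  explicit parser (std::unique_ptr<lexer> l) : lex (std::move (l)) {}

  // Producing the compilation unit moves the lexer into it; afterwards the
  // parser holds nothing and destroying it frees nothing.
  std::unique_ptr<unit> compilation_unit ()
  {
    std::unique_ptr<unit> u (new unit);
    u->src_lexer = std::move (lex);
    return u;
  }
};
```

A unit holding a unique_ptr is also non-copyable by default, which avoids the double delete that copying an object with a raw owning pointer would cause.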

The effect of this patch is shown below:
- --------------------------- 8< ---------------------------
~/src/GCJ > cat Hello.java
public class Hello
{
  public static void main( String[] args)
  {
    junk foo;
    System.out.println( "Hello World!");
  }
}
~/src/GCJ > $GCJX Hello.java
./Hello.java:6:4: error: type named 'junk' is undefined

    System.out.println( "Hello World!");
    ^

- --------------------------- 8< ---------------------------

Note that the wrong line and column are shown here ('junk' is
used on line 5, not line 6). But that's not my fault: the location
information coming in was already wrong. That's material for
another patch, another day.

Tested on i686-pc-linux-gnu. No changes in Jacks results.

OK? Comments?

Thanks,
Ranjit.

- --
Ranjit Mathew       Email: rmathew AT gmail DOT com

Bangalore, INDIA.     Web: http://ranjitmathew.hostingzero.com/




-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDQAloYb1hx2wRS48RAlSPAKCDKG+p8rWFyh6zu1w77K/wnw1Y7QCfUsFX
/uQxaJbzWBO7ambiXs+6l30=
=BA0O
-----END PGP SIGNATURE-----

Index: ChangeLog
from  Ranjit Mathew  <rmathew@gcc.gnu.org>

	Show lines from input files in diagnostics.
	* source/ucs2.hh (ucs2_reader::limit): Rename to 'end'.
	(ucs2_reader::begin): New field.
	(ucs2_reader::posn): Likewise.
	(ucs2_reader::max_posn): Likewise.
	(ucs2_reader::get_uint8): Update POSN as needed.
	(ucs2_reader::get_posn): New method.
	(ucs2_reader::set_posn): Likewise.
	* source/ucs2.cc (ucs2_reader::ucs2_reader): Initialise new fields.
	* source/iconv.cc (iconv_ucs2_reader::refill): Use END instead of
	LIMIT.
	(iconv_ucs2_reader::get): Likewise.
	* source/lex.hh (lexer::get_line): New method.
	* source/lex.cc (lexer::get_line): Implementation of the above.
	* location.hh (location::get_column): New method.
	* format/format.hh (format_repr::src_lexer): New field.
	(format_repr::set_lexer): New method.
	* format/format.cc (format_repr::format_repr): Initialise SRC_LEXER.
	(format_repr::dump): Show source line, if possible, using SRC_LEXER.
	* exception.hh (exception_base::set_lexer): New method.
	* model/unit.hh (model_unit::src_lexer): New field.
	(model_unit::model_unit): Initialise SRC_LEXER.
	(model_unit::~model_unit): New destructor.
	(model_unit::set_lexer): New method.
	(model_unit::get_lexer): Likewise.
	* reader/source.cc (source_file_creator::apply): Set lexer for
	compilation unit and exception, if any.
	* source/parse.hh (parse::~parse): Do not destroy token stream.
	* typedefs.hh: Rearrange inclusion of headers.
	* compiler.cc (compiler::do_analyze_unit): Set lexer for exception,
	if any.

Index: source/ucs2.hh
===================================================================
--- source/ucs2.hh	2005-10-02 15:31:24.000000000 +0530
+++ source/ucs2.hh	2005-10-02 21:10:35.000000000 +0530
@@ -1,6 +1,6 @@
 // ucs-2 readers.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -31,16 +31,31 @@ protected:
   /// The buffer holding all input data.
   byte_buffer *input;
 
+  /// Pointer to the beginning of input data.
+  const uint8 *begin;
+
   /// Current pointer.
   const uint8 *curr;
 
-  /// Limit pointer.
-  const uint8 *limit;
+  /// Pointer to the end of input data.
+  const uint8 *end;
+
+  /// Current position.
+  int posn;
+
+  /// Maximum possible position.
+  const int max_posn;
 
   /// Return the next uint8, or -1 on EOF.
   int get_uint8 ()
   {
-    return curr == limit ? -1 : *curr++;
+    if (curr == end)
+      return -1;
+    else
+      {
+        posn++;
+        return *curr++;
+      }
   }
 
   int here ();
@@ -56,6 +71,22 @@ public:
   /// conversion error; the caller is expected to set the location on
   /// this exception.
   virtual unicode_w_t get () = 0;
+
+  /// Gets the current position of this reader within the input data.
+  virtual int get_posn ()
+  {
+    return posn;
+  }
+
+  /// Sets the position of this reader within the input data, if possible.
+  virtual void set_posn (int a_posn)
+  {
+    if (a_posn >= 0 && a_posn <= max_posn)
+      {
+	posn = a_posn;
+	curr = begin + posn;
+      }
+  }
 };
 
 // Assume the input is utf-8.
Index: source/ucs2.cc
===================================================================
--- source/ucs2.cc	2005-10-02 15:33:22.000000000 +0530
+++ source/ucs2.cc	2005-10-02 20:18:56.000000000 +0530
@@ -1,6 +1,6 @@
 // Implementation of readers.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -29,8 +29,11 @@
 
 ucs2_reader::ucs2_reader (byte_buffer *in)
   : input (in),
+    begin (in->get ()),
     curr (in->get ()),
-    limit (in->get () + in->get_length ())
+    end (in->get () + in->get_length ()),
+    posn (0),
+    max_posn (in->get_length ())
 {
 }
 
Index: source/iconv.cc
===================================================================
--- source/iconv.cc	2005-10-02 15:35:16.000000000 +0530
+++ source/iconv.cc	2005-10-02 15:59:56.000000000 +0530
@@ -1,6 +1,6 @@
 // iconv-based reader.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -62,7 +62,7 @@ iconv_ucs2_reader::refill ()
 {
   assert (last == next);
 
-  size_t in_rem = limit - curr;
+  size_t in_rem = end - curr;
   char *out_loc = (char *) translated;
   size_t out_rem = sizeof (translated);
 
@@ -99,7 +99,7 @@ iconv_ucs2_reader::get ()
 {
   if (next == last)
     {
-      if (curr == limit)
+      if (curr == end)
 	return UNICODE_EOF;
       refill ();
       assert (next != last);
Index: source/lex.hh
===================================================================
--- source/lex.hh	2005-10-02 16:11:59.000000000 +0530
+++ source/lex.hh	2005-10-02 19:48:13.000000000 +0530
@@ -163,6 +163,11 @@ public:
 
   // Return the name of a token.
   static const char *token_to_string (token_value);
+
+  /// Return the line at the given position, counting from 1.  The characters
+  /// are encoded using UTF-8.  Returns NULL if the line number is out of
+  /// range.
+  const char *get_line (int line_number);
 };
 
 #endif // GCJX_SOURCE_LEX_HH
Index: source/lex.cc
===================================================================
--- source/lex.cc	2005-10-02 16:15:10.000000000 +0530
+++ source/lex.cc	2005-10-02 20:17:41.000000000 +0530
@@ -1315,6 +1315,60 @@ lexer::get_token ()
   return result;
 }
 
+const char *
+lexer::get_line (int line_number)
+{
+  // Save the relevant state of the lexer and the input reader before we
+  // rewind the input reader.
+  int save_line = line;
+  int save_column = column;
+  int save_backslash_count = backslash_count;
+  unicode_w_t save_unget_value = unget_value;
+  unicode_w_t save_cooked_unget_value = cooked_unget_value;
+  bool save_was_return = was_return;
+  int save_posn = input_filter->get_posn ();
+
+  // Reset the state of the lexer and the input reader.
+  line = 1;
+  column = 0;
+  backslash_count = 0;
+  unget_value = UNICODE_W_NONE;
+  cooked_unget_value = UNICODE_W_NONE;
+  was_return = false;
+  input_filter->set_posn (0);
+
+  // Ignore input until we reach the desired line (or end of file).
+  unicode_w_t c = UNICODE_W_NONE;
+  while (line < line_number && (c = get ()) != UNICODE_EOF)
+    ;
+
+  // Read in the characters on the desired line.
+  char *ret_val = NULL; 
+  if (c != UNICODE_EOF)
+    {
+      int offset = 0;
+      do
+        {
+          offset = scratch_insert_utf8 (offset, c);
+        } while ((c = get ()) != UNICODE_EOF && line == line_number);
+
+      ensure (offset + 1);
+      scratch[offset] = '\0';
+      ret_val = scratch;
+    }
+
+  // Restore the state of the lexer and the input reader.
+  line = save_line;
+  column = save_column;
+  backslash_count = save_backslash_count;
+  unget_value = save_unget_value;
+  cooked_unget_value = save_cooked_unget_value;
+  was_return = save_was_return;
+  input_filter->set_posn (save_posn);
+
+  return ret_val;
+}
+
 
 
 const char *
Index: location.hh
===================================================================
--- location.hh	2005-10-02 18:20:41.000000000 +0530
+++ location.hh	2005-10-02 18:21:19.000000000 +0530
@@ -1,6 +1,6 @@
 // Represent a location.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -62,6 +62,11 @@ public:
   {
     return line;
   }
+
+  short get_column () const
+  {
+    return column;
+  }
 };
 
 extern location LOCATION_UNKNOWN;
Index: format/format.hh
===================================================================
--- format/format.hh	2005-10-02 17:51:33.000000000 +0530
+++ format/format.hh	2005-10-02 18:18:29.000000000 +0530
@@ -1,6 +1,6 @@
 // Formatting.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -24,6 +24,7 @@
 
 #include "location.hh"
 #include "owner.hh"
+#include "source/lex.hh"
 
 typedef enum
 {
@@ -55,6 +56,9 @@ class format_repr
   /// Location of this error.
   location where;
 
+  /// Lexer to use to retrieve lines from source files.
+  lexer *src_lexer;
+
   /// The formatting string passed in by the user.
   const char *plan;
 
@@ -106,6 +110,11 @@ public:
   {
     where = w;
   }
+
+  void set_lexer (lexer *a_lexer)
+  {
+    src_lexer = a_lexer;
+  }
 };
 
 typedef owner<format_repr> format;
Index: format/format.cc
===================================================================
--- format/format.cc	2005-10-02 17:53:43.000000000 +0530
+++ format/format.cc	2005-10-02 20:22:50.000000000 +0530
@@ -1,6 +1,6 @@
 // Formatting.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -25,6 +25,7 @@
 format_repr::format_repr (format_type t, location w, const char *fmt)
   : refc (0),
     where (w),
+    src_lexer (NULL),
     plan (copy_str (fmt)),
     subst_count (0),
     type (t)
@@ -66,6 +67,24 @@ format_repr::dump (std::ostream &os) con
 		  || (type == format_warning
 		      && global->get_compiler ()->warnings_are_errors ()));
   os << where << (failure ? "error: " : "warning: ") << get_message () << "\n";
+
+  // If possible, print out the corresponding line from the source file.
+  if (src_lexer)
+    {
+      const char *line = src_lexer->get_line (where.get_line ());
+      if (line)
+        {
+          // FIXME: Assumes a terminal that can directly show UTF-8.
+          os << line << std::endl;
+
+          // Now print a caret underneath the column in question.
+          int col = where.get_column ();
+          for (int i = 0; i < col; i++)
+            os << ' ';
+          os << '^' << std::endl << std::endl;
+        }
+    }
+
   if (failure)
     global->get_compiler ()->set_failed ();
 }
Index: exception.hh
===================================================================
--- exception.hh	2005-10-02 17:46:18.000000000 +0530
+++ exception.hh	2005-10-02 18:18:05.000000000 +0530
@@ -1,6 +1,6 @@
 // Exceptions thrown by lexer, parser, etc.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -23,6 +23,7 @@
 #define GCJX_EXCEPTION_HH
 
 #include "format/format.hh"
+#include "source/lex.hh"
 
 class exception_base
 {
@@ -47,6 +48,12 @@ public:
     formatter->set_location (w);
   }
 
+  /// Set the lexer to use to retrieve lines from source files.
+  void set_lexer (lexer *src_lexer)
+  {
+    formatter->set_lexer (src_lexer);
+  }
+
   friend std::ostream &operator<< (std::ostream &, const exception_base &);
 };
 
Index: model/unit.hh
===================================================================
--- model/unit.hh	2005-10-02 18:36:20.000000000 +0530
+++ model/unit.hh	2005-10-02 20:40:03.000000000 +0530
@@ -1,6 +1,6 @@
 // Represent a compilation unit.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -22,6 +22,8 @@
 #ifndef GCJX_MODEL_UNIT_HH
 #define GCJX_MODEL_UNIT_HH
 
+#include "source/lex.hh"
+
 /// This is the base class of all compilation units.
 class model_unit : public model_element, public IScope
 {
@@ -38,18 +40,28 @@ protected:
   // File name associated with this compilation unit.
   std::string filename;
 
+  // Lexer for this compilation unit.
+  lexer *src_lexer;
+
   // True if we've been resolved.
   bool resolved;
 
   model_unit (const location &w)
     : model_element (w),
       package (NULL),
+      src_lexer (NULL),
       resolved (false)
   {
   }
 
 public:
 
+  virtual ~model_unit ()
+  {
+    if (src_lexer)
+      delete src_lexer;
+  }
+
   void set_package (model_package *p)
   {
     package = p;
@@ -70,6 +82,16 @@ public:
     return filename;
   }
 
+  void set_lexer (lexer *a_lexer)
+  {
+    src_lexer = a_lexer;
+  }
+
+  lexer *get_lexer () const
+  {
+    return src_lexer;
+  }
+
   void add (const ref_class &typ)
   {
     types.push_back (typ);
Index: reader/source.cc
===================================================================
--- reader/source.cc	2005-10-02 18:07:25.000000000 +0530
+++ reader/source.cc	2005-10-02 19:33:10.000000000 +0530
@@ -1,6 +1,6 @@
 // Read .java files from a byte stream.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -49,10 +49,12 @@ source_file_creator::apply (bool emit)
     {
       ref_unit unit = p.compilation_unit ();
       unit->set_file_name (file);
+      unit->set_lexer (ts);
       global->get_compiler ()->add_unit (unit, emit);
     }
   catch (exception_base &exc)
     {
+      exc.set_lexer (ts);
       std::cerr << exc;
     }
   // fixme propagate the exceptions on up?
Index: source/parse.hh
===================================================================
--- source/parse.hh	2005-10-02 20:01:06.000000000 +0530
+++ source/parse.hh	2005-10-02 20:32:11.000000000 +0530
@@ -1,6 +1,6 @@
 // The parser.
 
-// Copyright (C) 2004 Free Software Foundation, Inc.
+// Copyright (C) 2004, 2005 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
@@ -234,7 +234,6 @@ public:
 
   ~parse ()
   {
-    delete our_token_stream;
   }
 
   // Compile a file and return a compilation unit.  This may throw
Index: typedefs.hh
===================================================================
--- typedefs.hh	2005-10-02 17:56:49.000000000 +0530
+++ typedefs.hh	2005-10-02 18:05:15.000000000 +0530
@@ -85,6 +85,20 @@ typedef unsigned int unicode_w_t;
 // The special EOF value.
 #define UNICODE_EOF ((unicode_w_t) -1)
 
+// Some integer extrema.
+#define MIN_INT  -0x80000000LL
+#define MAX_INT   0x7fffffffLL
+#define MIN_LONG -0x8000000000000000LL
+#define MAX_LONG  0x7fffffffffffffffLL
+
+#include "util.hh"
+
+#include "owner.hh"
+
+// Forward-declaration for use by the lexer headers.
+class model_element;
+typedef owner<model_element> ref_element;
+
 /// This is a list of possible warning states.
 typedef enum
   {
@@ -95,15 +109,6 @@ typedef enum
 
 #include "exception.hh"
 
-// Some integer extrema.
-#define MIN_INT  -0x80000000LL
-#define MAX_INT   0x7fffffffLL
-#define MIN_LONG -0x8000000000000000LL
-#define MAX_LONG  0x7fffffffffffffffLL
-
-#include "util.hh"
-
-#include "owner.hh"
 #include "global.hh"
 #include "factory.hh"
 #include "model/iscope.hh"
@@ -132,7 +137,6 @@ class visitor;
 #include "model/value.hh"
 
 #include "model/element.hh"
-typedef owner<model_element> ref_element;
 
 #include "model/modifier.hh"
 typedef owner<model_modifier_list> ref_modifier_list;
Index: compiler.cc
===================================================================
--- compiler.cc	2005-10-02 18:40:59.000000000 +0530
+++ compiler.cc	2005-10-02 18:41:44.000000000 +0530
@@ -641,6 +641,7 @@ compiler::do_analyze_unit (model_unit *u
 	}
       catch (exception_base &exc)
 	{
+          exc.set_lexer (unit->get_lexer ());
 	  std::cerr << exc;
 	  ok = false;
 	}

Follow-Ups:
- Re: [gcjx] RFC/RFA: Show Problematic Source Lines with Diagnostics
  - From: Ranjit Mathew
- Re: [gcjx] RFC/RFA: Show Problematic Source Lines with Diagnostics
  - From: Tom Tromey
