This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



compacting _Jv_Utf8Const


_Jv_Utf8Const names take up a fair amount of space. (However, I don't have numbers on this. Does anyone?) Some of it is "overhead": the length (2 bytes), hash code (2 bytes), final '\0' (1 byte), and alignment (0-1 bytes). How about this more compact encoding:

struct _Jv_Utf8Const
{
  unsigned char hash;
  /* The data length is split into 7-bit chunks.  The chunks appear in
   * little-endian order (because that is easier to generate), with the
   * final chunk in a byte with a 0 high-order bit, while the preceding
   * ones have the high-order bit set. */
  /* unsigned char length[?]; - extra low-order 7-bit chunks as needed */
  unsigned char length0;  /* final (high-order) 7-bit chunk of the data length, in bytes */
  char data[1];		/* In Utf8 format; no final '\0'. */
};

The hash code is reduced to 1 byte, saving one byte, under the assumption that clashes will be rare. We reduce the length field to a single byte in all normal cases, saving another byte. We get rid of the final useless '\0', saving a third byte. And we remove the requirement for short-alignment, saving on average half a byte.
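To make the length encoding concrete, here is a rough sketch of how the 7-bit chunks could be written and read back. The helper names utf8const_put_length and utf8const_get_length are made up for illustration and are not existing libgcj functions; the sketch also assumes the length is always written in its shortest form.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libgcj): store LEN as 7-bit chunks,
   low-order chunk first.  Every byte except the last has its high-order
   bit set.  Returns the number of bytes written to OUT. */
static size_t
utf8const_put_length (unsigned char *out, size_t len)
{
  size_t n = 0;
  while (len > 0x7F)
    {
      out[n++] = (unsigned char) ((len & 0x7F) | 0x80);
      len >>= 7;
    }
  out[n++] = (unsigned char) len;  /* final chunk, high-order bit clear */
  return n;
}

/* Hypothetical helper (not part of libgcj): decode a length written by
   utf8const_put_length.  Stores the number of bytes consumed in *USED. */
static size_t
utf8const_get_length (const unsigned char *in, size_t *used)
{
  size_t len = 0;
  size_t n = 0;
  unsigned shift = 0;
  while (in[n] & 0x80)
    {
      len |= (size_t) (in[n++] & 0x7F) << shift;
      shift += 7;
    }
  len |= (size_t) in[n++] << shift;  /* final (high-order) chunk */
  *used = n;
  return len;
}
```

Any name shorter than 128 bytes needs just the one length0 byte; a second byte is only needed from 128 bytes up.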

So the savings would be about 3.5 bytes per name. Is that enough to be worthwhile?

We would also be removing the restriction to a maximum of 0xFFFF bytes.

Disadvantages: Slightly slower comparisons. More complex code. More awkward to print out a _Jv_Utf8Const from gdb. Breaking binary compatibility. But the biggest is the actual work of changing the code.
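As a rough idea of what a comparison might look like with the compact layout, here is a sketch of an equality test. The function name utf8const_equal is hypothetical, and the sketch assumes each name is laid out as hash byte, length chunks, then data, with the length always in its shortest form, so that equal lengths imply identical length bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical equality test (not part of libgcj).  P and Q each point
   at the first byte of a name in the compact layout:
   hash byte, 7-bit length chunks (low-order first), then the data. */
static int
utf8const_equal (const unsigned char *p, const unsigned char *q)
{
  size_t i = 1;
  size_t len = 0;
  unsigned shift = 0;

  if (p[0] != q[0])  /* different 1-byte hash: cannot be equal */
    return 0;

  /* Walk the length chunks of both names in step, decoding as we go.
     Stop as soon as a byte differs, so we never read past a header. */
  for (;;)
    {
      unsigned char c = p[i];
      if (c != q[i])
        return 0;
      len |= (size_t) (c & 0x7F) << shift;
      shift += 7;
      i++;
      if (!(c & 0x80))  /* high-order bit clear: last chunk */
        break;
    }

  return memcmp (p + i, q + i, len) == 0;
}
```

The extra cost over the current layout is the loop that finds the end of the length, which runs only once for names under 128 bytes.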

There is also the issue of whether this change is compatible with the plans for the new ABI.

Finally, one could compress the actual characters, i.e. use a more compact special-purpose encoding than UTF-8. 6 bits per character, with some escape mechanism, should be enough, but saving 25% is probably not enough to justify the complexity.
--
--Per Bothner
per@bothner.com http://per.bothner.com/


