This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
compacting _Jv_Utf8Const
- From: Per Bothner <per at bothner dot com>
- To: java at gcc dot gnu dot org
- Date: Wed, 05 May 2004 12:23:22 -0700
- Subject: compacting _Jv_Utf8Const
_Jv_Utf8Const names take up a fair amount of space. (However, I don't
have numbers on this. Does anyone?) Some of it is "overhead": the
length (2 bytes), hash code (2 bytes), final '\0' (1 byte), and
alignment (0-1 bytes). How about this more compact encoding:
struct _Jv_Utf8Const
{
  unsigned char hash;
  /* The data length is split into 7-bit chunks.  The chunks appear in
   * little-endian order (because that is easier to generate), with the
   * final chunk in a byte with a 0 high-order bit, while the preceding
   * ones have the high-order bit set. */
  /* unsigned char length[?]; - extra low-order 7-bit chunks, as needed */
  unsigned char length0; /* high-order chunk of the data length, in bytes */
  char data[1];          /* In Utf8 format; no final '\0'. */
};
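The length scheme in the comment above appears to be the same as unsigned LEB128 (as used in DWARF). Here is a minimal sketch of writing and reading it. The helper names utf8const_put_length and utf8const_get_length are hypothetical, not existing libgcj functions, and the sketch assumes the length fits in a size_t.

```c
#include <stddef.h>

/* Hypothetical helper: write LEN as little-endian 7-bit chunks into OUT.
   Every chunk except the last (high-order) one has the 0x80 bit set.
   Returns the number of bytes written. */
static size_t
utf8const_put_length (unsigned char *out, size_t len)
{
  size_t n = 0;
  while (len >= 0x80)
    {
      out[n++] = (unsigned char) ((len & 0x7F) | 0x80);
      len >>= 7;
    }
  out[n++] = (unsigned char) len;   /* final chunk: high bit clear */
  return n;
}

/* Hypothetical helper: decode a length written by utf8const_put_length.
   Stores the decoded length in *LEN and returns a pointer to the first
   byte after the length field, i.e. the start of the data. */
static const unsigned char *
utf8const_get_length (const unsigned char *p, size_t *len)
{
  size_t value = 0;
  unsigned shift = 0;
  while (*p & 0x80)
    {
      value |= (size_t) (*p++ & 0x7F) << shift;
      shift += 7;
    }
  value |= (size_t) *p++ << shift;  /* final (high-order) chunk */
  *len = value;
  return p;
}
```

For example, a length of 300 is written as the two bytes 0xAC 0x02. Any length below 128 takes the single length0 byte.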
The hash code is reduced to 1 byte, saving one byte, under the
assumption that clashes will be rare. We reduce the length field to a
single byte in all normal cases, saving another byte. We get rid of the
final useless '\0', saving a third byte. And we remove the requirement
for short-alignment, saving on average half a byte.
So the savings would be on average 3.5 bytes per name. Is that enough to be
worthwhile?
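The 3.5-byte figure can be checked with a little arithmetic. The sketch below compares two per-name footprints. The first is a layout with the overhead listed at the start of this message: a 2-byte hash, a 2-byte length, a trailing '\0' and 2-byte alignment. The second is the proposed layout. The function names are hypothetical, and the sketch assumes the proposed layout needs no padding at all.

```c
#include <stddef.h>

/* Current layout (as described above): 2-byte hash, 2-byte length,
   N data bytes, a trailing '\0', then padding to a 2-byte boundary. */
static size_t
old_footprint (size_t n)
{
  size_t size = 2 + 2 + n + 1;
  return (size + 1) & ~(size_t) 1;  /* round up to an even size */
}

/* Proposed layout: 1-byte hash, 7-bits-per-byte length, N data bytes,
   no terminator and no alignment padding. */
static size_t
new_footprint (size_t n)
{
  size_t length_bytes = 1;
  for (size_t rest = n >> 7; rest != 0; rest >>= 7)
    length_bytes++;
  return 1 + length_bytes + n;
}
```

For names shorter than 128 bytes, the difference alternates between 4 bytes (even length) and 3 bytes (odd length), which averages 3.5. Once a name needs a second length byte, the saving drops by one.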
We also remove the restriction to a maximum of 0xFFFF bytes.
Disadvantages: Slightly slower comparisons. More complex code. More
awkward to print out a _Jv_Utf8Const from gdb. Breaking binary
compatibility. But the biggest disadvantage is the actual work of
changing the code.
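This sketch illustrates the "slightly slower comparisons" point. With the compact layout, an equality test can still reject most mismatches on the hash byte. It must then decode the variable-length length before it can compare the data. The sketch assumes each constant is stored as a flat byte sequence (hash, length chunks, data). The names skip_length and utf8const_equal are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Decode the 7-bit length chunks at P, storing the length in *LEN.
   Returns a pointer to the first data byte. */
static const unsigned char *
skip_length (const unsigned char *p, size_t *len)
{
  size_t value = 0;
  unsigned shift = 0;
  do
    {
      value |= (size_t) (*p & 0x7F) << shift;
      shift += 7;
    }
  while (*p++ & 0x80);              /* high bit set: more chunks follow */
  *len = value;
  return p;
}

/* Hypothetical equality test for two compact constants A and B. */
static int
utf8const_equal (const unsigned char *a, const unsigned char *b)
{
  size_t len_a, len_b;
  if (a == b)
    return 1;
  if (a[0] != b[0])                 /* 1-byte hash: cheap early rejection */
    return 0;
  a = skip_length (a + 1, &len_a);
  b = skip_length (b + 1, &len_b);
  return len_a == len_b && memcmp (a, b, len_a) == 0;
}
```

Compared with a fixed 16-bit length, the extra cost is the decode loop. For names under 128 bytes, that loop runs only once.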
There is also the question of whether this change is compatible with the
plans for the new ABI.
Finally, one could compress the actual characters, i.e. use a more
compact special-purpose encoding than UTF8. Six bits per character, with
some escape mechanism, should be enough, but saving at most 25% is
probably not enough to justify the complexity.
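To make the 25% figure concrete, suppose most characters come from a 63-symbol alphabet and the 64th code is an escape. Each common character then costs 6 bits instead of 8. The sketch below only computes the packed size. The alphabet (letters, digits and '/') is an assumption chosen for illustration and is not part of any existing format. So is the escape rule: the escape code is followed by two 6-bit units that carry the raw byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 63-symbol alphabet; code 63 is reserved as the escape. */
static const char SIXBIT_ALPHABET[] =
  "abcdefghijklmnopqrstuvwxyz"
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  "0123456789/";

/* Bytes needed to pack the N bytes at S: 6 bits for each byte found in
   the alphabet, 18 bits (escape + 12 bits holding the raw byte) otherwise. */
static size_t
sixbit_packed_size (const char *s, size_t n)
{
  size_t bits = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* strchr would match the terminator for '\0', so exclude it. */
      if (s[i] != '\0' && strchr (SIXBIT_ALPHABET, s[i]) != NULL)
        bits += 6;
      else
        bits += 18;
    }
  return (bits + 7) / 8;            /* round up to whole bytes */
}
```

For "java/lang/Object" this gives 12 bytes instead of 16, the full 25%. Adding just one escaped character, such as the ';' in "Ljava/lang/String;", already gives 15 bytes instead of 18.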
--
--Per Bothner
per@bothner.com http://per.bothner.com/