This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
webpage: update cpplib todo list, add gcc-micro.html
- To: gcc-patches at gcc dot gnu dot org
- Subject: webpage: update cpplib todo list, add gcc-micro.html
- From: Zack Weinberg <zack at wolery dot cumb dot org>
- Date: Fri, 28 Jan 2000 23:47:40 -0800
This patch has a complete rewrite of the cpplib to-do list and adds my
list of optimizer problems to the official website.
zw
===================================================================
Index: projects.html
--- projects.html 1999/09/20 07:41:33 1.7
+++ projects.html 2000/01/29 07:45:11
@@ -13,6 +13,10 @@ very much anymore, but who knows?
<p>There is a separate project list for the <a
href="proj-cpplib.html">C preprocessor</a>.
+<p>We also have a page detailing <a href="gcc-micro.html">optimizer
+inadequacies</a>, if you prefer to think in terms of problems rather
+than features.
+
<h2>Changes to support C99 draft</h2>
<p>The (not yet published) next revision of the C standard requires a
===================================================================
Index: gcc-micro.html
--- /dev/null Tue May 5 13:32:27 1998
+++ gcc-micro.html Fri Jan 28 23:45:11 2000
@@ -0,0 +1,822 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+ "http://www.w3.org/TR/html4/loose.dtd">
+<html><head>
+<title>Micro-optimizations</title>
+<link rev="made" href="mailto:zack@wolery.cumb.org">
+</head>
+
+<body bgcolor="white" text="black" link="#0000EE" vlink="#551A8B" alink="red">
+<h1 align="center">Micro-optimizations that GCC should perform</h1>
+
+<p>This page lists places where GCC's code generation is suboptimal and
+the problem can be shown in a few lines of assembly output, hence
+"micro-optimizations." I'll be updating it as I notice new issues.
+Please send suggestions to <a
+href="mailto:zack@wolery.cumb.org">zack@wolery.cumb.org</a>.
+
+<p>Note: unless otherwise specified, all examples have been compiled
+with the current CVS tree as of the date of the example, on x86, with
+<code>-O2 -fomit-frame-pointer -fschedule-insns</code>. (The x86 back
+end disables <code>-fschedule-insns</code>, which is something that
+should be revisited, because it always gives better code when I turn
+it back on.)
+
+<p><strong>Contents:</strong>
+<ol>
+<li><a href="#invert">Inverting conditionals</a>
+<li><a href="#csefail">Failure of common subexpression elimination</a>
+<li><a href="#storemerge">Store merging</a>
+<li><a href="#gcsereg">Global CSE and hard registers</a>
+<li><a href="#volatile">Volatile inhibits too many optimizations</a>
+<li><a href="#rndmode">Unnecessary changes of rounding mode</a>
+<li><a href="#regshuf">Register shuffling and <code>long long</code></a>
+<li><a href="#fpmove">Moving floating point through integer registers</a>
+</ol>
+
+<hr>
+<h2><a name="invert">Inverting conditionals</a></h2>
+
+<p>(14 Jan 2000) Frequently GCC produces better code if you write a
+conditional one way than if you write it the opposite way. Here is a
+simple example.
+
+<p><pre>
+static const unsigned char
+trigraph_map[] = {
+ '|', 0, 0, 0, 0, 0, '^',
+ '[', ']', 0, 0, 0, '~',
+ 0, '\\', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, '{', '#', '}'
+};
+
+unsigned char
+map1 (c)
+ unsigned char c;
+{
+ if (c >= '!' && c <= '>')
+ return trigraph_map[c - '!'];
+ return 0;
+}
+
+unsigned char
+map2 (c)
+ unsigned char c;
+{
+ if (c < '!' || c > '>')
+ return 0;
+ return trigraph_map[c - '!'];
+}
+</pre>
+
+<p>Assembly output for <code>map1</code> and <code>map2</code> is,
+surprisingly, different:
+
+<p><pre>
+map1:
+ movb 4(%esp), %cl
+ xorl %eax, %eax
+ movb %cl, %dl
+ subb $33, %dl
+ cmpb $29, %dl
+ ja .L4
+ movzbl %cl, %eax
+ movzbl trigraph_map-33(%eax), %eax
+.L4:
+ ret
+
+map2:
+ movb 4(%esp), %cl
+ xorl %eax, %eax
+ movb %cl, %dl
+ subb $33, %dl
+ cmpb $29, %dl
+ ja .L7
+ movzbl %cl, %eax
+ movzbl trigraph_map-33(%eax), %eax
+ ret
+ .p2align 4,,7
+.L7:
+ ret
+</pre>
+
+<p>Admittedly, the difference is small: a redundant <code>'ret'</code>
+instruction and a padding directive, wasting six bytes in the
+object file. The problem is worse for larger blocks of conditional
+code, though.
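+<p>The two spellings of the test mean the same thing, so the difference above comes purely from code generation. If we rewrite the functions in prototype form (the original uses K&amp;R definitions), we can check this exhaustively over every <code>unsigned char</code>. This is a sketch, and <code>maps_agree</code> is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

/* Entry k is what the trigraph with final character '!' + k stands
   for; 0 means that character does not form a trigraph. */
static const unsigned char trigraph_map[] = {
  '|', 0, 0, 0, 0, 0, '^',
  '[', ']', 0, 0, 0, '~',
  0, '\\', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, '{', '#', '}'
};

/* Range test written "positively". */
unsigned char map1(unsigned char c)
{
  if (c >= '!' && c <= '>')
    return trigraph_map[c - '!'];
  return 0;
}

/* The same test, inverted. */
unsigned char map2(unsigned char c)
{
  if (c < '!' || c > '>')
    return 0;
  return trigraph_map[c - '!'];
}

/* Returns 1 if map1 and map2 agree on every possible input. */
int maps_agree(void)
{
  unsigned int c;
  for (c = 0; c <= UCHAR_MAX; c++)
    if (map1((unsigned char) c) != map2((unsigned char) c))
      return 0;
  return 1;
}
```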
+
+<hr>
+<h2><a name="csefail">Failure of common subexpression elimination</a></h2>
+
+<p>(14 Jan 2000) The same code also illustrates a failing in CSE.
+Once again, the source is
+
+<p><pre>
+unsigned char
+map1 (c)
+ unsigned char c;
+{
+ if (c >= '!' && c <= '>')
+ return trigraph_map[c - '!'];
+ return 0;
+}
+</pre>
+
+<p>and the assembly is
+
+<p><pre>
+map1:
+ movb 4(%esp), %cl
+ xorl %eax, %eax
+ movb %cl, %dl
+ subb $33, %dl
+ cmpb $29, %dl
+ ja .L4
+ movzbl %cl, %eax
+ movzbl trigraph_map-33(%eax), %eax
+.L4:
+ ret
+</pre>
+
+<p>If we were writing this code by hand, we would do it thus:
+
+<p><pre>
+map1:
+ movb 4(%esp), %cl
+ xorl %eax, %eax
+ subb $33, %cl
+ cmpb $29, %cl
+ ja .L4
+ movzbl %cl, %eax
+ movzbl trigraph_map(%eax), %eax
+.L4:
+ ret
+</pre>
+
+<p>This does not save a runtime subtract - <code>trigraph_map-33</code>
+happens at load time. It does, however, save a register, which would
+be important if this function were to be inlined. It also puts the
+<code>'ret'</code> instruction at the alignment the processor likes for
+jump targets, which is important because we happen to know that the
+jump will almost always be taken.
+
+<p>Some marginally more detailed analysis: Local CSE can't help because
+the two subtracts are in different basic blocks. Global CSE does not
+merge the subtracts because they appear to occur in different modes.
+We have RTL like so:
+
+<p><pre>
+(insn 13 7 14 (parallel[
+ (set (reg:QI 27)
+ (plus:QI (reg/v:QI 25)
+ (const_int -33 [0xffffffdf])))
+ (clobber (reg:CC 17 flags))
+ ] ) 183 {*addqi_1} (nil)
+ (nil))
+
+;; ...
+
+(insn 17 44 19 (parallel[
+ (set (reg:SI 29)
+ (zero_extend:SI (reg/v:QI 25)))
+ (clobber (reg:CC 17 flags))
+ ] ) 106 {*zero_extendqisi2_movzbw_and} (nil)
+ (nil))
+
+(insn 19 17 21 (parallel[
+ (set (reg:SI 30)
+ (plus:SI (reg:SI 29)
+ (const_int -33 [0xffffffdf])))
+ (clobber (reg:CC 17 flags))
+ ] ) 174 {*addsi_1} (nil)
+ (nil))
+</pre>
+
+<p>I suspect that this is conservatism on the part of the optimizer. In
+general, doing the zero_extend and then the subtract can give a
+different result than doing them the other way around, because the
+QImode subtract wraps for small values. However, we know that this
+cannot be the case here, because control never reaches insn 17 unless
+(reg:QI 25) lies between 33 and 62 inclusive.
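+<p>At the source level, the hand-written assembly computes <code>c - '!'</code> once, in <code>unsigned char</code> arithmetic. It then uses that one value both for the range check and as the table index. Characters below <code>'!'</code> wrap around to large values, so a single unsigned comparison covers both ends of the range. The sketch below checks this against <code>map1</code> for every input; <code>map3</code> and <code>map3_matches_map1</code> are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

static const unsigned char trigraph_map[] = {
  '|', 0, 0, 0, 0, 0, '^',
  '[', ']', 0, 0, 0, '~',
  0, '\\', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, '{', '#', '}'
};

/* The original, in prototype form. */
unsigned char map1(unsigned char c)
{
  if (c >= '!' && c <= '>')
    return trigraph_map[c - '!'];
  return 0;
}

/* Subtract once and reuse the difference for both the check and the
   index, as in the hand-written assembly. */
unsigned char map3(unsigned char c)
{
  unsigned char d = (unsigned char) (c - '!');  /* wraps for c < '!' */
  if (d <= '>' - '!')
    return trigraph_map[d];
  return 0;
}

/* Returns 1 if map3 agrees with map1 on every possible input. */
int map3_matches_map1(void)
{
  unsigned int c;
  for (c = 0; c <= UCHAR_MAX; c++)
    if (map1((unsigned char) c) != map3((unsigned char) c))
      return 0;
  return 1;
}
```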
+
+<hr>
+<h2><a name="storemerge">Store merging</a></h2>
+
+<p>(14 Jan 2000) GCC frequently generates multiple narrow writes to
+adjacent memory locations. Memory writes are expensive; it would be
+better if they were combined. For example:
+
+<p><pre>
+struct rtx_def
+{
+ unsigned short code;
+ int mode : 8;
+ unsigned int jump : 1;
+ unsigned int call : 1;
+ unsigned int unchanging : 1;
+ unsigned int volatil : 1;
+ unsigned int in_struct : 1;
+ unsigned int used : 1;
+ unsigned integrated : 1;
+ unsigned frame_related : 1;
+};
+
+void
+i1(struct rtx_def *d)
+{
+ memset((char *)d, 0, sizeof(struct rtx_def));
+ d->code = 12;
+ d->mode = 23;
+}
+
+void
+i2(struct rtx_def *d)
+{
+ d->code = 12;
+ d->mode = 23;
+
+ d->jump = d->call = d->unchanging = d->volatil
+ = d->in_struct = d->used = d->integrated = d->frame_related = 0;
+}
+</pre>
+
+<p>compiles to (I have converted the constants to hexadecimal to make the
+situation clearer):
+
+<p><pre>
+i1:
+ movl 4(%esp), %eax
+ movl $0x0, (%eax)
+ movb $0x17, 2(%eax)
+ movw $0x0c, (%eax)
+ ret
+
+i2:
+ movl 4(%esp), %eax
+ movb $0x0, 3(%eax)
+ movw $0x0c, (%eax)
+ movb $0x17, 2(%eax)
+ ret
+</pre>
+
+<p>Both versions ought to compile to
+
+<p><pre>
+i3:
+ movl 4(%esp), %eax
+ movl $0x17000c, (%eax)
+ ret
+</pre>
+
+<p>Other architectures <em>have</em> to do this optimization, so GCC is
+capable of it. GCC simply needs to be taught that it's a win on this
+architecture too. It might be nice if it would do the same thing for
+a more general function where the values assigned to
+<code>'code'</code> and <code>'mode'</code> were not constant, but the
+advantage is less obvious here.
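+<p>Until the compiler does this itself, the merged store can be written by hand. The sketch below writes <code>i3</code> from the assembly above as a C function. It performs the store with a single four-byte <code>memcpy</code>, which GCC typically turns into one <code>movl</code>. It assumes a little-endian target and GCC's x86 bit-field layout: <code>code</code> in bytes 0-1, <code>mode</code> in byte 2, the eight flags in byte 3. Under that layout the structure is exactly four bytes. It is not portable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct rtx_def
{
  unsigned short code;
  int mode : 8;
  unsigned int jump : 1;
  unsigned int call : 1;
  unsigned int unchanging : 1;
  unsigned int volatil : 1;
  unsigned int in_struct : 1;
  unsigned int used : 1;
  unsigned integrated : 1;
  unsigned frame_related : 1;
};

/* code = 12, mode = 23, all flags clear, in one 32-bit store.
   ASSUMES little-endian byte order and the bit-field layout above. */
void i3(struct rtx_def *d)
{
  uint32_t word = 0x0017000cu;  /* byte 3: flags, byte 2: mode, bytes 0-1: code */
  assert(sizeof *d == sizeof word);
  memcpy(d, &word, sizeof word);
}
```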
+
+<hr>
+<h2><a name="gcsereg">Global CSE and hard registers</a></h2>
+
+<p>(16 Jan 2000) Global CSE is not capable of operating on hard
+registers. This causes it to miss obvious optimizations. For
+example, consider this C++ fragment:
+
+<p><pre>
+struct A
+{
+ A (int);
+};
+
+struct B : virtual public A
+{
+ B ();
+};
+
+B::B ()
+ : A (3)
+{
+}
+</pre>
+
+<p>This compiles as follows (exception handling labels edited out for
+clarity):
+
+<p><pre>
+__1Bi:
+ subl $24, %esp
+ pushl %ebx
+ movl 36(%esp), %edx
+ movl 32(%esp), %ebx
+ testl %edx, %edx
+ je .L3
+ leal 4(%ebx), %eax
+ movl %eax, (%ebx)
+.L3:
+ testl %edx, %edx
+ je .L4
+ subl $8, %esp
+ leal 4(%ebx), %eax
+ pushl $3
+ pushl %eax
+ call __1Ai
+ addl $16, %esp
+.L4:
+ movl %ebx, %eax
+ popl %ebx
+ addl $24, %esp
+ ret
+</pre>
+
+<p>Notice how the test of <code>%edx</code> and the computation of
+<code>%eax</code> (the <code>leal</code>) both occur twice. We would
+like code more like this to be generated:
+
+<p><pre>
+__1Bi:
+ subl $24, %esp
+ pushl %ebx
+ movl 36(%esp), %edx
+ movl 32(%esp), %ebx
+ testl %edx, %edx
+ je .L4
+ leal 4(%ebx), %eax
+ movl %eax, (%ebx)
+ subl $8, %esp
+ pushl $3
+ pushl %eax
+ call __1Ai
+ addl $16, %esp
+.L4:
+ movl %ebx, %eax
+ popl %ebx
+ addl $24, %esp
+ ret
+</pre>
+
+<p>This is also a decent example of stack space wastage. On the i386,
+GCC keeps the stack pointer 16-byte aligned right before every call
+instruction, and tries to align doubles on the stack as well.
+However, none of the variables in this function needs more than 4-byte
+alignment, and there's no reason to keep the stack pointer aligned in
+the middle of the function. All the same constraints are satisfied by
+this version:
+
+<p><pre>
+__1Bi:
+ pushl %ebx
+ movl 12(%esp), %edx
+ movl 8(%esp), %ebx
+ testl %edx, %edx
+ je .L4
+ leal 4(%ebx), %eax
+ movl %eax, (%ebx)
+ pushl $3
+ pushl %eax
+ call __1Ai
+ addl $8, %esp
+.L4:
+ movl %ebx, %eax
+ popl %ebx
+ ret
+</pre>
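+<p>The alignment claim can be checked with a little arithmetic. Assume, as the i386 back end does, that the caller's stack was 16-byte aligned just before its own call instruction pushed the return address. Then count the bytes below that point at the call to <code>__1Ai</code>. For the compiled version this is 4 + 24 + 4 + 8 + 8 = 48; for the hand-written one, 4 + 4 + 8 = 16. Both are multiples of 16. In the sketch, <code>stack_offset_at_call</code> and <code>aligned_at_call</code> are hypothetical helpers, not part of GCC.

```c
#include <assert.h>

/* Bytes between the caller's aligned stack pointer and %esp when this
   function executes its own call instruction:
     4            return address pushed by the caller's call
     sub_before   the prologue's "subl $N, %esp"
     4 * saved    callee-saved registers pushed in the prologue
     sub_after    padding subtracted just before the call
     4 * args     argument words pushed for the call */
int stack_offset_at_call(int sub_before, int saved, int sub_after, int args)
{
  return 4 + sub_before + 4 * saved + sub_after + 4 * args;
}

/* The constraint the i386 back end maintains at every call. */
int aligned_at_call(int offset)
{
  return offset % 16 == 0;
}
```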
+
+<p>Only part of the problem is with alignment. The other part is that
+stack slots are frequently allocated for variables that wound up in
+registers.
+
+<hr>
+<h2><a name="volatile">Volatile inhibits too many optimizations</a></h2>
+
+<p>(17 Jan 2000) gcc refuses to perform in-memory operations on
+volatile variables, on architectures that have those operations.
+Compare:
+
+<p><pre>
+extern int a;
+extern volatile int b;
+
+void inca(void) { a++; }
+
+void incb(void) { b++; }
+</pre>
+
+<p>compiles to:
+
+<p><pre>
+inca:
+ incl a
+ ret
+
+incb:
+ movl b, %eax
+ incl %eax
+ movl %eax, b
+ ret
+</pre>
+
+<p>Note that this is a policy decision. Changing the behavior is
+trivial - permit <code>general_operand</code> to accept volatile
+variables. To date the GCC team has chosen not to do so.
+
+<p>The C standard is maddeningly ambiguous about the semantics of
+volatile variables. It <em>happens</em> that on x86 the two
+functions above have identical semantics. On other platforms that
+have in-memory operations, that may not be the case, and the C
+standard may take issue with the difference - we aren't sure.
+
+<hr>
+<h2><a name="rndmode">Unnecessary changes of rounding mode</a></h2>
+
+<p>(17 Jan 2000) gcc does not remember the state of the floating point
+control register, so it changes it more than necessary. Consider the
+following:
+
+<p><pre>
+void
+d2i2(const double a, const double b, int * const i, int * const j)
+{
+ *i = a;
+ *j = b;
+}
+</pre>
+
+<p>This performs two conversions from <code>'double'</code> to
+<code>'int'</code>. The example compiles as follows:
+
+<p><pre>
+d2i2:
+ subl $24, %esp
+ pushl %ebx
+ movl 48(%esp), %edx
+ movl 52(%esp), %ecx
+ fldl 32(%esp)
+ fldl 40(%esp)
+ fxch %st(1)
+ fnstcw 12(%esp)
+ movl 12(%esp), %ebx
+ movb $12, 13(%esp)
+ fldcw 12(%esp)
+ movl %ebx, 12(%esp)
+ fistpl 8(%esp)
+ fldcw 12(%esp)
+ movl 8(%esp), %eax
+ movl %eax, (%edx)
+ fnstcw 12(%esp)
+ movl 12(%esp), %edx
+ movb $12, 13(%esp)
+ fldcw 12(%esp)
+ movl %edx, 12(%esp)
+ fistpl 8(%esp)
+ fldcw 12(%esp)
+ movl 8(%esp), %eax
+ movl %eax, (%ecx)
+ popl %ebx
+ addl $24, %esp
+ ret
+</pre>
+
+<p>For those who are unfamiliar with the, um, unique design of the x86
+floating point unit: it has an eight-slot register stack, and each
+entry holds a value in 80-bit extended format. Values can be moved
+between top-of-stack and memory, but not between top-of-stack and the
+integer registers. The control word, which is a separate register,
+cannot be moved to or from the integer registers either; it can only
+be stored to and loaded from memory.
+
+<p>On x86, converting a <code>'double'</code> to <code>'int'</code>
+requires setting the control word to a nonstandard value: C requires
+the conversion to truncate toward zero, but the FPU's default rounding
+mode is round-to-nearest, and <code>fistp</code> rounds according to
+the current mode. In the code above, you can clearly see that the
+control word is saved, changed, and restored around each individual
+conversion. It would be perfectly possible to do it only once, thus:
+
+<p><pre>
+d2i2:
+ subl $24, %esp
+ pushl %ebx
+ movl 48(%esp), %edx
+ movl 52(%esp), %ecx
+ fldl 32(%esp)
+ fldl 40(%esp)
+ fxch %st(1)
+ fnstcw 12(%esp)
+ movl 12(%esp), %ebx
+ movb $12, 13(%esp)
+ fldcw 12(%esp)
+ movl %ebx, 12(%esp)
+ fistpl 8(%esp)
+ movl 8(%esp), %eax
+ movl %eax, (%edx)
+ fistpl 8(%esp)
+ fldcw 12(%esp)
+ movl 8(%esp), %eax
+ movl %eax, (%ecx)
+ popl %ebx
+ addl $24, %esp
+ ret
+</pre>
+
+<p>Other obvious improvements in this code include storing directly
+from the floating-point stack to the target addresses, and reordering
+the loads to avoid the <code>'fxch'</code> instruction. You can't
+reorder the stores in C because <code>'i'</code> and <code>'j'</code>
+might point at the same location.
+
+<p><pre>
+d2i2:
+ subl $24, %esp
+ pushl %ebx
+ movl 48(%esp), %edx
+ movl 52(%esp), %ecx
+ fldl 40(%esp)
+ fldl 32(%esp)
+ fnstcw 12(%esp)
+ movl 12(%esp), %ebx
+ movb $12, 13(%esp)
+ fldcw 12(%esp)
+ movl %ebx, 12(%esp)
+ fistpl (%edx)
+ fistpl (%ecx)
+ fldcw 12(%esp)
+ popl %ebx
+ addl $24, %esp
+ ret
+</pre>
+
+<p>As usual, we can also reduce the amount of wasted stack space:
+
+<p><pre>
+d2i2:
+ pushl %ebx
+ movl 24(%esp), %edx
+ movl 28(%esp), %ecx
+ fldl 16(%esp)
+ fldl 8(%esp)
+ fnstcw 24(%esp)
+ movl 24(%esp), %ebx
+ movb $12, 25(%esp)
+ fldcw 24(%esp)
+ fistpl (%edx)
+ fistpl (%ecx)
+ movl %ebx, 24(%esp)
+ fldcw 24(%esp)
+ popl %ebx
+ ret
+</pre>
+
+<p>This version recycles the stack slot of one of the parameters as
+temporary storage for the control word.
+
+<p>The four versions of this routine occupy respectively 97, 72, 54,
+and 48 bytes of text. Version 2 will be dramatically faster than
+version 1; 3 will be somewhat faster than 2, and 4 will be about the
+same as 3, but will waste less memory.
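+<p>Here is what the control-word manipulation does, concretely. The rounding-control field occupies bits 10 and 11 of the control word. The value 00 selects round-to-nearest; 11 selects round-toward-zero, which is what C's <code>(int)</code> conversion needs. The word is stored little-endian, so <code>movb $12, 13(%esp)</code> replaces its high byte with 0x0C. As a side effect this also clears the precision-control field in bits 8 and 9, which <code>fistp</code> does not consult. The sketch below computes the same values in C. It assumes the common default control word 0x037F (exceptions masked, 64-bit precision, round to nearest). The helper names are hypothetical.

```c
#include <assert.h>

/* Rounding-control (RC) field of the x87 control word: bits 10-11. */
#define X87_RC_SHIFT 10
#define X87_RC_MASK (3u << X87_RC_SHIFT)
#define X87_RC_NEAREST 0u
#define X87_RC_TOWARD_ZERO 3u

/* Extract the RC field from a control word. */
unsigned int rounding_control(unsigned int cw)
{
  return (cw & X87_RC_MASK) >> X87_RC_SHIFT;
}

/* The value "movb $12, 13(%esp)" leaves in a control word stored at
   12(%esp): the low byte is kept, the high byte becomes 0x0C. */
unsigned int truncating_cw(unsigned int cw)
{
  return (cw & 0x00FFu) | 0x0C00u;
}
```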
+
+<hr>
+<h2><a name="regshuf">Register shuffling and <code>long long</code></a></h2>
+
+<p>(22 Jan 2000) GCC has a number of problems doing 64-bit arithmetic
+on architectures with 32-bit words. This is only one of the most
+obvious issues.
+
+<p><pre>
+extern void big(long long u);
+void doit(unsigned int a, unsigned int b, char *id)
+{
+ big(*id);
+ big(a);
+ big(b);
+}
+</pre>
+
+<p>compiles to:
+
+<p><pre>
+doit:
+ subl $20, %esp
+ pushl %esi
+ pushl %ebx
+ movl 40(%esp), %ecx
+ subl $8, %esp
+ movl 40(%esp), %ebx
+ movl 44(%esp), %esi
+ movsbl (%ecx), %eax
+ cltd
+* pushl %edx
+* pushl %eax
+ call big
+ subl $8, %esp
+ xorl %edx, %edx
+ movl %ebx, %eax
+* pushl %edx
+* pushl %eax
+ call big
+ addl $24, %esp
+ xorl %edx, %edx
+ movl %esi, %eax
+* pushl %edx
+* pushl %eax
+ call big
+ addl $16, %esp
+ popl %ebx
+ popl %esi
+ addl $20, %esp
+ ret
+</pre>
+
+<p>Notice how the argument to <code>big</code> is invariably shuffled
+such that its high word is in <code>%edx</code> and its low word is in
+<code>%eax</code>, and then pushed. This is because gcc is incapable
+of manipulating the two halves separately. It should be able to
+generate code like this:
+
+<p><pre>
+doit:
+ subl $20, %esp
+ pushl %esi
+ pushl %ebx
+ movl 40(%esp), %ecx
+ subl $8, %esp
+ movl 40(%esp), %ebx
+ movl 44(%esp), %esi
+ movsbl (%ecx), %eax
+ cltd
+ pushl %edx
+ pushl %eax
+ call big
+ subl $8, %esp
+ xorl %edx, %edx
+ pushl %edx
+ pushl %ebx
+ call big
+ addl $24, %esp
+ xorl %edx, %edx
+ pushl %edx
+ pushl %esi
+ call big
+ addl $16, %esp
+ popl %ebx
+ popl %esi
+ addl $20, %esp
+ ret
+</pre>
+
+<p>Also, the choice to fetch all arguments from the stack at the very
+beginning is questionable. It might be better to use one callee-save
+register to hold zero and retrieve args from the stack when needed.
+This, with the usual tweaks to stack adjusts, makes the code much
+shorter.
+
+<p><pre>
+doit:
+ pushl %ebx
+ xorl %ebx, %ebx
+ movl 16(%esp), %ecx
+ movsbl (%ecx), %eax
+ cltd
+ pushl %edx
+ pushl %eax
+ call big
+ addl $8, %esp
+ movl 8(%esp), %eax
+ pushl %ebx
+ pushl %eax
+ call big
+ addl $8, %esp
+ movl 12(%esp), %eax
+ pushl %ebx
+ pushl %eax
+ call big
+ addl $8, %esp
+ popl %ebx
+ ret
+</pre>
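+<p>The halves can be treated separately because of how a 32-bit value widens to <code>long long</code>. The low word is the value unchanged. The high word depends only on the source type and sign: always zero for an <code>unsigned int</code>, and a copy of the sign bit for a <code>signed char</code>, which is what <code>cltd</code> computes into <code>%edx</code>. So the zero high word for <code>big(a)</code> and <code>big(b)</code> can come from any register holding zero. This is a sketch; the helper names are hypothetical, and it assumes a 32-bit <code>unsigned int</code> and the signed plain <code>char</code> of x86.

```c
#include <assert.h>
#include <stdint.h>

/* Low and high 32-bit words of a 64-bit argument.  On x86 the caller
   pushes the high word first, then the low word. */
uint32_t low_word(long long v)
{
  return (uint32_t) ((unsigned long long) v & 0xFFFFFFFFu);
}

uint32_t high_word(long long v)
{
  return (uint32_t) ((unsigned long long) v >> 32);
}

/* big(a) with unsigned a: the high word is always zero. */
int unsigned_high_word_is_zero(unsigned int a)
{
  return high_word((long long) a) == 0;
}

/* big(*id): the high word is the sign bit of *id, replicated. */
uint32_t char_high_word(signed char c)
{
  return high_word((long long) c);
}
```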
+
+<hr>
+<h2><a name="fpmove">Moving floating point through integer registers</a></h2>
+
+<p>(22 Jan 2000) GCC 2.96 on x86 knows how to move <code>float</code>
+quantities using integer instructions. This is normally a win because
+floating point moves take more cycles. However, it increases the
+pressure on the minuscule integer register file and therefore can end
+up making things worse.
+
+<p><pre>
+void
+fcpy(float *a, float *b, float *aa, float *bb, int n)
+{
+ int i;
+ for(i = 0; i < n; i++) {
+ aa[i]=a[i];
+ bb[i]=b[i];
+ }
+}
+</pre>
+
+<p>I've compiled this three times and present the results side by
+side. Only the inner loop is shown.
+
+<p><pre>
+2.95 @ -O2              2.96 @ -O2                    2.96 @ -O2 -fomit-fp
+.L6:                    .L6:                          .L6:
+                          movl 8(%ebp), %ebx
+  flds (%edi,%eax,4)      movl (%ebx,%edx,4), %eax      movl (%ebp,%edx,4), %eax
+  fstps (%ebx,%eax,4)     movl %eax, (%esi,%edx,4)      movl %eax, (%esi,%edx,4)
+                          movl 20(%ebp), %ebx
+  flds (%esi,%eax,4)      movl (%edi,%edx,4), %eax      movl (%edi,%edx,4), %eax
+  fstps (%ecx,%eax,4)     movl %eax, (%ebx,%edx,4)      movl %eax, (%ebx,%edx,4)
+  incl %eax               incl %edx                     incl %edx
+  cmpl %edx,%eax          cmpl %ecx, %edx               cmpl %ecx, %edx
+  jl .L6                  jl .L6                        jl .L6
+</pre>
+
+<p>The loop requires seven registers: four base pointers, an index, a
+limit, and a scratch. All but the scratch must be integer. The x86
+has only six integer registers under normal conditions. gcc 2.95 uses
+a float register for the scratch, so the loop just fits. 2.96 tries
+to use an integer register, and has to spill two pointers onto the
+stack to make everything fit. Adding <code>-fomit-frame-pointer</code>
+ makes a seventh integer register available, and the loop fits again.
+
+<p>We see here numerous optimizer idiocies. First, it ought to
+recognize that a load - even from L1 cache - is more expensive than a
+floating point move, and go back to the FP registers. Second, instead
+of spilling the pointers, it should spill the limit register. The
+limit is only used once and the <code>'cmpl'</code> instruction can
+take a memory operand. Third, the loop optimizer has failed to do
+anything at all. It should rewrite the code thus:
+
+<p><pre>
+void
+fcpy(float *a, float *b, float *aa, float *bb, int n)
+{
+ int i;
+ for(i = n; i > 0; i--) {
+ *aa++ = *a++;
+ *bb++ = *b++;
+ }
+}
+</pre>
+
+<p>which compiles to this inner loop:
+
+<p><pre>
+.L6:
+ movl (%edi), %eax
+ addl $4, %edi
+ movl %eax, (%ebx)
+ addl $4, %ebx
+ movl (%esi), %eax
+ addl $4, %esi
+ movl %eax, (%ecx)
+ addl $4, %ecx
+ addl $-1, %edx
+ jg .L6
+</pre>
+
+<p>Yes, more adds are necessary, but this loop is going to be bound by
+memory bandwidth anyway, and the rewrite gets rid of the limit
+register, so the loop fits in the integer registers again. Note that
+I have no idea why it isn't using the <code>'decl'</code> instruction.
+
+<p>If this were Fortran, we could do even better:
+
+<p><pre>
+void
+fcpy(float *a, float *b, float *aa, float *bb, int n)
+{
+ int i;
+ for(i = n; i > 0; i--) {
+ aa[i - 1] = a[i - 1];
+ bb[i - 1] = b[i - 1];
+ }
+}
+</pre>
+
+<p>which compiles to:
+
+<p><pre>
+.L6:
+ movl -4(%ebp,%ecx,4), %eax
+ movl -4(%edi,%ecx,4), %edx
+ movl %eax, -4(%esi,%ecx,4)
+ movl %edx, -4(%ebx,%ecx,4)
+ addl $-1, %ecx
+ jg .L6
+</pre>
+
+<p>at least with <code>-fomit-frame-pointer</code>. You can't make
+that transformation in C because the compiler isn't allowed to assume
+that the vectors pointed to by <code>a</code>, <code>b</code>,
+<code>aa</code>, and <code>bb</code> do not overlap. In Fortran it
+is.
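+<p>The new C99 standard adds the <code>restrict</code> qualifier. With it, a C programmer can make roughly the promise Fortran makes implicitly: during the call, the arrays reached through each pointer are not also reached through another. The sketch below is the backwards, indexed loop with <code>restrict</code> added. The name <code>fcpy_restrict</code> is hypothetical, and the code needs a compiler in C99 mode. Whether a particular GCC version exploits the qualifier is a separate question. Calling it with overlapping arrays is undefined behavior.

```c
#include <assert.h>

/* Copy a[0..n-1] to aa and b[0..n-1] to bb, walking backwards.
   restrict promises the compiler that the four arrays do not overlap,
   which licenses the load/store reordering shown above. */
void fcpy_restrict(const float *restrict a, const float *restrict b,
                   float *restrict aa, float *restrict bb, int n)
{
  int i;
  for (i = n; i > 0; i--)
    {
      aa[i - 1] = a[i - 1];
      bb[i - 1] = b[i - 1];
    }
}
```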
+
+<p>Then there's the question of loop unrolling, loop splitting, etc.
+but high-level transformations like those are outside the scope of
+this document.
+
+<hr>
+<p>Last modified: 22 Jan 2000
+<p>Zack Weinberg, <a
+href="mailto:zack@wolery.cumb.org">&lt;zack@wolery.cumb.org&gt;</a>
+
+</body>
+</html>
===================================================================
Index: proj-cpplib.html
--- proj-cpplib.html 1999/09/28 20:55:30 1.5
+++ proj-cpplib.html 2000/01/29 07:45:11
@@ -1,217 +1,176 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+ "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<title>cpplib TODO</title>
+<link rev="made" href="mailto:zack@wolery.cumb.org">
</head>
<body bgcolor="white" text="black" link="#0000EE" vlink="#551A8B" alink="red">
-<h1 align=center>Projects relating to cpplib</h1>
+<h1 align="center">Projects relating to cpplib</h1>
-cpplib is almost ready to replace cccp as the standalone C
-preprocessor used by gcc. A bit more work is necessary before it can
-be used directly from cc1 and the other front ends.
+<p>As of 28 January 2000, cpplib is the default C preprocessor used by
+gcc. It is not yet linked into the C and C++ front ends by default,
+because the interface is likely to change and there are still some
+major bugs in that area. There remain a number of bugs which need to
+be stomped out, and some missing features. We also badly need more
+testing.
+
+<h2>How to help test</h2>
+
+<p>The number one priority for testing is cross-platform work. Simply
+bootstrap the compiler and run the test suite on as many different OS
+and hardware combinations as you can. I have access to only a very
+few.
+
+<p>The number two priority is large packages that (ab)use the
+preprocessor heavily. The compiler itself is pretty good at that, but
+doesn't cover all the bases. If you've got cycles to burn, please
+try one or more of:
+
+<ul>
+ <li>BSD 'make world'
+ <li>Binutils
+ <li>Emacs
+ <li>GNOME
+ <li>GNU libc
+ <li>Guile
+ <li>Linux kernel (esp. non-i386)
+ <li>Mozilla
+ <li>International Obfuscated C Code Contest entries
+ <li>Perl
+ <li>X11
+ <li>... and anything else you can think of.
+</ul>
+
+<p>Old grotty pre-ANSI code is particularly good for exposing bad
+assumptions and missed corner cases; you may have more trouble with
+bugs in the package than bugs in the compiler, though.
+
+<p>A bug report saying 'package FOO won't compile on system BAR' is
+useless. At this stage what I need are short testcases with no system
+dependencies. Aim for fewer than fifty lines and no #includes at all.
+I recognize this won't always be possible.
+
+<p>Also, please file off everything that would cause us legal trouble
+if we were to roll your test case into the distributed test suite.
+Short test cases will almost always fall under fair use guidelines, so
+don't sweat it too much. An example of a problem is if your test case
+includes a 200-line comment detailing inner workings of your program.
+(A 200-line comment might be what you need to provoke a bug, but its
+contents are unlikely to matter. Try running it through
+<code>"tr A-Za-z x"</code>.)
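+<p>Running <code>tr A-Za-z x</code> over a whole file would flatten the code as well as the comment. The sketch below applies the same letters-to-x substitution only inside comments and leaves everything else alone. <code>scrub_comments</code> is a hypothetical helper, not part of GCC. It understands block and line comments, string literals, and character constants, but not backslash-newline or trigraphs.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Just enough lexer state to tell comments from everything else. */
enum scrub_state { IN_CODE, IN_BLOCK_COMMENT, IN_LINE_COMMENT, IN_STRING, IN_CHAR };

/* Copy IN to OUT, replacing every letter inside a comment with 'x'.
   OUT must have room for strlen (IN) + 1 bytes.  Code, string literals
   and character constants are copied unchanged. */
void scrub_comments(const char *in, char *out)
{
  enum scrub_state state = IN_CODE;
  size_t i, n = strlen(in);

  for (i = 0; i < n; i++)
    {
      char c = in[i];
      char next = in[i + 1];   /* safe: in[n] is the terminating NUL */

      out[i] = c;
      switch (state)
        {
        case IN_CODE:
          if (c == '/' && (next == '*' || next == '/'))
            {
              out[++i] = next;   /* copy the comment opener, skip past it */
              state = (next == '*') ? IN_BLOCK_COMMENT : IN_LINE_COMMENT;
            }
          else if (c == '"')
            state = IN_STRING;
          else if (c == '\'')
            state = IN_CHAR;
          break;

        case IN_BLOCK_COMMENT:
          if (c == '*' && next == '/')
            {
              out[++i] = next;
              state = IN_CODE;
            }
          else if (isalpha((unsigned char) c))
            out[i] = 'x';
          break;

        case IN_LINE_COMMENT:
          if (c == '\n')
            state = IN_CODE;
          else if (isalpha((unsigned char) c))
            out[i] = 'x';
          break;

        case IN_STRING:
        case IN_CHAR:
          if (c == '\\' && next != '\0')
            out[++i] = next;   /* escaped character: copy verbatim */
          else if (c == (state == IN_STRING ? '"' : '\''))
            state = IN_CODE;
          break;
        }
    }
  out[n] = '\0';
}
```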
+
+<p>As usual, report bugs to <a
+href="mailto:gcc-bugs@gcc.gnu.org">gcc-bugs@gcc.gnu.org</a>. But
+please read the rest of this document first!
+
+<h2>Known Bugs</h2>
+
+<p><ol>
+ <li>Under some conditions the line numbers seen by the compiler
+ proper are incorrect. It shows up most obviously as bad line
+ numbers in warnings when bootstrapping the compiler. I have not
+ been able to reproduce this with an input file of less than a
+ couple thousand lines. Help would be greatly appreciated.
+
+ <li>cpplib will silently mangle input files containing ASCII NUL.
+ The cause of the bug is well known, but we weren't able to come
+ to consensus on what to do about it. My personal preference is
+ to issue a warning and strip the NUL; other people feel it
+ should be preserved or considered a hard error.
+
+ <li>Character sets that are <em>not</em> strict supersets of ASCII
+ may cause cpplib to mangle the input file, even in comments or
+ strings. Unfortunately, that includes important character sets
+ such as Shift JIS and UCS-2. (Please see the discussion of <a
+ href="#charset">character set issues</a>, below.)
+
+ <li>Trigraphs provoke warnings everywhere in the input file, even in
+ comments. This is obnoxious, but difficult to fix due to the
+ brain-dead semantics of trigraphs and backslash-newline.
+
+ <li>Code that does perverse things with directives inside macro
+ arguments can cause the preprocessor to dump core. cccp dealt
+ with this by disallowing all directives there, but it might be
+ nice to permit conditionals at least.
-<p>In rough priority order, the things that need to be done before cccp
-is retired:
+</ol>
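+<p>The trigraph problem is a matter of ordering. Trigraphs are replaced in translation phase 1, before comments are recognized, and <code>??/</code> becomes a backslash. So a line inside a block comment ending in <code>*??/</code>, followed by a line starting with <code>/</code>, becomes <code>*</code> backslash-newline <code>/</code>. Phase 2 splices that into <code>*/</code>, which closes the comment. A minimal sketch of the phase 1 replacement is below. <code>replace_trigraphs</code> is a hypothetical helper, not cpplib's implementation.

```c
#include <assert.h>
#include <string.h>

/* Translation phase 1, trigraphs only: rewrite in place every
   three-character sequence of two question marks followed by one of
   = ( / ) ' < ! > - as the corresponding single character.
   Returns the new length of BUF. */
size_t replace_trigraphs(char *buf)
{
  static const char from[] = "=(/)'<!>-";
  static const char to[]   = "#[\\]^{|}~";
  char *src = buf, *dst = buf;

  while (*src)
    {
      const char *p;
      if (src[0] == '?' && src[1] == '?' && src[2] != '\0'
          && (p = strchr(from, src[2])) != NULL)
        {
          *dst++ = to[p - from];
          src += 3;
        }
      else
        *dst++ = *src++;   /* a lone '?' or any other byte */
    }
  *dst = '\0';
  return (size_t) (dst - buf);
}
```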
-<ol>
- <li>Fix the handling of <code>#define</code> and <code>#if</code>
- so that they use the same lexical analysis code as the rest of
- cpplib (i.e. <code>cpp_get_token</code>). This is essential to
- adding support for the new preprocessor features in C9x and C89
- Amendment 1. It also will enable the removal of the last global
- variable in cpplib - meaning the library will be reentrant as
- long as different <code>cpp_reader</code> objects are in use.
- (For the curious: it's the presence or absence of <code>$</code>
- in <code>is_idchar</code> and <code>is_idstart</code>.)
-
- <li>Implement C89 Amendment 1 "alternate spellings" of punctuators:<br>
+<h2>Missing User-visible Features</h2>
+
+<p><ol>
+
+ <li>C89 Amendment 1 "alternate spellings" of punctuators are not
+ recognized. These are
<pre> <: :> <% %> %: %:%:</pre>
which correspond, respectively, to
<pre> [ ] { } # ##</pre>
The preprocessor must be aware of all of them, even though it
uses only <code>%:</code> and <code>%:%:</code> itself.
-
- <li>Support multi-byte characters in comments, identifiers, string
- constants, and character constants. Consensus on the egcs
- development list was that this can be limited to systems with
- support for reentrant multi-byte functions and for the
- <code>nl_langinfo</code> interface. cpplib will make no attempt
- to interpret or translate multibyte characters.
-
- <p>cpplib contains some optimizations which may not be
- valid in the presence of multibyte characters. The code to read
- files and perform translation phases 1 through 3
- (<code>read_and_prescan</code> in cppfiles.c) may
- break if the bytes corresponding to <code>\</code>, <code>?</code>,
- <code>^M</code>, and <code>^J</code> in ASCII can appear
- "inside" a multibyte character. Shift JIS has some characters
- like this, but it is not clear to me whether the specific case
- that will trigger problems can occur.
-
- <p>A question for character set experts: Are there multibyte
- encodings for which the length of a multibyte sequence cannot be
- determined by examining only the first character of that
- sequence? If so, which ones are they?
-
- <li>Ignore ASCII NUL in an input file, with a warning. Right now it
- silently mangles the output.
-
- <p>This is easy to fix; in <code>read_and_prescan</code>, don't
- assume that NUL ends the current chunk of input. But you have
- to not wreck performance while you're at it.
- <code>read_and_prescan</code> is fragile.
-
- <li>Fix the memory leak in <code>#undef</code>. Someone thought it
- was necessary to support
-<p>
-<pre> #define foo(arg) blah arg blah
- foo(bar
- #undef foo
- baz)
-</pre>
-<p>
- which is undefined (C9x 6.10.3.11). The Right Thing is to
- detect this in <code>macarg</code> and treat it as an error.
-
- <li>Support the <code>-lint</code> switch at least as well as cccp
- does. This is easy once someone tells me what the exact
- syntax of lint comments are. I believe that the regexp
- <code>/^\s*\/\*\s*[A-Z0-9]+\s*\*\/\s*$/</code> correctly
- describes the syntax, but would like confirmation from someone
- who has actually used lint. In particular, is it true that lint
- comments always appear on lines by themselves?
-
- <li>Support cccp's <code>-Wwhite-space</code> feature; this warns
- about <code>/\\\s+$/</code>, which is not a line-continuation
- backslash, but looks like one. (Is there anything else it
- should warn about?)
-
- <li>More testing. I would like, at least, reports that bootstrap
- completes and the testsuite gets no regressions versus cccp on
- most major platforms. Other tests that would be useful: build
- X11; test Imake outside the X11 tree; build the complete
- (free|net|open)BSD tree; compile Emacs (both FSF and X). I
- already test glibc compiles on a regular basis.
- <p>Test results for non-Intel and/or non-Linux platforms are
- particularly desirable.
-</ol>
-
-To make cpplib usable from within language front ends, we need:
+ <li>Character sets that are strict supersets of ASCII are safe to
+ use, but extended characters cannot appear in identifiers. This
+ has to be coordinated with the front end, and requires library
+ support which is usually not adequate. See <a
+ href="#charset">character set issues</a>, below.
+
+ <li>C99 universal character escapes (<code>\uxxxx</code>,
+ <code>\Uxxxxxxxx</code>) are not recognized. They are harmless
+ in comments, and will be passed on to the compiler safely if
+ they appear elsewhere, but cannot be used in macro names or #if
+ directives. The C front end doesn't handle them either.
+
+ <li>C99's <code>_Pragma</code> intrinsic is not supported. This
+ needs to be done in conjunction with the front end.
+
+ <li>cccp had some marginal support for translating lint directives
+ into #pragmas which the front end could see. Of course, the
+ front end never did anything with them. I don't intend to put
+ this back till the front end can use them.
+
+ <li>Precompiled headers are commonly requested; this entails the
+ ability for cpp to dump out and reload all its internal state.
+ You can get some of this with the debug switches, but not all,
+ and not in a reloadable format. The front end must cooperate
+ also.
+
+ <li>Someone once requested warnings about stray whitespace in the
+ input, notably trailing whitespace after a backslash. If that
+ happens, you have something that looks like a line-continuation
+ backslash, but isn't.
+
+ <li>Better support for languages other than C would be nice. People
+ want to preprocess Fortran, Chill, and assembly language. Chill
+ has been kludged in; Fortran and assembly still have serious
+ issues (notably, comment and string detection).
-<ol start=9>
- <li>The public interface and private implementation details of
- cpplib are currently mixed together in cpplib.h.
- This must be cleaned up.
-
- <li>When cc1 is invoked on an already-preprocessed (.i) file,
- the preprocessor must not be run again. This should work, I'm
- not sure.
+ <li><code>#define TOKEN TOKEN</code> should not cause infinite
+ recursion on the buffer stack when <code>-traditional</code> is
+ on. All the interesting uses of traditional macro recursion use
+ function-like macros; object macros should probably be ANSI-ish
+ all the time.
- <li>When cpplib is linked into front ends <code>-save-temps</code>
- does not preserve an .i file. This is the temp
- file you usually want when tracking compiler bugs; its loss is
- intolerable. The simple fix: in the gcc driver, when
- <code>-save-temps</code> is given, revert to using the external
- preprocessor.
</ol>
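+<p>Digraphs are tokens, not textual substitutions: a digraph inside a string literal stays as it is. That means the lexer has to recognize them, and the longest match has to win, so <code>%:%:</code> is one token rather than two. A minimal sketch of that check follows; <code>match_digraph</code> is a hypothetical helper, not a cpplib function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If P begins with a C89 Amendment 1 digraph, store its length in *LEN
   and return the punctuator it stands for; otherwise return NULL.
   The four-character %:%: is tried before %: (maximal munch). */
const char *match_digraph(const char *p, size_t *len)
{
  static const struct { const char *digraph, *spelling; } table[] = {
    { "%:%:", "##" },
    { "<:", "[" }, { ":>", "]" },
    { "<%", "{" }, { "%>", "}" },
    { "%:", "#" },
  };
  size_t i;

  for (i = 0; i < sizeof table / sizeof table[0]; i++)
    {
      size_t n = strlen(table[i].digraph);
      if (strncmp(p, table[i].digraph, n) == 0)
        {
          *len = n;
          return table[i].spelling;
        }
    }
  return NULL;
}
```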
-Once that is done, more optimizations are possible:
+<h2>Internal work that needs doing</h2>
-<ol start=12>
- <li>cc1 and cpp do quite a bit of duplicate bookkeeping of source
- file, line, etc. This should be eliminated.
-
- <li>cc1 should take advantage of the partial lexical analysis done
- by cpplib. Maybe cpplib should do more complete lexical
- analysis of C - at least identify all the different punctuators.
-
- <p>To do that cleanly, <code>cpp_get_token</code> must return
- exactly one token per invocation, except at EOF. We need a
- mechanism to queue a list of tokens for output. This should fall
- out of the macro-expansion rewrite.
-
- <p>The main problem here is the directives like
- <code>#pragma</code> and <code>#ident</code> that are passed on
- to the language front-end for interpretation. It would be a
- good idea to add extended syntax to the front end that will fit
- into the grammar (instead of requiring a special hook, as is
- currently done) and translate.
-</ol>
-
-Some longer term projects which are largely independent of using
-cpplib directly from language front ends:
+<ol>
+ <li>The handling of <code>#define</code> and <code>#if</code> must
+ be fixed so it uses the same lexical analysis code as the rest of
+ cpplib (i.e. <code>cpp_get_token</code>). This is essential to
+ adding support for the new preprocessor features in C9x and C89
+ Amendment 1.
-<ol start=15>
- <li>Implement C9x UCN escapes. These look like <code>\uXXXX</code>
- or <code>\UXXXXXXXX</code> where each X is a hexadecimal digit.
- They are legal in identifiers, string constants, and character
- constants, and must be validated against some constraints when
- parsed. They designate characters from ISO/IEC 10646 (aka
- Unicode) which are not in the "source character set". cpplib
- will not interpret them beyond the constraints in C9x, except
- that it will map <code>\u0024</code> to <code>$</code>,
- <code>\u0040</code> to <code>@</code>, and <code>\u0060</code>
- to <code>`</code>. All other UCN escapes with numbers below
- <code>00A0</code> are illegal. (Yes, this does mean that
- <code>\u0024</code> will be legal in identifiers if and only if
- the <code>$</code>-in-identifier extension is enabled.)
-
- <li>It Would Be Nice if cpplib recognized when a multibyte character
- was equivalent to a UCN escape; e.g. the sequences
- <code>Gómez</code> and <code>G\u00F3mez</code> should be
- treated as the same identifer. This unfortunately would require
- converting arbitrary multibyte characters to Unicode, and there
- is no portable way to do that (<code>mbtowc</code> does not
- necessarily produce Unicode). However, cc1 has to do it, so
- whatever solution we adopt there can be used in cpp also.
+ <li>cpplib makes two separate passes over the input file, which
+ causes a number of headaches, such as the trigraph warnings
+ inside comments. It's also a performance problem. Semantic
+ issues make a one-pass lexer impractical, but a two-pass scheme,
+ with the first pass called coroutine-fashion from the second,
+ should work better.
<li>The macro expander could use a total rewrite. We currently
re-tokenize macros every time they are expanded. It'd be better
to tokenize when the macro is defined and remember it for later.
- Also, the macro expander is recursive and allocates large arrays
- on the stack, which is asking for trouble.
-
- <li>It might be worthwhile to cache file buffers after processing by
- <code>read_and_prescan</code>. My limited survey of header files
- indicates that headers which don't contain idempotence
- <code>#ifdef</code>s are generally included multiple times
- (examples: stddef.h, tree.def).
- Caching would avoid the expense of rereading from the disk (or OS
- cache) and the expense of redoing translation phases 1-3. I
- spent a lot of time bumming cycles out of
- <code>read_and_prescan</code>, but it's still an expensive
- operation. However, the memory cost may be prohibitive.
-
- <li>Wrapper headers - files containing only an include of another
- file - should be optimized out on reinclusion.
-
- <li><code>#define TOKEN TOKEN</code> should not cause infinite
- recursion on the buffer stack when <code>-traditional</code> is
- on. GNU libc uses this construct heavily; it is therefore
- impossible to use <code>-traditional</code> on systems that use
- it. Actually, all the interesting uses of traditional-mode
- macro recursion involve macros with arguments, so maybe
- object-like macros should always behave as specified in C89.
- <p>The specific case where an object-like macro is defined to
- itself can be optimized: give them their own hashtable code,
- don't bother allocating a <code>DEFINITION</code> structure, and
- skip all the processing done by <code>macroexpand</code>.
-
- <li>Support for C9x's <code>_Pragma("...")</code> built-in macro
- needs to be added eventually. Ideally <code>#pragma</code> and
- <code>_Pragma()</code> would go through the same interface, but
- this may be difficult.
- <p>An idea for implementation: invent a destringizing operator
- symmetric with the existing stringizer. Then _Pragma could be
- implemented by the equivalent of
-<pre> #define _Pragma(arg) #pragma #$arg</pre>
- where <code>#$</code> is the destringizer. This has almost the
- right semantics for _Pragma according to C9x. (The resultant
- line is supposed to be processed as a directive, which wouldn't
- happen if you took the above literally.) Problem: a strictly
- conforming program could contain <code>#$</code> in a context
- where it would be interpreted as the destringizing operator.
<li>The code uses <code>long</code>, <code>unsigned long</code>, and
<code>size_t</code> interchangeably. This is wrong, and needs to
@@ -223,28 +182,192 @@ cpplib directly from language front ends
*</code>, and <code>U_CHAR *</code> interchangeably. This is
more of a consistency issue and annoyance than a real problem.
+ <li>VMS support has suffered extreme bit rot. There may be problems
+ with support for DOS, Windows, MVS, and other non-Unixy
+ platforms. I can fix none of these myself.
+
<li>We use too much stack. Large arrays should be moved to static
storage (if constant) or the heap (if not).
+
+</ol>
+
+<h2>Integrating cpplib with the front ends</h2>
+
+<ol>
+
+ <li>The lexer should do more work - enough that when cpplib is
+ linked into the C or C++ front end, the front end doesn't have
+ to do any rescanning of tokens.
+
+ <li>The library interface needs to be tidied up. Internal
+ implementation details are exposed all over the place.
+ Extracting all the information the library provides is
+ difficult.
+
+ <li><code>cpp_get_token</code> must be changed to return exactly one
+ token per invocation. For performance, there should also be a
+ <code>cpp_get_tokens</code> call that returns a whole line of
+ tokens at once.
+
+ <li>Front ends need to use cpplib's line and column numbering
+ interface directly. cpplib needs to stop inserting <code>#line</code>
+ directives into the output. (The standalone preprocessor in
+ cppmain.c counts as a front end.)
+
+ <li>When cpplib is linked into the front ends, <code>-save-temps</code>
+ does not preserve an .i file. This is the temp
+ file you usually want when tracking compiler bugs; its loss is
+ intolerable. The simple fix: in the gcc driver, when
+ <code>-save-temps</code> is given, revert to using the external
+ preprocessor.
- <li>VMS support has bit-rotted to the point of total brokenness.
- Someone who knows VMS needs to look at this. EBCDIC support
- (i.e. the MVS port) <i>may</i> be functional, but I wouldn't
- swear to it. The MVS port may also need system-specific code.
-
- <li>More generally, there is quite a bit of Unix-specific code in
- cppfiles.c. It might be a good idea to reduce this. Use of
- stdio instead of POSIX I/O primitives is an obvious change.
- (This might also make line-ending and multibyte character
- support easier.) Other things, like include search paths, are
- harder.
</ol>
+
+<h2>Optimizations</h2>
+
+<ol>
+
+ <li>It might be worthwhile to cache file buffers in memory after
+ lexical analysis, but before directive processing and macro
+ expansion. My limited survey of header files indicates
+ that headers which don't contain wrapper <code>#ifdef</code>s
+ are generally included multiple times (examples: stddef.h,
+ tree.def). Caching would avoid a good deal of work. However,
+ the memory cost may be prohibitive.
+
+ <li>A complement to the usual one-huge-file scheme of precompiled
+ headers would be to cache files on disk after lexical analysis.
+ You could run a cruncher over <code>/usr/include</code> and save
+ the results in a <code>.jar</code> file or similar, bypassing
+ filesystem overhead as well as the work of lexical analysis.
+
+ <li>Wrapper headers - files containing only an <code>#include</code> of another
+ file - should be optimized out on reinclusion. (Just tweak the
+ hash table entry of the wrapper to point to the file it reads.)
+
+ <li>When a macro is defined to itself, bypass the macro expander
+ entirely.
+
+ <li>Consider reading files with <code>mmap</code> rather than
+ <code>read</code>. (Portability issues; may not be a real win.)
-<p>
+</ol>
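The wrapper-header item amounts to one extra pointer per include-table entry. The sketch below assumes a simplified, hypothetical entry structure; cpplib's real hash table entries look different. Once a file has been identified as a wrapper, its entry is pointed at the file it includes, and later lookups follow that pointer instead of reopening the wrapper.

```c
#include <stddef.h>

/* Hypothetical include-table entry, not cpplib's real structure. */
struct include_entry {
    const char *name;
    struct include_entry *redirect;   /* set once this file is known to be
                                         a wrapper around another file */
};

/* Follow wrapper redirections so that reincluding a wrapper goes straight
   to the file it includes.  Assumes the redirect chain has no cycles. */
struct include_entry *resolve_include(struct include_entry *e)
{
    while (e->redirect != NULL)
        e = e->redirect;
    return e;
}
```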
+
+<h2><a name="charset">Character set issues</a></h2>
+
+<p>Proper character set handling is a hard problem. Users want to be
+able to write comments and strings in their native language. They
+want the strings to come out in their native language and not
+gibberish after translation to object code. Some users also want to
+use their own alphabet for identifiers in their code. There is no
+one-to-one or many-to-one map between languages and character sets.
+The subset of ASCII that is included in most modern-day character sets
+does not include all the punctuation C uses; some of the missing
+punctuation may be present, but at a different code position than in
+ASCII. The subset described in ISO 646 may not be the smallest
+subset out there.
+
+<p>Furthermore, the C standard's solutions for these problems are all
+more or less hideous. None rises above the status of kludge.
+Trigraphs are nonintuitive and cause far more problems than they
+solve. Digraphs are okay, but nonintuitive and not a complete
+solution. <code>iso646.h</code> merely shifts the problem from one
+place to another, and is not a complete solution either. UCN escapes
+assume Unicode, which makes them unsuitable for most Japanese and some
+Chinese environments.
+
+<p>Compounding the problem, the standard C library features for
+processing non-ASCII character sets are sadly lacking, even in the new
+standard (which no one's finished implementing yet). To explain why,
+some background is necessary. You can divide all existing character
+sets into three classes: unibyte, multibyte, and wide. Multibyte
+characters can be further subdivided into shifted and unshifted
+encodings. ASCII and most of its strict supersets - ISO 8859-x,
+KOI8-R, etc. - are unibyte, which means that all characters are exactly
+one byte long. This is obviously the easiest to deal with.
+
+<p>UCS-2 and UCS-4, and no other sets that I know of, are wide; this
+means that all characters are N bytes long, for some N greater than
+one. Handling these requires mechanical code changes throughout the
+lexer, which is then incapable of handling unibyte encodings; you have
+to add a translator. Memory requirements obviously at least double.
+However, no structural changes are needed.
+
+<p>UTF-8 and a few others are unshifted multibyte encodings. That
+means that not all characters are one byte long, but given any one
+byte you can tell if it's a one-byte character, the first byte of a
+longer character, or one of the trailing bytes of a longer character,
+without any additional information. These are almost as easy to deal
+with as unibyte encodings.
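Here is a minimal sketch of why UTF-8 is easy to scan. The class of any byte, and the length of the character that a lead byte starts, follow from its high-order bits alone. The enum and function names are hypothetical and not part of cpplib. The length ranges follow the UTF-8 definition in RFC 3629.

```c
/* Byte classes in UTF-8, an unshifted multibyte encoding.  A byte's class
   follows from its high-order bits alone, with no shift state. */
enum utf8_byte_class {
    UTF8_SINGLE,        /* 0xxxxxxx: a complete one-byte (ASCII) character */
    UTF8_CONTINUATION,  /* 10xxxxxx: a trailing byte of a longer character */
    UTF8_LEAD           /* 11xxxxxx: the first byte of a longer character */
};

/* Classify one byte.  This does not reject bytes that never occur in
   well-formed UTF-8; see utf8_sequence_length for that. */
enum utf8_byte_class utf8_classify(unsigned char b)
{
    if (b < 0x80)
        return UTF8_SINGLE;
    if ((b & 0xC0) == 0x80)
        return UTF8_CONTINUATION;
    return UTF8_LEAD;
}

/* Number of bytes in the character that starts with b, or 0 if b cannot
   start a character in well-formed UTF-8 (continuation bytes, and the
   never-valid bytes 0xC0, 0xC1 and 0xF5-0xFF). */
int utf8_sequence_length(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if (b >= 0xC2 && b <= 0xDF)
        return 2;
    if (b >= 0xE0 && b <= 0xEF)
        return 3;
    if (b >= 0xF0 && b <= 0xF4)
        return 4;
    return 0;
}
```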
+
+<p>Finally, JISx and a few others are shifted multibyte encodings,
+meaning that you must remember state as you walk down a string in
+order to interpret it. These are the worst to handle. Unfortunately,
+this category includes most of the character sets used in Asian
+countries.
+
+<p>The C standard library has no way of processing multibyte
+encodings, shifted or not, other than translating them into some
+unspecified wide encoding. For unshifted multibyte encodings, you can
+fake it as long as the only characters you're interested in
+manipulating (as opposed to passing through unexamined) are in the
+unibyte subset. That's true for UTF-8 and C as long as you only allow
+the usual English letters, Arabic digits, and the underscore in
+identifiers. If you want to permit other alphanumeric characters in
+identifiers, you've got to find out what they are, and that requires
+converting to wide encoding first.
+
+<p>So what's wrong with converting to wide encoding? First, it's
+slow. Obscenely slow, with most C libraries. It may be acceptably
+fast to convert an entire file all at once, but that doubles or
+quadruples your memory consumption. Typical C source files are on a
+par with data cache sizes as is; double it and you're in main memory
+and slowed to a crawl.
+
+<p>Second, the normal wide encoding is Unicode, and conversion from
+some sets (JISx, again) to Unicode and back loses information. [This
+is the infamous "Han unification" problem.]
+
+<p>Third, there is no portable way to tell the library what multibyte
+encoding you want to convert from. You can only specify it indirectly
+by way of the locale. Locale strings are not standardized, and
+setting the locale changes other behavior that we want left alone.
+
+<p>It is possible to walk down a multibyte string without converting
+it, using <code>mbrlen</code> or equivalent. That's the slowest
+possible mode you can put the conversion library in, though. Nor does
+it tell you anything about the characters you're hopping over.
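The following sketch shows the mbrlen walk described above. It counts the characters in a string in whatever multibyte encoding the current locale selects. It learns nothing about the characters it steps over, and the encoding can only be chosen indirectly through setlocale, which is exactly the limitation complained about above. The function name count_mb_chars is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Count the characters in a NUL-terminated multibyte string by stepping
   over each one with mbrlen, without converting to wide characters.
   The encoding is whatever the current LC_CTYPE locale specifies.
   Returns (size_t)-1 on an invalid or incomplete sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s) + 1;  /* include the terminating NUL so a
                                          trailing shift sequence is legal */
    size_t count = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    for (;;) {
        size_t len = mbrlen(s, remaining, &state);
        if (len == 0)
            return count;              /* reached the terminating NUL */
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete sequence */
        s += len;
        remaining -= len;
        count++;
    }
}
```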
+
+<p>End of rant. So what's cpplib likely to support in the near
+future? We will verify that it is safe to use any charset that is a
+strict superset of ASCII (unibyte or unshifted multibyte) in strings,
+character constants, and comments. We'll also support UCN escapes in
+those locations. If you write them in strings, the result will be
+in UTF-8.
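Producing UTF-8 from a UCN escape is mechanical once the escape's hex digits have been parsed into a number. The sketch below assumes that parsing has already happened. The function name ucn_to_utf8 is hypothetical, and the function does not apply the C9x rules about which UCNs are permitted where. It only performs the standard UTF-8 encoding.

```c
#include <stddef.h>

/* Encode a code point (the value of a \uXXXX or \UXXXXXXXX escape) as
   UTF-8.  Writes up to 4 bytes to out and returns the number written,
   or 0 if the value is a surrogate or lies beyond U+10FFFF. */
size_t ucn_to_utf8(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                      /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                          /* beyond U+10FFFF */
}
```

For example, the escape \u00F3 comes out as the two bytes 0xC3 0xB3.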
+
+<p>Support for shifted multibyte charsets will come next, and will
+involve some sort of library that provides all of the useful
+<code>string.h</code> and <code>ctype.h</code> functions for an
+arbitrary character set, <em>without</em> conversion. This will also
+require us to have some way to specify what character set an input
+file uses; the scheme MULE (Multilingual Emacs) uses is one
+possibility, and a #pragma is another.
+
+<p>Support for additional alphanumeric characters in identifiers will
+be added much later, because it presents ABI issues as well as
+compiler-guts issues. Arbitrary bytes usually aren't legal in
+assembly labels nor in object-file string tables, so there needs to be
+a mangling scheme. That scheme might be charset-dependent,
+charset-independent, or charset-neutral, and you can make a case for all three. All
+these things must be debated before we can implement anything.
+
+<p>There's one exception - <code>\u0024</code> will be legal in
+identifiers if and only if <code>$</code> is also legal.
+
+<p><hr>
<address>Zack Weinberg,
-<a href="mailto:zack@rabi.columbia.edu">zack@rabi.columbia.edu</a>
+<a href="mailto:zack@wolery.cumb.org">zack@wolery.cumb.org</a>
</address>
-<br><small><i>Last modified on May 15, 1999.</i></small>
-
+<br>Last modified 28 Jan 2000.
<hr>
<p><a href="projects.html">Back to the projects page</a>