Universal Character Names, v2
Neil Booth
neil@daikokuya.co.uk
Fri Nov 29 13:35:00 GMT 2002
Martin v. Löwis wrote:-
> + if (utf8)
> + {
> + result->flags |= NODE_USES_EXTENDED_CHARACTERS;
> +#ifndef HAVE_AS_UTF8
> + cpp_error (pfile, DL_ERROR,
> + "Non-ASCII identifiers not supported by your assembler");
> +#endif
> + }
This doesn't belong here. Someone doing preprocessing only would not
be too happy to see this message.
I suggest this should only be a warning (the user could be compiling
with -S and feeding the output to a different assembler, or using it
for some other purpose), that it be emitted only once per translation
unit, and that it be moved to c_lex().
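A minimal sketch of the once-only warning, assuming the check is made for each identifier token from the lexer and that one compiler invocation handles one translation unit, so a file-scope flag is enough. The names maybe_warn_non_ascii_ident, warned_non_ascii_ident and assembler_supports_utf8 are hypothetical; only HAVE_AS_UTF8 comes from the patch. For self-containment it prints to stderr instead of going through cpplib's diagnostic routines.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-translation-unit flag: set after the first warning.  */
static bool warned_non_ascii_ident = false;

/* Hypothetical stand-in for the patch's #ifndef HAVE_AS_UTF8 test.  */
#ifdef HAVE_AS_UTF8
static const bool assembler_supports_utf8 = true;
#else
static const bool assembler_supports_utf8 = false;
#endif

/* Call for each identifier token; USES_EXTENDED_CHARS corresponds to
   NODE_USES_EXTENDED_CHARACTERS in the patch.  Emits at most one
   warning per translation unit and returns true if it emitted one.  */
static bool
maybe_warn_non_ascii_ident (bool uses_extended_chars)
{
  if (!uses_extended_chars || assembler_supports_utf8)
    return false;
  if (warned_non_ascii_ident)
    return false;

  warned_non_ascii_ident = true;
  fprintf (stderr,
           "warning: non-ASCII identifiers not supported by your assembler\n");
  return true;
}
```

Because it is a warning rather than an error, -E and -S runs still produce their output.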
> + {
> + const unsigned char *s = NODE_NAME (token->val.node);
> + int len = NODE_LEN (token->val.node);
> + while (len)
> + {
> + if (*s < 128)
> + {
> + *buffer++ = *s++;
> + len--;
> + }
> + else
> + {
> + const unsigned char *old = s;
> + cppchar_t code = utf8_to_char (&s);
> + if (code < 0x10000)
> + buffer += sprintf ((char*)buffer, "\\u%.4x", code);
> + else
> + buffer += sprintf ((char*)buffer, "\\U%.8x", code);
> + len -= s - old;
> + }
> + }
> + }
This should be in a subroutine to avoid code duplication. (I know the
code in general duplicates things like this, but we're not in the fast
path when handling UCSs. One day I hope to have solved the performance
issue, and then there will be only a single copy of the lot.)
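A minimal sketch of such a subroutine, assuming the identifier's spelling is already well-formed UTF-8. decode_utf8 and spell_ucn_identifier are hypothetical names, with decode_utf8 standing in for the patch's own utf8_to_char(); the format "%04lx" gives the same zero-padded digits as the patch's "%.4x".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for utf8_to_char(): decode one UTF-8 sequence
   at *PS, advance *PS past it and return the code point.  Assumes the
   lexer has already validated the input.  */
static unsigned long
decode_utf8 (const unsigned char **ps)
{
  const unsigned char *s = *ps;
  unsigned long c = *s++;
  int extra;

  if (c < 0x80)
    extra = 0;
  else if (c < 0xE0)
    { c &= 0x1F; extra = 1; }
  else if (c < 0xF0)
    { c &= 0x0F; extra = 2; }
  else
    { c &= 0x07; extra = 3; }

  while (extra-- > 0)
    {
      assert ((*s & 0xC0) == 0x80);  /* must be a continuation byte */
      c = (c << 6) | (*s++ & 0x3F);
    }

  *ps = s;
  return c;
}

/* Hypothetical helper: copy the LEN-byte UTF-8 identifier NAME into
   BUFFER, spelling each non-ASCII character as \uXXXX or \UXXXXXXXX.
   BUFFER needs 3 * LEN + 1 bytes (a 2-byte UTF-8 sequence becomes a
   6-byte escape).  Returns a pointer to the terminating NUL.  */
static unsigned char *
spell_ucn_identifier (unsigned char *buffer, const unsigned char *name,
                      size_t len)
{
  const unsigned char *s = name;
  const unsigned char *end = name + len;

  while (s < end)
    {
      if (*s < 0x80)
        *buffer++ = *s++;
      else
        {
          unsigned long code = decode_utf8 (&s);
          if (code < 0x10000)
            buffer += sprintf ((char *) buffer, "\\u%04lx", code);
          else
            buffer += sprintf ((char *) buffer, "\\U%08lx", code);
        }
    }
  *buffer = '\0';
  return buffer;
}
```

Both places that currently open-code the loop would then make a single call such as spell_ucn_identifier (buffer, NODE_NAME (node), NODE_LEN (node)).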
> +
> +static int
> +maybe_read_ucs_reader (pfile, pc)
> + cpp_reader *pfile;
> + cppchar_t *pc;
Can I suggest that, instead of doing this, you write a routine that
reads a UCS's 4 or 8 digits into a uchar[8] buffer, and then reuse
maybe_read_ucs() on that buffer? maybe_read_ucs() might need a few
small tweaks. Again, this would avoid duplication.
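A rough sketch of that split, under stated assumptions: struct char_source is a hypothetical stand-in for cpp_reader, and parse_ucn_digits is a hypothetical stand-in for a buffer-based maybe_read_ucs(). Its return convention (0 on success, 1 on an incomplete UCN) is an assumption, and any range or validity checks on the resulting code point are omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical character source standing in for cpp_reader.  */
struct char_source
{
  const char *p;
};

/* Read up to LENGTH (4 or 8) hex digits from SRC into BUF, stopping at
   the first non-hex character.  Returns the number of digits stored.  */
static size_t
read_ucn_digits (struct char_source *src, unsigned char buf[8], size_t length)
{
  size_t n = 0;

  assert (length == 4 || length == 8);
  while (n < length && isxdigit ((unsigned char) *src->p))
    buf[n++] = (unsigned char) *src->p++;
  return n;
}

/* Hypothetical buffer-based maybe_read_ucs(): convert the N digits in
   BUF to a code point in *PC.  Returns 0 on success, 1 if fewer than
   LENGTH digits were read.  */
static int
parse_ucn_digits (const unsigned char *buf, size_t n, size_t length,
                  unsigned long *pc)
{
  unsigned long code = 0;
  size_t i;

  if (n < length)
    return 1;
  for (i = 0; i < n; i++)
    {
      int c = buf[i];
      code = code * 16 + (isdigit (c) ? c - '0' : tolower (c) - 'a' + 10);
    }
  *pc = code;
  return 0;
}

/* KIND is the character after the backslash: 'u' (4 digits) or
   'U' (8 digits).  The reader only gathers digits; all parsing is
   shared with the buffer-based path.  */
static int
read_ucn (struct char_source *src, int kind, unsigned long *pc)
{
  unsigned char buf[8];
  size_t length = kind == 'u' ? 4 : 8;
  size_t n = read_ucn_digits (src, buf, length);

  return parse_ucn_digits (buf, n, length, pc);
}
```

The point of the split is that only read_ucn_digits knows about the reader, so the digit parsing and diagnostics exist in one place.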
Thanks,
Neil.