This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Add __attribute__((format(wprintf, 2, 3))) etc. format checking support


On Fri, 14 Dec 2001, Jakub Jelinek wrote:

> > Before such wide string format support can go in, I want the following
> > properly resolved (and failed to get them resolved on the lists when I
> > tried before):
> 
> I don't see why this all needs to be a precondition of wide format checking.
> From what I saw in c-format.c, it needs very limited subset of such
> knowledge.

It needs to know how to extract both chars and multibyte characters from
narrow strings.  It needs to know how to extract wide characters from wide
strings.  It needs (when we get multibyte support) to be able to check
whether a narrow string ends in the initial shift state.  With the current
message text it needs to print the values of unknown conversion type
characters (which, unfortunately, the use of HOST_WIDE_INT and hence of
HOST_WIDE_INT_PRINT_* will complicate).

We should do things the right way, not add another kludge that only works
for 99.99% of GCC users and is thoroughly broken on some of the targets
GCC supports.  Where the underlying meaning of the structures involved is
unclear, we should define it properly before adding another lot of
presumptions about the format of those structures.  A proper definition of
how target strings, narrow and wide, are represented and handled, is part
of my conception of how wide format checking should work - in the format
checking code itself there will end up being simple macros or functions to
extract a narrow/wide character as appropriate and advance a string (the
interface you had to these was probably fine), but underlying that a
proper infrastructure is needed.  When I attempted to implement wide
format checking some months ago this fell over on the lack of clear
definitions here before anything relating to formats was done.
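
To illustrate the sort of "extract a character and advance" interface meant
above, here is a minimal sketch.  It is not GCC code: the struct, its field
names and next_target_char are all hypothetical, and it assumes that each
target character occupies a fixed number of 8-bit host bytes, that the value
fits in unsigned long long, and that there are no multibyte shift states.

```c
#include <stddef.h>

/* Hypothetical cursor over a target string held in a host buffer.  */
struct target_str_cursor
{
  const unsigned char *p;    /* current position, in host bytes */
  const unsigned char *end;  /* one past the last host byte */
  unsigned width;            /* host bytes per target character (1 = narrow) */
  int big_endian;            /* nonzero if the most significant byte is first */
};

/* Extract the character at the cursor into *OUT and advance past it.
   Returns 0 on success, or -1 if the width is unsupported or fewer than
   WIDTH bytes remain in the buffer.  */
static int
next_target_char (struct target_str_cursor *c, unsigned long long *out)
{
  unsigned long long v = 0;
  unsigned i;

  if (c->width == 0 || c->width > sizeof v
      || (size_t) (c->end - c->p) < c->width)
    return -1;

  /* Accumulate bytes from most significant to least significant.  */
  for (i = 0; i < c->width; i++)
    {
      unsigned idx = c->big_endian ? i : c->width - 1 - i;
      v = (v << 8) | c->p[idx];
    }

  c->p += c->width;
  *out = v;
  return 0;
}
```

A format checker built on something like this would only ever compare the
returned value against a handful of characters; everything about the
representation stays behind the one function.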

It is quite possible for a conceptually simple change to need much larger
infrastructure changes behind it for it to be done properly in a clean
manner with maximal subsequent maintainability of the compiler.  In such a
case the infrastructure should go in first, even if it is not of much
intrinsic interest and delays features relative to a less clean
implementation.  (For example, to get format attributes working on types I
did 10000 lines of generic attribute handling infrastructure patches
first.)

> Current STRING_CST format doesn't allow all possible values to be stored if
> target char is wider than host char, otherwise all should work well, but is
> supporting all possible values in string literals really necessary?
> If yes, then IMHO the best way would be just to represent them as
> CONSTRUCTOR (as soon as we encounter the first character unrepresentable in
> host's character).

Of course it's necessary to represent all values - it's quite clear from
the C standard that a hexadecimal escape sequence can be used to represent
any such value.  What things look like internally in the STRING_CST
doesn't matter so much as long as the compiler is all agreed on it - but
if any tree code other than STRING_CST is used, more things may break.

> > * What is the format of wide STRING_CSTs in such cases?
> 
> Each host character represents target's BITS_PER_UNIT bits of the wide
> character, their order depends on BYTES_BIG_ENDIAN.

A *sequence* of host characters can represent a target's BITS_PER_UNIT bits -
this is a reasonable way of only needing a single rule for how target
characters are serialised - but some rule needs to be declared for how the
bits within a target unit are stored on the host.
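
As one possible such rule (purely an assumption for illustration, not
anything GCC defines), each target unit could be spread over as many host
chars as are needed, most significant host char first.  The function names
below are hypothetical, and the sketch assumes a target unit is no wider
than unsigned long long on the host.

```c
#include <limits.h>
#include <stddef.h>

/* Number of host chars needed to hold one target unit of TARGET_BITS bits.  */
static size_t
host_chars_per_unit (unsigned target_bits)
{
  return (target_bits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Store VALUE, one target unit, into DST with the most significant
   host char first.  */
static void
store_target_unit (unsigned char *dst, unsigned long long value,
                   unsigned target_bits)
{
  size_t n = host_chars_per_unit (target_bits);
  size_t i;

  for (i = 0; i < n; i++)
    {
      dst[n - 1 - i] = (unsigned char) (value & UCHAR_MAX);
      value >>= CHAR_BIT;
    }
}

/* Read back one target unit stored by store_target_unit.  */
static unsigned long long
load_target_unit (const unsigned char *src, unsigned target_bits)
{
  size_t n = host_chars_per_unit (target_bits);
  unsigned long long v = 0;
  size_t i;

  for (i = 0; i < n; i++)
    v = (v << CHAR_BIT) | src[i];
  return v;
}
```

Whichever rule is chosen matters less than that it is written down once and
that every reader and writer of STRING_CST contents goes through it.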

> There should be common functions, depending on whether we support wchar_t
> wider than HOST_WIDE_INT they should either return a tree or HOST_WIDE_INT.
> 
> If the latter, format checking can use it, but if the former, IMHO using
> such functions for format checking would be huge overkill when c-format is
> only interested in values L'\0' and L' ' - L'~' range at most.

My view is that we should require that char and wchar_t are no wider than
HOST_WIDE_INT and fail to compile GCC if MAX_CHAR_TYPE_SIZE or
MAX_WCHAR_TYPE_SIZE are too big.  I don't know however whether this has
consensus.  The functions would then return HOST_WIDE_INT (or unsigned
HOST_WIDE_INT).
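
A build-time check of that kind might look roughly like the following
sketch.  The SKETCH_* macros and their values are hypothetical stand-ins
for the real configuration macros, and sketch_wide_int stands in for
unsigned HOST_WIDE_INT; none of this is existing GCC code.

```c
/* Hypothetical stand-ins for the host and target configuration.  */
#define SKETCH_HOST_BITS_PER_WIDE_INT 64
#define SKETCH_MAX_CHAR_TYPE_SIZE 8
#define SKETCH_MAX_WCHAR_TYPE_SIZE 32

/* Refuse to build if a target character cannot fit in the host type.  */
#if SKETCH_MAX_CHAR_TYPE_SIZE > SKETCH_HOST_BITS_PER_WIDE_INT
#error "target char is wider than the host wide integer type"
#endif
#if SKETCH_MAX_WCHAR_TYPE_SIZE > SKETCH_HOST_BITS_PER_WIDE_INT
#error "target wchar_t is wider than the host wide integer type"
#endif

/* Stand-in for unsigned HOST_WIDE_INT.  */
typedef unsigned long long sketch_wide_int;

/* Reduce VALUE to the low BITS bits, as an accessor returning a target
   character of that width would.  */
static sketch_wide_int
sketch_mask_to_width (sketch_wide_int value, unsigned bits)
{
  if (bits >= SKETCH_HOST_BITS_PER_WIDE_INT)
    return value;
  return value & (((sketch_wide_int) 1 << bits) - 1);
}
```

With the check in place, the character accessors can return a plain integer
type and callers need not handle trees at all.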

-- 
Joseph S. Myers
jsm28@cam.ac.uk

