This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Two suggestions for gcc C compiler to extend C language (by WD Smith)


On 26/07/16 16:37, Warren D Smith wrote:

You would get on far better here if you tried a little politeness and
respect, rather than anger, accusations and confrontation.

The C standards were written by a group of very smart and experienced
people, refined over a long time based on real-world issues and
feedback.  The language has been successful for decades.  And the gcc
implementation of the language is developed by another group of very
smart and experienced people, and has also been refined over decades.

So if you say one thing, and the C standards and/or the gcc developers
say something else, the chances are extremely high that you are wrong
and they are right.  If the gcc compiler is missing a feature that you
would like, it is not because someone is lazy, incompetent, or trying to
be difficult - more often, it is simply that the feature you want is not
actually a good idea, or it is not worth the effort to implement for the
few cases where it would be used, or it is already possible using a
different method.

> On 7/26/16, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>> On 26 July 2016 at 14:31, Warren D Smith wrote:
>>> 1. Gcc with stdint.h already
>>> provides such nice predefined types as uint8_t.
>>> Sizes provided are 8,16,32, and 64.
>>> In some sense uint1_t is available too (stdbool.h)
>>> but at least on my machine stdbool uses 8-bits to store a bool,
>>
>> Because that's the smallest addressable unit.
> 
> --look, I know there is a pascal front end for gcc, so I know gcc could provide
> packed bool arrays, because whoever wrote pascal-gcc already did provide it.

You have made it clear that you don't actually understand C in detail.
Apparently you also don't understand Pascal, or the underlying hardware
of typical processors.

Yes, Pascal may provide packed bool arrays.  But you cannot take the
address of the individual elements in a packed bool array.  And Pascal
arrays are not the same as C arrays - they contain more information, and
are more akin to C++ container types.  A single "bool" object in Pascal
takes one byte (a single smallest addressable unit, typically 8 bits).

Pascal implementations can provide packed bool arrays as a convenience
to the programmer - it is a higher level language than C, and provides a
number of such conveniences.  But the underlying mechanism for accessing
elements of the array involves masks and shifts - it is very different
from the simple access for an unpacked array.

In C, you write your own functions for this sort of thing (or use a
library).  In C++, standard containers such as std::bitset and
std::vector<bool> have optimisations to hide the details.  So if you want a
programming language that lets you have something that is a bit like an
array, but with packed boolean elements, and is used transparently like
a normal array - then pick a different language than C.
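
To make the "masks and shifts" point concrete, here is a minimal sketch
of the sort of access functions a C programmer might write for a packed
bool array.  The names BITARRAY_BYTES, bit_get and bit_set are
hypothetical, not from any standard or gcc header, and the storage is
assumed to be a plain array of unsigned char.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold nbits packed bits. */
#define BITARRAY_BYTES(nbits) (((nbits) + CHAR_BIT - 1) / CHAR_BIT)

/* Read bit i: select the byte, shift the bit down, mask it off. */
static inline int bit_get(const unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}

/* Write bit i: build a one-bit mask, then set or clear it. */
static inline void bit_set(unsigned char *bits, size_t i, int value)
{
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));

    assert(bits != NULL);
    if (value)
        bits[i / CHAR_BIT] |= mask;
    else
        bits[i / CHAR_BIT] &= (unsigned char)~mask;
}
```

With these, 1000 flags fit in BITARRAY_BYTES(1000) bytes (125 when
CHAR_BIT is 8), but note that there is still no way to form a pointer
to an individual element.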

> 
> Also, I know on some machines to access a byte you have to get a word
> (larger than 8 bits)
> from memory, do shifts and masks.  So clearly you already do that inside gcc.

No, on systems that have a basic unit of memory that is bigger than 8
bits, your "byte" is bigger than 8 bits and it is the smallest unit you
can access.  So a DSP device with 16-bit minimum addressable units has
16-bit "bytes", 16-bit char, CHAR_BIT == 16, sizeof(uint32_t) == 2, and
no way to directly access 8-bit units.  You have to use bitfields that
are 8-bit wide in a struct, or do the masking and shifting yourself.

(As far as I know, gcc does not support any targets that have CHAR_BIT
other than 8, but the principle is the same as this is basic standard C
stuff.)
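
The relationship between CHAR_BIT and sizeof can be checked directly.
This is only a sketch: the helper name uint32_storage_bits is made up
for illustration, and the code assumes a compiler that accepts
_Static_assert (C11, or gcc as an extension in older modes).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Guaranteed on every conforming C implementation. */
_Static_assert(CHAR_BIT >= 8, "a C byte has at least 8 bits");
_Static_assert(sizeof(char) == 1, "sizeof counts in units of char");

/* Hypothetical helper: storage width of uint32_t in bits.  Exact-width
   types have no padding bits, so this is 32 wherever uint32_t exists:
   4 * 8 on common targets, 2 * 16 on a DSP with 16-bit chars. */
static inline unsigned uint32_storage_bits(void)
{
    return (unsigned)(sizeof(uint32_t) * CHAR_BIT);
}
```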

> It therefore is trivial for you to do uint4_t also, because it would
> be that exact same code you already have, just change some numbers.

No.

> 
> The reason C language originally went with 8,16,32 only,
> was an embarrassing historical accident because the first C compiler
> was developed on some obsolete DEC PDP machine from hell.

8-bit basic units were already becoming mainstream when C was developed,
and were a popular choice.  The DEC machine in question was the PDP-11,
and it was not obsolete when C was developed, and it was certainly not a
"machine from hell".  It was one of the most successful and popular
minicomputers ever made.  (The rumour is that some of the terseness in
the C syntax is due to the DEC keyboards being terrible - whatever truth
there may be in that, the cpu itself was lovely.)

> DEC no longer exists, and PDP machines also no longer exist.
> To make things worse, they originally defined C *WITHOUT*
> actual unambiguous meanings, for example "int" might be who knows how
> many bits wide, language would not define that.  

This was an active decision to allow greater flexibility of the
language, and to make it easier to have implementations of C on systems
that could not support sizes such as 8 bit.  It means there are C
implementations for almost all cpus.  It does mean that portable code
cannot assume the exact sizes of the types it uses, though there are
certain guarantees about ranges - that is a trade-off.  C99's <stdint.h>
types go a fair way towards giving fixed sizes for when those are needed.

> This horrible nonportability
> again was done because the original developers of C foolishly acted like that
> one PDP machine was going to be the only machine ever to handle C,
> in which case their decisions made perfect sense.

No, you have that totally and completely wrong.  The decision was made
precisely because the developers of C wanted the language to be usable
on as wide a range of hardware as possible.

> 
> However, it is now the year 2016.    Many other machines with many other
> word sizes have been developed.  It is simply flat out wrong to say
> that a byte is the "smallest addressable unit" if we are on a machine
> with a different word size, for example I know 60-bit and 36-bit wide
> machines were sold at one point, note not divisible by 8.

Again, you have this almost totally backwards.  There was a time,
decades ago, when there were a good many architectures with very
different byte sizes - such as 36-bit mainframes.  This is why C does
not specify the exact sizes of its types.  With the exception of DSP
processors and a few even more specialised devices, all modern
processors have 8-bit bytes.

> 
> It is correct for some machines, false for other machines.
> 
> The point of a high level language like C, as opposed to assembler, is
> to allow portable code to be safely written without the programmer
> having to worry about the specific
> peculiarities of just one machine.

C is a very low level language.  And yes, its types allow portable code
to be written safely across a range of different targets, including a
range of different byte sizes.  Sometimes this is best done using the
target-dependent sized types like "char" and "int", sometimes it is best
done using the fixed size types like "uint16_t" - as a C programmer, you
can use whichever works best for the task at hand.

> 
> You, by saying "because 8 bits is the smallest addressable unit" have just
> said "I am not interested in that goal, I am only interested in my
> specific one machine."

8 bits is the smallest addressable unit that any C implementation can
support - any support for something smaller is a non-standard extension
to C (and there are a few processors that are able to address bits
directly, and C compilers for those processors often support such
access).  Almost all processors any given programmer will come across
will have 8-bit bytes.  /All/ processors that are programmed using C
will have bytes that are at least 8 bits.  (Four bit cpus are usually
programmed in a sort of Forth-like language.)  And I believe that all
mainline gcc targets have CHAR_BIT == 8, though there have been gcc
ports made for other sizes (such as the 9-bit PDP-10).

> 
> Mistake.
> 
> And that mistake was exactly the error made when C was originally created.
> Later it was recognized by consensus that they had indeed made a
> design error, and stdint.h was brought to us to correct that error.

It was found that programmers had need of types with fixed, known sizes
- so the <stdint.h> types were added.  This is known as "evolution" of
the language, and is the reason there has been more than one C standard
published.

> But it failed to fully correct the error
> because, at least with gcc's implementation of stdint.h, only 8,16,32,
> and 64 are provided.

These cover the needs of virtually everyone in virtually all cases.

Note that the rules of C in the standards impose certain requirements on
extended integer types which make it a very significant task to support
full integer types that are not typedefs of the standard types (char,
short, int, long, long long).

> The standard however allows other sizes also to be provided, like uint2_t.
> It is just the gcc did not implement other sizes.  Nothing is stopping you.

Yes, there is plenty stopping them.  Support would be an enormous effort
across a range of projects (including libraries, assemblers, and linkers
as well as just the compiler), and usage would be practically
non-existent.  And because of the rules of C, it would not do what you
want it to do.

On the other hand, making a C++ class for a 2-bit integer with
specialisations of std::vector or std::array for packing would be easy
to do, conform with all the standards, libraries and tools, and have all
the convenience you want.  So anyone who wants 2-bit integers that look
like 8-bit integers in their code, and doesn't want the "ugliness" of
access functions or macros, can pick a language that suits their needs
better.

> 
> A different poster pointed out gcc has implemented 128, and that is fine, but:
> (a) not my gcc on my machine!

That will depend on the version of gcc, and the target.  __int128 is
supported on targets that have reasonable hardware support for such big
integers.  Since 128-bit integers are very rarely used, there is little
point in going to the effort to provide convenient support on other
platforms - the gcc developers would have to spend vastly more work
making the support than anyone would save in using it.

> (b) it did it with some horrible syntax, rather than the stdint.h
> syntax uint128_t,
> just in order to be nonuniform, just in order to annoy me.

No, the type is named "__int128" because this follows the C standards.
The implementation is allowed to give it a name starting with two
underscores - they are not allowed to call it "int128_t" because that is
reserved for a standard integer type or an extended integer type of that
size.  __int128 does not qualify as an extended integer (though it has
many of the required properties) unless a target's "long long" is 128
bits in size (in which case there would probably be an int128_t typedef
in <stdint.h>).
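
If you want a shorter name in your own code, you can make your own
typedef where the type exists.  A minimal sketch, assuming a gcc target
that defines __SIZEOF_INT128__; the names u128, mul_64x64, high64 and
low64 are hypothetical and chosen here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SIZEOF_INT128__
/* Local name of our own; not a <stdint.h> type. */
typedef unsigned __int128 u128;

/* Full 64 x 64 -> 128 bit product. */
static inline u128 mul_64x64(uint64_t a, uint64_t b)
{
    return (u128)a * b;
}

/* Split a 128-bit value into its two 64-bit halves. */
static inline uint64_t high64(u128 x) { return (uint64_t)(x >> 64); }
static inline uint64_t low64(u128 x)  { return (uint64_t)x; }
#endif
```

On targets without __int128 the block compiles to nothing, so callers
would need a fallback.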

> 
> Now actually 128 really is harder to do, but 1,2,4 are totally trivial
> for you to do,
> in fact were already done by gcc-pascal.

As explained above, that is completely wrong in all points.

> 
>>> e.g. an array of 1000 bools takes 8000 bits,
>>> which is asinine and kind of defeats the point.
>>
>> If you want 1000 boolean values then don't use 1000 bool variables, do
>> something like std::bitset or std::vector<bool> in C++, i.e. create an
>> array of some integer type and write functions for accessing
>> individual bits.
> 
> --If I did that, then I would need to program in C++.
> Gcc, however, is a C compiler, not a C++ compiler.

Wrong.  gcc is a compiler collection, supporting a number of languages -
including C and C++.

> Therefore, your answer is not an answer for anybody using C.

People who choose to use C need to program using C.  If they want a
packed bit array, they need to write support for one using macros,
access functions, etc.  It is not hard, and many people have done it.
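
For packed nibbles, such access functions could look something like
this.  It is a sketch only: nibble_get and nibble_set are hypothetical
names, and the code assumes uint8_t exists (so CHAR_BIT is 8) and that
even indices live in the low half of each byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 4-bit element i from a buffer holding two per byte. */
static inline unsigned nibble_get(const uint8_t *buf, size_t i)
{
    unsigned shift = (i & 1u) ? 4u : 0u;

    return (buf[i / 2] >> shift) & 0x0Fu;
}

/* Store value (0..15) as element i, leaving its neighbour untouched. */
static inline void nibble_set(uint8_t *buf, size_t i, unsigned value)
{
    unsigned shift = (i & 1u) ? 4u : 0u;

    assert(value <= 0x0Fu);
    buf[i / 2] = (uint8_t)((buf[i / 2] & ~(0x0Fu << shift))
                           | (value << shift));
}
```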

People who choose to use C++ can take advantage of the ready-made
library support for packed bit collections.

Make your choice.

> 
> Also, yes I can, and sometimes have, written my own functions in C to allow
> use of packed nybbles.  Sure.  But that totally misses my point,
> which is, programmers should not have to keep on reinventing that wheel;

You are talking about C programming.  C is a low level language, with
few features but an emphasis on making those features flexible and
efficient.  Reinventing the wheel is precisely what every C programmer
does, every day - they create finely tuned wheels that are crafted to
fit exactly their own particular use-case.  And when you need a big,
complex, or re-usable wheel, you use a library.

> It should be provided by the language so they do not have to.

The number of people that would have use of packed nibble arrays in C is
absolutely negligible compared to the number of people who could use
decent string handling with automatic memory management - but C does not
have that as part of the language.

> And even if they do, then they are forced to use different syntax for
> 4 versus for 8, which
> makes code much uglier and larger and buggier for no reason.
> 
> 
>>> I suggest adding uint1_t, uint2_t and uint4_t support
>>> to gcc, and packed.
>>
>> You can't have a variable of a single bit, because you can't address it.
> 
> --You can have a variable of a single bit, because you can address it.

No, you can't address a single bit (except on a few specialised
processors).  Rather than extrapolating your ignorance about Pascal to
ignorance about C, you would do better to listen to the people who
actually /write/ compilers (such as Jonathan, Andrew, Joseph, etc.), or
to people like me that know a good deal about processor architecture.

> 
> 
>> GCC already supports __int128 and __uint128 where possible.
> 
> --false. It does not support it on my machine.  And it is certainly "possible"
> to support it on my machine. It may be "inconvenient" but it certainly
> is possible.
> And note __int128 is a new syntax different from the stdint.h syntax int128_t.
> If you are going to provide a feature, why not provide it in the syntax
> you already chose to adopt, and the world already chose to standardize?
> 
> 
>>> It is obnoxious and arbitrary that only 8,16,32,64 are available,
>>
>> It's obviously not arbitrary, it's dictated by the hardware.
> 
> --the point of a high level language like C, is to allow programmer
> not to worry about
> specific hardware quirks on specific machines.

And C does a good job of balancing target-independent coding against
providing only features that can be efficiently implemented on a wide
range of systems.

If you want higher level features, use a higher level language (C++
being a fine example).

> 
>> Patches welcome.
> 
> --the reason I am suggesting this to this forum, is I probably am not capable of
> recoding GCC myself.

Then listen to those that /are/ capable of working on gcc.

> 
> 
>>> 2. Gcc already provides a way to produce quotient and remainder
>>> simultaneously, producing a struct with two fields as output:
>>>    div_t x = div(a,b);  causes   x.quot = a/b, x.rem = a%b.
>>> Not a lot of people know gcc provides that, but it does, and that is
>>> good, because
>>> it provides access to the hardware capability.
>>
>> GCC doesn't provide that, the C library does, because it's defined by
>> standard C and has been for decades.
> 
> --I found this on gcc documentation web pages, e.g:
>       https://www.gnu.org/software/libc/manual/html_node/Integer-Division.html
> and some such page claimed gcc provides this as a builtin not as a
> library subroutine.

The C standard defines a number of standard library functions.  For some
of these, on some platforms, gcc can replace the library calls with
inlined code for efficiency.  On other targets, many basic C operations
result in library calls.  A C implementation (the combination of the
compiler and the standard library) is free to move that boundary as it
wants.

> 
> So we know you can provide such things as builtins.
> I am simply suggesting superior, more useful, builtins than this particular one.

The "div" function is useful to some people, though it is perhaps
outdated - modern compilers will usually be able to optimise discrete
uses of / and % as efficiently as a call to div().
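
The two styles side by side, as a sketch.  The struct quot_rem and both
function names are hypothetical; div() and div_t are the standard
<stdlib.h> ones.  Whether a compiler merges the / and % into a single
division is an optimisation matter and not something the code itself
guarantees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type for the operator-based version. */
typedef struct { int quot; int rem; } quot_rem;

/* Plain operators; since C99 both truncate towards zero. */
static inline quot_rem divide_with_operators(int a, int b)
{
    quot_rem r = { a / b, a % b };

    return r;
}

/* The same result via the standard library's div(). */
static inline quot_rem divide_with_div(int a, int b)
{
    div_t d = div(a, b);
    quot_rem r = { d.quot, d.rem };

    return r;
}
```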

> I have explained why mul would be more useful and more needed than div
> and why div was actually not even necessary to provide at all, but you and/or
> stdlib, nevertheless did provide it.  Given that you and /or stdlib
> have already decided to
> do something with usefulness 1, why not actually provide something
> that is as easy or easier to provide, and which has usefulness 7?

Handling arithmetic on integers greater than 64 bits is rarely useful.
But because it is /sometimes/ useful, gcc provides a range of builtin
functions for handling multi-precision arithmetic with overflows.
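
As a rough illustration, a two-limb addition can be built on
__builtin_add_overflow (available in gcc 5 and later).  The struct
u128_limbs and the function add_limbs are hypothetical names for this
sketch, not part of gcc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit limbs. */
typedef struct { uint64_t lo, hi; } u128_limbs;

/* Add a and b into *sum; returns true if the 128-bit sum wrapped. */
static inline bool add_limbs(u128_limbs a, u128_limbs b, u128_limbs *sum)
{
    /* Low limbs first; the builtin reports the carry out. */
    bool carry_lo = __builtin_add_overflow(a.lo, b.lo, &sum->lo);
    /* High limbs, then fold in the carry from the low limbs. */
    bool carry_hi = __builtin_add_overflow(a.hi, b.hi, &sum->hi);
    bool carry_in = __builtin_add_overflow(sum->hi, (uint64_t)carry_lo,
                                           &sum->hi);

    return carry_hi || carry_in;
}
```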

> 
> I also point out that, historically, one reason gcc caught on was it
> provided good extensions of the C language, not available in competing
> compilers.

gcc still does provide many useful extensions.  These days, however, the
developers prefer not to add new extensions without very good reason -
compatibility with other compilers is very important.

> 
> Why not learn from your own history, and do that again, with these two
> extensions?
> (And in the case of uint4_t, it actually would not even BE an
> "extension" since as I said,
> the standard already allows providing other sizes.)
> 
> 

