This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Extending Gcc For a New Language

From: Fergus Henderson <fjh at cs dot mu dot OZ dot AU>
To: Mike Stump <mrs at apple dot com>
Cc: Kevin Atkinson <kevin at atkinson dot dhs dot org>, gcc at gcc dot gnu dot org
Date: Fri, 7 Mar 2003 03:29:50 +1100
Subject: Re: Extending Gcc For a New Language
References: <Pine.LNX.4.44.0303051925580.912-100000@kevin-pc.atkinson.dhs.org> <1A45B9A2-4F81-11D7-A309-003065A77310@apple.com>

Kevin Atkinson wrote:
> 
> For my Ph.D. I am seriously considering designing a new systems
> programming language.  Unlike many other new languages, my new language
> will be designed to be suitable for low-level programming tasks such as
> writing kernels and operating systems.  It is designed to replace C and
> C++ (wishful thinking I know).

A fair bit of work has been done in this area.
Have a look at Cyclone, for example:
<http://www.research.att.com/projects/cyclone/>.
It sounds like your goals are very similar to the goals of the people
involved in the Cyclone project, so perhaps you should team up with them!

> For the implementation I am considering two choices
> 
> 1) Writing the compiler in its own language that emits C (or perhaps 
> C++) code and then uses gcc to compile it.
> 
> 2) Extending Gcc to support the new language.

These two choices are not necessarily mutually exclusive.
You can structure your compiler so that it generates an intermediate
language which resembles the subset of C (or C++) which you wish to use,
and then have two back-ends, one of which converts this intermediate
language to C or C++ (and then invokes gcc) and the other of which
interfaces with the GCC back-end infrastructure directly, by converting
this intermediate language to GCC trees, rather than going via C or C++.
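
A minimal sketch of that structure, assuming a tiny expression-only
intermediate language.  All the names here (ir_node, ir_backend, emit_c)
are made up for illustration.  Only the back-end that emits C is shown;
a second function with the same signature could build GCC trees instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C-like intermediate language: expressions only. */
typedef enum { IR_CONST, IR_VAR, IR_ADD, IR_MUL } ir_kind;

typedef struct ir_node {
    ir_kind kind;
    long value;                 /* used by IR_CONST */
    const char *name;           /* used by IR_VAR */
    const struct ir_node *lhs;  /* used by IR_ADD and IR_MUL */
    const struct ir_node *rhs;
} ir_node;

/* A back-end is just a function that consumes the IR.  Another back-end
   with this signature could target GCC trees rather than text. */
typedef int (*ir_backend)(const ir_node *n, char *out, size_t cap);

/* Back-end 1: print the IR as a fully parenthesized C expression.
   Returns the number of characters written (excluding the NUL),
   or -1 if the buffer is too small. */
int emit_c(const ir_node *n, char *out, size_t cap)
{
    int used;
    size_t pos = 0;

    switch (n->kind) {
    case IR_CONST:
        used = snprintf(out, cap, "%ld", n->value);
        return (used < 0 || (size_t)used >= cap) ? -1 : used;
    case IR_VAR:
        used = snprintf(out, cap, "%s", n->name);
        return (used < 0 || (size_t)used >= cap) ? -1 : used;
    default: /* IR_ADD or IR_MUL */
        if (cap < 2) return -1;
        out[pos++] = '(';
        used = emit_c(n->lhs, out + pos, cap - pos);
        if (used < 0) return -1;
        pos += (size_t)used;
        used = snprintf(out + pos, cap - pos, " %c ",
                        n->kind == IR_ADD ? '+' : '*');
        if (used < 0 || (size_t)used >= cap - pos) return -1;
        pos += (size_t)used;
        used = emit_c(n->rhs, out + pos, cap - pos);
        if (used < 0) return -1;
        pos += (size_t)used;
        if (cap - pos < 2) return -1;   /* need room for ')' and NUL */
        out[pos++] = ')';
        out[pos] = '\0';
        return (int)pos;
    }
}
```

The front-end only ever sees the ir_backend type, so choosing between
"emit C and invoke gcc" and "build trees directly" is a matter of which
function pointer it is handed.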

I would generally strongly recommend choosing (1) initially, because
as well as being easier to implement, it will also be a lot easier to
debug; you can debug mistakes in the generated code at the level of C
code rather than at the level of assembly.  In fact because of this
debugging issue, I think doing (1) first and then (2) is probably easier
than just doing (2) alone.

> Because my language will offer features not currently supported by C or 
> C++ (and probably Java, Ada, and Fortran but I don't know enough about 
> those languages to be sure), it will be more than simply writing a new 
> front end.  Some of the features the language may offer:
> 
> * Type inference in the style of most functional programming languages, but 
> perhaps a bit more limited.  Generally global variables and function 
> parameters will have the types specified, but the compiler will be 
> expected to infer the types for local variables.
> 
> * No user-written header files, instead the compiler will emit the 
> necessary information.  When no optimizations are used it will only emit 
> function prototypes and the like.  When using optimization it will emit 
> more such as function definitions for functions which are good inlining 
> candidates.

Those issues are purely a matter for the language front-end.
For those it doesn't matter which back-end infrastructure you use.
Using GCC for your back-end will neither help nor hinder here.

(Mercury supports both of those, as it happens.)

> * An optional garbage collector.  When active the collector will only be 
> used for some objects, AND the user will be allowed to free objects 
> explicitly. (I know has support for garbage collection but I don't know 
> how powerful it is and if it can handle objects being freed by the user)

The Boehm (et al) conservative collector does allow objects to be
explicitly freed.

> * Very precise typing of objects.  Types can be limited by arbitrary 
> boolean expressions such as limiting an integer to a particular range.  If 
> the compiler can not verify the conditions at compile time it is expected 
> to be able to optionally emit code to check for it at runtime.

This should be straightforward.  You can use GNU C's `__builtin_expect'
extension (or the BUILT_IN_EXPECT builtin, if you are compiling directly
to GCC trees) to give GCC the hint that these type checks are expected
to succeed most of the time.
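
A minimal sketch of what the generated C for such a check might look
like, assuming a type constrained to the range 0..100.  The names
UNLIKELY, range_check_failed and to_percent are hypothetical, and
clamping on failure is just one possible policy (a real implementation
might abort or raise an exception instead).

```c
#include <stdio.h>

/* GNU C hint: tell GCC the condition is expected to be false. */
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical runtime hook: counts and reports constraint violations. */
long range_check_failures = 0;

void range_check_failed(long value, long lo, long hi)
{
    fprintf(stderr, "range check failed: %ld not in [%ld, %ld]\n",
            value, lo, hi);
    range_check_failures++;
}

/* Convert to a value of the constrained type "integer in 0..100".
   The check is on the cold path, so the common case falls straight
   through to the return. */
long to_percent(long value)
{
    if (UNLIKELY(value < 0 || value > 100)) {
        range_check_failed(value, 0, 100);
        return value < 0 ? 0 : 100;   /* assumed policy: clamp */
    }
    return value;
}
```

When the compiler can prove the condition at compile time, the check is
simply not emitted.  The hint only affects code layout and branch
prediction, not the result.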

On 05-Mar-2003, Mike Stump <mrs at apple dot com> wrote:
> On Wednesday, March 5, 2003, at 04:49 PM, Kevin Atkinson wrote:
> >  Some of the features the language may offer:
> >
> >* Type inference in the style of most functional programming languages,
> >but perhaps a bit more limited.  Generally global variables and function
> >parameters will have the types specified, but the compiler will be
> >expected to infer the types for local variables.
> 
> cp/pt.c  It is a purely frontend issue.  You can write whatever 
> arbitrary code you want.
> Would be nice to unify and push into the midend, but that hasn't been 
> done, so you would be writing all your own code, from scratch in the 
> frontend.  If you can express a way that hooks into the backend, love 
> to hear it, I am unaware of any issues.

The reason *not* to put type inference into the GCC middle-end
infrastructure is that every language which does type inference does so
differently.  Anyway, the front-end should be responsible for checking
that the input is type-correct and issuing error messages if it is not.
The middle-end should almost never issue error messages; it doesn't
have enough information about the source language to issue good ones.

However, there are certainly some things which could be added to the
GCC middle-end infrastructure to make it more useful for supporting
modern programming languages.  One of them is discriminated union
types (also known as algebraic types).  Another is generic types.
ILX <http://research.microsoft.com/projects/ilx/ilx.htm> is a good source
of ideas here.
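
To illustrate what is missing: without direct support, a front-end has
to lower a discriminated union to a tag plus an ordinary union by hand,
roughly as in the following sketch.  The type and function names (shape,
make_circle, make_rect, shape_area) are invented for this example.

```c
/* Hand-lowered discriminated union: the tag says which member of
   the union is currently valid. */
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } shape_tag;

typedef struct {
    shape_tag tag;
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } u;
} shape;

shape make_circle(double radius)
{
    shape s;
    s.tag = SHAPE_CIRCLE;
    s.u.circle.radius = radius;
    return s;
}

shape make_rect(double width, double height)
{
    shape s;
    s.tag = SHAPE_RECT;
    s.u.rect.width = width;
    s.u.rect.height = height;
    return s;
}

/* Case analysis on the tag, as a pattern match would be compiled. */
double shape_area(const shape *s)
{
    switch (s->tag) {
    case SHAPE_CIRCLE:
        return 3.14159265358979 * s->u.circle.radius * s->u.circle.radius;
    case SHAPE_RECT:
        return s->u.rect.width * s->u.rect.height;
    }
    return 0.0;   /* unreachable if the tag is valid */
}
```

Once lowered like this, the back-end no longer knows that the tag and
the union members are related.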

> Put another way, by the time you lower to RTL, all notion of type 
> disappears.  In RTL, you have modes, that's about it.

Sure, but the front-end interface is to trees, not RTL.

> >* An optional garbage collector.
> 
> Like, say java, C or C++, been there, done that, next.

Yes.  However, you'll need to use a conservative collector, or pay
a potentially significant performance cost.  Even with the conservative
collector, the performance probably won't be great compared to what could
be achieved using a good native code generation framework that supported
type-accurate collection.  On the other hand, there simply aren't any
good native code generation frameworks around that support type-accurate
GC and have anything like GCC's level of portability and developer base.

For details on this, see my paper on doing type-accurate GC with GCC:
Fergus Henderson, "Accurate garbage collection in an uncooperative
environment".  Proceedings of the 2002 International Symposium
on Memory Management, Berlin, Germany, June 2002, pages 150-156.
<http://www.cs.mu.oz.au/research/mercury/information/papers.html#high_level_gc>.
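
As a rough illustration of the general idea of accurate GC on top of an
uncooperative compiler (a sketch only, with made-up names, not code
taken from the paper): the generated code keeps its pointer locals in an
explicit per-function frame, and links the frames into a chain that the
collector can walk to find the roots.

```c
#include <stddef.h>

/* One frame per active function: a link to the caller's frame plus
   that function's local pointer variables (the GC roots). */
typedef struct gc_frame {
    struct gc_frame *prev;
    size_t num_roots;
    void **roots;            /* array of num_roots root slots */
} gc_frame;

/* Head of the chain of frames for the currently active calls. */
gc_frame *gc_top = NULL;

/* Walk every frame, calling visit (if non-null) on each non-null root
   slot.  Returns the number of roots found. */
size_t gc_visit_roots(void (*visit)(void **slot))
{
    size_t count = 0;
    gc_frame *f;
    size_t i;

    for (f = gc_top; f != NULL; f = f->prev) {
        for (i = 0; i < f->num_roots; i++) {
            if (f->roots[i] != NULL) {
                if (visit != NULL)
                    visit(&f->roots[i]);
                count++;
            }
        }
    }
    return count;
}

/* What the generated code for a function with two pointer locals
   might look like. */
size_t example_function(void *a, void *b)
{
    void *locals[2];
    gc_frame frame;
    size_t live;

    locals[0] = a;
    locals[1] = b;
    frame.prev = gc_top;
    frame.num_roots = 2;
    frame.roots = locals;
    gc_top = &frame;                  /* push this function's frame */

    live = gc_visit_roots(NULL);      /* stand-in for a collection */

    gc_top = frame.prev;              /* pop before returning */
    return live;
}
```

The extra stores and the chain maintenance in every function are where
the performance cost mentioned above comes from.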

If you are doing this as part of a thesis, and the thesis is not about
garbage collection or memory management, then I would definitely recommend
just using conservative GC.  It's a *lot* easier to implement.  Just plug
in the Boehm et al collector and away you go.

> Now, since I've not seen the one person chime in yet who I think you 
> should really listen to; let me prompt him to step forward, he's 
> written such a frontend and experienced the fun of doing it both ways 
> as I recall.  Try google with Mercury gcc frontend Fergus language 
> generate C code, and see what you get, just in case he's written up his 
> experiences for you already.

I haven't yet gotten around to writing much about my experiences with
targeting the GCC back-end.  Actually I was thinking of maybe doing a
paper for the upcoming GCC Developers Summit, but that will depend on
whether I can organize travel funding.  Also, I have about a million
other things to do...

-- 
Fergus Henderson <fjh at cs dot mu dot oz dot au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

Follow-Ups:
- Re: Extending Gcc For a New Language
  - From: Kevin Atkinson

References:
- Extending Gcc For a New Language
  - From: Kevin Atkinson
- Re: Extending Gcc For a New Language
  - From: Mike Stump
