Review of the second addition of the "Dragon Book".

Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman. Compilers. Principles, Techniques, and Tools.

Addison Wesley; 2nd ed. (August 2006)

Originally the book was to be called "21st century compilers". But probably it is too ambitious name. Currently Amazon is selling the 1st edition plus online chapters from the 2nd edition. There is an additional author Monica Lam, a professor of Standford University (knowing her track of the researches I can guess that she probably wrote the two best chapters about parallelism and locality). Chapters 5-11 were accessible starting Nov. 15 till Dec. 31 2005 online through flash to prevent copying the book. It was hard to read it online and flash permitted to print one page of very bad quality at a time. Reading the chapters, I can guess that chapters prior 5 will be without changes taken from the 1st edition of the Dragon book.

Chapter 5, Syntax directed translation, is pretty the same chapter as in the 1st edition describing different attribute grammars to make syntax trees, expression DAGs (with collapsed sub-expression) and other translation. I did not find considerable changes in the chapter. It is not the rocket science but probably it is the most abstract and general description of the translation pass in compiler books.

Chapter 6, Intermediate code generation, is actually more compact and modern version of two chapters 6 (type checking) and 8 (intermediate code generation) from the 1st edition. Some obsolete parts are removed and new concepts are added. It makes this chapter is better than in the two ones in the 1st edition. Big drawback is just mentioning SSA. IMHO It needs a deeper introduction in algorithms of conversion to/from SSA. So Morgan's book (Building an optimizing compiler) and Cooper's and Torczon's book (Engineering a compiler) are much better description of SSA.

This is the first edition where GCC is mentioned (as an example of compiler using the same IR for implementation of many languages).

Chapter 7, Runtime environment, is mainly more compact version of the one in the 1st edition but it uses C++ and ML as example instead of Pascal and Fortran in the 1st edition. Additionally the chapter contains a big new part (40 pages) about garbage collection algorithms:

It also discusses several approaches to short-pause GCs and parallel and concurrent GCs in details. In whole, the description of GC is a bit better than another good description of GCs in Appel's book (Modern Compiler Implementation in C/Java/ML).

Chapter 8, Code generation, is pretty the same chapter 9 (code generation) from the 1st edition with small changes like mentioning Java and Jit. It describes mainly target machines, storage layout and allocation, code selection task (in a good and general way as pattern matching algorithm), register allocation (a very brief and pathetic description), peephole optimization in usual ad hoc way, and simple local optimizations

There is some intersection with subsequent chapter (about CFG and loop notions, etc).

Chapter 9, Machine-independent optimizations, is an improved version of chapter 10 (code optimization) in the previous edition. Data flow analysis is given in modern formalized way (lattice theory -- alternatively I would also recommend Copper's and Toczon's book as an introduction in the theory). Also more detail description of PRE is added to the chapter.

The last two chapters are absolutely new and they are very good.

chapter 10. Instruction-level Parallelism chapter 11. Optimizing for Parallelism and Locality.

They are the best introduction in these fields what I saw in Morgan's (Building an optimizing compiler), Muchnick's (Advanced compiler design and implementation), Cooper's and Torczon's (Engineering a compiler), and Allen's and Kennedy's (Optimizing compilers for modern architectures) books. Although on my opinion many issues in chapter "Optimizing for parallelism and locality" are described deeper in Allen's and Kennedy's book.

Chapter about Instruction-Level parallelism mainly discusses insn scheduling and software pipelining but mentions predication and prefetching too. Insn scheduling and its conflict with the register allocation is discussed on BB insn scheduling, dominator based scheduling and region based scheduling. Software pipelining is described in details on example of modulo scheduling including variable expansion, if-reduction, register pressure decreasing, and rotating register support.

A pretty big chapter (130 pages) about optimizing for parallelism is devoted to improving data locality and parallelizing loops (simultaneous execution of the loop on several CPU) with affine array accesses. It needs data dependency analysis. The chapter describes classical GCD test (greatest common division), heuristics to handle large numbers of inequalities met in practice, and a specialized linear integer programming solver. Data from the data dependencies analysis is used for improving locality and parallelizing loops. Locality can be improved by changing data layout and by blocking (process arrays part by part or by *blocks* which can be fully placed in cache). The chapter describes that in details. A half of the chapter describes parallelizing loops (in other words execution of a loop on several CPU) without and with synchronization through pipelining (decomposing a task on stages executed on different processors).

In general, 2nd edition is not the best description of all compiler tasks and problems (some of them are pretty weak) but parts of them (garbage collection, optimization for locality and using modern processors parallelism) makes sense to have it on the bookshelf. But personally I am not going to buy this book.


This text was taken from a post from Vladimir N. Makarov to the gcc mailing list on Fri, 09 Mar 2007 and formatted for this wiki by Michael Cieslinski.

None: Review_of_the_second_addition_of_the_Dragon_Book. (last edited 2008-01-10 19:38:38 by localhost)