This is the mail archive of the
mailing list for the GCC project.
10 Nov notes from GCC improvement for Itanium conference call
- From: "Mark K. Smith" <mksmith at gelato dot org>
- To: "Gelato-GCC" <gelato-gcc at gelato dot unsw dot edu dot au>,"GCC" <gcc at gcc dot gnu dot org>
- Date: Tue, 15 Nov 2005 12:59:38 -0600
- Subject: 10 Nov notes from GCC improvement for Itanium conference call
- Reply-to: <mksmith at gelato dot org>
ON THE CALL: Shin-ming Liu (HP), Vladimir Makarov (Red Hat), Diego
Novillo (Red Hat), Mark Smith (Gelato), Andrey Belevantsev (RAS),
Arutyun Avetisyan (RAS), Bob Kidd (UIUC), Mark Davis (Intel)
The call covered:
1. Setting up GCC branch for Itanium-related work
2. Alias analysis update from RAS and Diego
3. Superblock update from UIUC
4. HP update from Shin
5. Scheduler work from RAS
Mark S. will work on securing a Montecito SDV to help test GCC builds.
Information about submitting proposals to ISA for GCC work was
distributed to the group. Additional call details can be found below.
NEXT MEETING: December 8th, 2005. Details will be emailed out prior to
We are working on the first part of the new scheduler infrastructure.
This part is a set of routines that gather the instructions available
for scheduling. We'll put it on the ia64 branch as soon as it is ready,
hopefully in a couple of weeks.
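To illustrate what "gathering the instructions available for scheduling"
means in general, here is a minimal Python sketch. It is not the RAS
implementation and says nothing about how the new infrastructure is
actually structured. It assumes a basic block is given as a dependence
map, and the names ready_instructions, list_schedule and the
four-instruction block are hypothetical.

```python
def ready_instructions(deps, scheduled):
    """Return the instructions that are available for scheduling.

    deps maps each instruction name to the set of instructions it
    depends on; an instruction is ready once all of those have been
    scheduled and it has not been scheduled itself.
    """
    return sorted(
        insn
        for insn, preds in deps.items()
        if insn not in scheduled and preds <= scheduled
    )


def list_schedule(deps):
    """Greedy list scheduling: repeatedly pick one ready instruction.

    The selection heuristic here is simply "first in alphabetical
    order"; a real scheduler would use priorities and resource models.
    """
    order = []
    scheduled = set()
    while len(scheduled) < len(deps):
        ready = ready_instructions(deps, scheduled)
        if not ready:
            raise ValueError("dependence cycle: nothing is ready")
        insn = ready[0]
        order.append(insn)
        scheduled.add(insn)
    return order


# Hypothetical block: two loads feed an add, whose result is stored.
deps = {
    "load_a": set(),
    "load_b": set(),
    "add": {"load_a", "load_b"},
    "store": {"add"},
}
```

With this block, only the two loads are ready at the start; the add
becomes ready once both loads are scheduled, and the store after the add.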
I've tested the tree prefetching pass from the killloop-branch,
written by Zdenek Dvorak some time ago. At that time, the branch
wasn't stable enough, so the SPEC INT results didn't show much. But the
pass is a win for the SPEC FP tests (8 of 14 tests worked, with a mean
speedup of ~10% and individual speedups from 6% to 40%).
I've also tested the killloop branch together with the speculation
patch. For SPEC FP, the speedups over the pristine branch are more or
less the same as those of head+speculation over head. Prefetching does
help speculation on the SPEC INT tests, though (the bzip2 speedup
increases by 6%). It's probably time to redo
this testing, because the killloop branch has significantly changed
We'll send the alias propagation patch to the list in a few days. It
will then be put on the ia64 branch, along with the other patches that
are ready. We've received confirmation from the FSF for the three of us
who participated in the last project (Maxim Kuvyrkov, Dmitry Melnik and
myself), so to start working on the branch we just need the auth
tokens in the repository.
There's not much new to report on the Superblock scheduling work. I
received a cleaned version of the patch from Steven Bosscher, but I
haven't had time to look at it yet. I'm working with Diego to set up
a branch from the FSF tree and will look into setting up a machine to
periodically run performance regression tests.
HP has done a few things in the past month. We packaged and tested
GCC 4.0.2 for HP-UX, and the bits should be ready for posting shortly.
We have also started working with the GCC community on the creation of
binary IR files for GCC, which will help the IPO effort in GCC.
I spoke about the importance of early access to machines based on the
new Itanium chip (Montecito), and to documentation, for GCC developers
trying to improve GCC for Itanium.
As for Mark Davis's remark about rewriting the RTL optimizations, I
said that it cannot be done easily.
RTL is too complicated. By Andrew Macleod's and my estimation, just
writing a good new register allocator is at least a two-year project.
Significantly simplifying RTL, or using another IR, is an even more
complicated task than introducing Tree-SSA, because the machine
description is very closely tied to RTL.
As an example, the combine pass is based on outdated work of Fraser,
Proebsting etc. They proposed combining several intermediate,
data-dependent insns into one machine insn, with possible insn splitting
(that is what the define_split pattern serves in the machine
description). Since that work they have proposed a fast and optimal
solution to the code selection task with their BURG and IBURG systems
(finding a minimal-cost cover of a tree expression by machine insn
patterns). Moving to this algorithm requires significantly simplifying
RTL (one RTL insn should be no more than a part of a machine insn, or
just one machine insn). That means rewriting all machine description
files (simpler define_insn), removing define_split patterns, and
rewriting other optimizations (e.g. reload assumes that all
moves/stores/loads of one mode are described by one define_insn).
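For readers unfamiliar with the tree-covering idea mentioned above, the
following Python sketch finds a minimal-cost cover of an expression tree
by instruction patterns. It is only an illustration of the principle,
not BURG or IBURG themselves (those generate matchers from a grammar
with several nonterminals), and it has nothing to do with GCC's actual
machine descriptions. The pattern names, costs, and the tuple encoding
of trees are all assumptions made up for this example.

```python
from functools import lru_cache

WILD = "reg"  # pattern leaf: any subtree already computed into a register

# (name, pattern, cost): hypothetical instruction patterns.
# Trees and patterns are tuples: (operator, child, child, ...).
PATTERNS = [
    ("use_reg", ("REG",), 0),                    # value already in a register
    ("load_imm", ("CONST",), 1),
    ("add", ("ADD", WILD, WILD), 1),
    ("add_imm", ("ADD", WILD, ("CONST",)), 1),
    ("load", ("MEM", WILD), 2),
    ("load_idx", ("MEM", ("ADD", WILD, ("CONST",))), 2),
]


def match(pattern, tree):
    """Return the subtrees bound to wildcards, or None if no match."""
    if pattern == WILD:
        return [tree]
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return None
    bound = []
    for sub_pattern, sub_tree in zip(pattern[1:], tree[1:]):
        sub_bound = match(sub_pattern, sub_tree)
        if sub_bound is None:
            return None
        bound.extend(sub_bound)
    return bound


@lru_cache(maxsize=None)
def best_cover(tree):
    """Return (cost, pattern names) of a cheapest cover of the tree.

    Every pattern matching at the root is tried; the subtrees left
    uncovered are covered recursively, and results are memoized.
    """
    best = None
    for name, pattern, cost in PATTERNS:
        bound = match(pattern, tree)
        if bound is None:
            continue
        total, insns = cost, ()
        for sub_tree in bound:
            sub_cost, sub_insns = best_cover(sub_tree)
            total += sub_cost
            insns += sub_insns
        insns += (name,)
        if best is None or total < best[0]:
            best = (total, insns)
    if best is None:
        raise ValueError("no pattern matches %r" % (tree,))
    return best


# Hypothetical expression: a load from (register + constant).
tree = ("MEM", ("ADD", ("REG",), ("CONST",)))
```

Here the indexed load covers the whole tree at cost 2, whereas covering
it with add_imm followed by load would cost 3, so best_cover(tree) picks
the single load_idx pattern.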
My focus over the last few weeks has been OpenMP and fixing bugs for
the 4.1 release. I will create a branch so that folks can put their
work in it. I need people to mail me so that I can coordinate write
access with them.
We discussed briefly some of the activity geared towards improving
GCC's backend. The problem is well understood and various relatively
independent efforts are moving GCC in the right direction (IPA,
dataflow analysis improvements, scheduling, register allocation, moving
high-level aliasing information into the backend). It will probably
take a few releases to fix most of the glaring problems.