This is the mail archive of the
mailing list for the GCC project.
Re: [RFC] GCC caret diagnostics
On Mar 12, 2008, at 11:21 PM, Manuel López-Ibáñez wrote:
On 13/03/2008, Chris Lattner <firstname.lastname@example.org> wrote:
There is no right answer, and this topic has been the subject of much
debate on the GCC list in the past. I really don't care to debate
merits of one approach vs the other with you, I just answered your
question about what clang does.
Of course. I didn't want to get into a debate. I was just trying to
figure out how clang addressed the problem of false positives (or
alternatively the problem of users complaining about them). Thanks for
taking the time to elaborate.
I won't claim that we have a lot of "in the trenches" experience with
this. Clang is still quite immature and we've been focusing mostly on
source-level analysis/transformation stuff so far (with a small amount
of JITing C code), and not using it much as a drop-in static compiler
yet. As such, it hasn't been foisted on tons of unsuspecting users
(*yet*, <evil laugh>), so they haven't had time to complain. That
said, clang has successfully parsed and type checked millions of lines
of code, so the front-end is in good shape - we just haven't put much
emphasis on the codegen-through-llvm component yet.
That said, despite not having much experience, I strongly believe this is the
right approach. If you take the example you pasted, even if the
compiler emits a false positive, it's obvious to the user *why* the
compiler thinks it is a bug, and they can disable the warning in a
trivial way that doesn't affect codegen (the 'int x = x;' hack).
The problem with the current GCC approach is that false positives are
often really bizarre and difficult to understand. Ones we have hit in
LLVM (i.e. when building the LLVM codebase with GCC) have to do with
multiple levels of inlining, etc. This is *not* a good end-user
experience.
I have no opinion about the approach that you take in GCC. In
practice, we have been able to do this analysis very quickly and get
good results, and will continue to refine them as clang continues to
mature.
Hum, that is very interesting because doing these warnings in the
middle-end causes so many false positives/negatives anyway that
perhaps a limited static analysis in the front-end achieves better
results. And the only argument then would be the overhead of the
static analysis. If that is not an issue either, then it seems a very
attractive approach indeed.
The design of clang makes these things potentially cheaper to compute
than GCC, so I can't claim that it will be 'cheap' to do in GCC.
However, if you give up on the "goal" of providing a perfect answer,
you can get good results with low cost. For example, there is no need
to build SSA for this stuff. Bitvector dataflow is a proven and
extremely cheap way to compute these properties. It's hard to beat
doing analysis of 64 variables in parallel with one "or" instruction :).
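To make the bitvector idea concrete, here is a minimal sketch of a "may be uninitialized" dataflow pass over a tiny hand-written CFG. It is not clang's or GCC's implementation. The names (analyze, transfer, the block and predecessor dictionaries) are all made up for illustration, and a Python integer masked to 64 bits stands in for one machine word. Bit i set means "variable i may be uninitialized at this point"; merging predecessors is a single OR over the whole word.

```python
MASK64 = (1 << 64) - 1  # emulate a single 64-bit machine word


def transfer(stmts, state, report=None):
    """Walk one basic block in order, updating the may-be-uninitialized bits."""
    for kind, var in stmts:
        bit = 1 << var
        if kind == "def":
            state &= ~bit & MASK64          # a definition clears the bit
        elif kind == "use" and state & bit and report is not None:
            report(var)                     # use of a possibly uninitialized variable
    return state


def analyze(blocks, preds, entry, nvars):
    """blocks: name -> list of ("def" | "use", var_index) in program order.
    preds:  name -> list of predecessor block names.
    Returns sorted (block, var_index) pairs for suspicious uses."""
    assert nvars <= 64
    all_uninit = (1 << nvars) - 1
    out = {name: 0 for name in blocks}

    def block_in(name):
        # Every variable is uninitialized on function entry.
        state = all_uninit if name == entry else 0
        for p in preds.get(name, []):
            state |= out[p]                 # one OR merges up to 64 variables
        return state

    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for name, stmts in blocks.items():
            new_out = transfer(stmts, block_in(name))
            if new_out != out[name]:
                out[name] = new_out
                changed = True

    warnings = []
    for name, stmts in blocks.items():
        transfer(stmts, block_in(name),
                 lambda var, n=name: warnings.append((n, var)))
    return sorted(set(warnings))


# Models: int x, y; y = 0; if (c) x = 1; use(x); use(y);
# Variable indices (an arbitrary choice): x = 0, y = 1.
example_blocks = {
    "entry": [("def", 1)],
    "then": [("def", 0)],
    "join": [("use", 0), ("use", 1)],
}
example_preds = {"then": ["entry"], "join": ["entry", "then"]}
example_warnings = analyze(example_blocks, example_preds, "entry", 2)
```

In this example only the use of x in the join block is flagged, because x is assigned on just one of the two paths reaching it. The analysis is deliberately path-insensitive, so it can report false positives; that is the "give up on a perfect answer" trade-off described above.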
The real disadvantage of clang doing this is that it requires
duplicating infrastructure in the front-end. This means we have to
build source level CFGs and provide source level dataflow, etc. The
compile-time cost of this is not high, but it does take engineering
effort to implement this. As it turns out, this same infrastructure
is needed for other interesting clients (refactoring, static analysis,
etc) so we needed it anyway.
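As a rough picture of what "source level CFG" infrastructure means, the sketch below lowers a toy structured AST into basic blocks with predecessor edges. Everything here is hypothetical: the tuple-based statement forms, the build_cfg name, and the B0, B1, ... block naming are invented for illustration and do not mirror clang's data structures. Only straight-line statements and if/else are handled.

```python
def build_cfg(stmts):
    """Lower a toy structured AST into basic blocks plus predecessor edges.

    Statement forms (hypothetical):
      ("def", var) / ("use", var)      straight-line statements
      ("if", then_stmts, else_stmts)   two-way branch that rejoins afterwards
    Returns (blocks, preds, entry_name).
    """
    blocks, preds = {}, {}

    def new_block(pred_names):
        name = f"B{len(blocks)}"
        blocks[name] = []
        preds[name] = list(pred_names)
        return name

    def lower(body, current):
        # Append statements to the current block; an "if" ends the block
        # and opens a then-block, an else-block, and a join block.
        for stmt in body:
            if stmt[0] == "if":
                _, then_body, else_body = stmt
                then_end = lower(then_body, new_block([current]))
                else_end = lower(else_body, new_block([current]))
                current = new_block([then_end, else_end])
            else:
                blocks[current].append(stmt)
        return current

    entry = new_block([])
    lower(stmts, entry)
    return blocks, preds, entry


# Models: y = 0; if (c) x = 1; use(x); use(y);
program = [
    ("def", "y"),
    ("if", [("def", "x")], []),
    ("use", "x"),
    ("use", "y"),
]
cfg_blocks, cfg_preds, cfg_entry = build_cfg(program)
```

Once the program is in this form, any block-level dataflow client (warnings, refactoring tools, a static analyzer) can walk the same blocks and edges without going through the optimizer's IR, which is the reuse argument made above.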
Do you use the static analysis for other things (other warnings, early
optimizations/folding)? There are many other warnings that would be
noticeably enhanced by some dataflow in the front-end.
All optimizations are done by the optimizer, the only optimizations we
do in the front-end are the language-required folding of constant
expressions. Clang does have a static analysis engine in development
(which uses a lot of this infrastructure), and it can do things like
report path sensitive ref counting bugs, null pointer dereferences etc.
However, this is not enabled by default as you compile code, it is a
I personally think that it is a major problem that GCC doesn't produce
these diagnostics unless optimizations are enabled, and I continue to
think that having diagnostics change depending on what optimization
level is enabled is bad.
I think everybody agrees on both. Yet the alternatives to these issues
right now in GCC are building SSA at -O0 and moving the warnings
earlier in the pipeline, respectively. The former results in a
slower/bigger -O0 compiler. Both would generate many more false
positives, and there have been a few patches proposing moving them
even later to avoid some false positives.
I guess the static analysis in the front-end was discarded because of
compile-time concerns, but if you don't see that (and clang seems to
be much faster than GCC), then it may be worth reconsidering it.
Clang's architecture is quite different than GCC's: what makes sense
for it does not necessarily transplant into GCC. Also, clang is
already several times faster than GCC, so adding, say, a 10% slowdown
(this is a random number I pulled out of the air, I haven't measured
the compile-time hit) to get decent warnings is acceptable for us.
That said, lots of people would be happy with better dataflow warnings
from GCC, so it certainly is a worthwhile project!
Engineering a compiler is really about balancing a huge set of
conflicting trade-offs. There usually isn't a "right" answer; these
issues depend on a lot of context. In the clang project, we're
intentionally designing it to be as friendly to the end user as
possible. Because GCC's front-end is so slow, it gives us the ability
to do a lot of user-centric things (e.g. the expressive diagnostic
stuff, dataflow warnings, etc) better and still remain significantly
faster than GCC.