This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] GCC caret diagnostics


On Mar 12, 2008, at 11:21 PM, Manuel López-Ibáñez wrote:
On 13/03/2008, Chris Lattner <clattner@apple.com> wrote:
There is no right answer, and this topic has been the subject of much
debate on the GCC list in the past. I really don't care to debate the
merits of one approach vs. the other with you; I just answered your
question about what clang does.

Of course. I didn't want to get into a debate. I was just trying to figure out how clang addressed the problem of false positives (or alternatively the problem of users complaining about them). Thanks for taking the time to elaborate.

I won't claim that we have a lot of "in the trenches" experience with this. Clang is still quite immature and we've been focusing mostly on source-level analysis/transformation stuff so far (with a small amount of JITing C code), and not using it much as a drop-in static compiler yet. As such, it hasn't been foisted on tons of unsuspecting users (*yet*, <evil laugh>), so they haven't had time to complain. That said, clang has successfully parsed and type checked millions of lines of code, so the front-end is in good shape - we just haven't put much emphasis on the codegen-through-llvm component yet.


That said, despite not having much experience with it yet, I strongly believe this is the right approach. If you take the example you pasted, even if the compiler emits a false positive, it's obvious to the user *why* the compiler thinks it is a bug, and they can disable the warning in a trivial way that doesn't affect codegen (the 'int x = x;' hack).
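A minimal sketch of that idiom (the function and variable names are made up for illustration): GCC treats a self-initialized variable as deliberately initialized, so the warning disappears while the generated code is unchanged.

    /* Illustrative use of the 'int x = x;' idiom.  'idx' is only read
       when 'found' was set, so a may-be-uninitialized warning here
       would be a false positive; the self-initialization tells the
       compiler the author has thought about it and silences the
       warning without affecting codegen. */
    int first_positive(const int *v, int n)
    {
        int idx = idx;              /* the self-init hack */
        int found = 0;
        for (int i = 0; i < n; i++) {
            if (v[i] > 0) {
                idx = i;
                found = 1;
                break;
            }
        }
        return found ? idx : -1;    /* idx is read only when it was assigned */
    }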

The problem with the current GCC approach is that the false positives are often really bizarre and difficult to understand. The ones we have hit in LLVM (i.e. when building the LLVM codebase with GCC) have to do with multiple levels of inlining, etc. This is *not* a good end-user experience IMO.
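A hypothetical sketch (not taken from the LLVM codebase) of the kind of pattern that can trip the middle-end warning: the use of the variable is guarded by the helper's return value, but after inlining the optimizers may no longer see that correlation.

    /* 'v' is written exactly when get_value() returns nonzero, so the
       read of 'v' below is guarded; after inlining, however, the
       optimizers may still report that 'v' "may be used
       uninitialized". */
    static int get_value(int have, int *out)
    {
        if (have) {
            *out = 42;
            return 1;
        }
        return 0;
    }

    int caller(int have)
    {
        int v;
        if (get_value(have, &v))
            return v;               /* a false-positive warning can land here */
        return -1;
    }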

I have no opinion about the approach that you take in GCC.  In
practice, we have been able to do this analysis very quickly and get
good results, and will continue to refine them as clang continues to
mature.

Hmm, that is very interesting, because emitting these warnings in the middle-end causes so many false positives/negatives anyway that a limited static analysis in the front-end might well achieve better results. The only remaining argument would then be the overhead of the static analysis. If that is not an issue either, it seems a very attractive approach indeed.

The design of clang makes these things potentially cheaper to compute than in GCC, so I can't claim that it will be 'cheap' to do in GCC. However, if you give up on the "goal" of providing a perfect answer, you can get good results at low cost. For example, there is no need to build SSA for this stuff. Bitvector dataflow is a proven and extremely cheap way to compute these properties. It's hard to beat doing analysis of 64 variables in parallel with one "or" instruction :).
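A rough sketch of what such a bitvector dataflow might look like. Everything here (the CFG representation, the field names, the solver) is invented for illustration; the point is just that each basic block tracks up to 64 variables in one word, so merging predecessor state is a single OR per edge.

    #include <stdint.h>

    #define MAX_PREDS 8

    struct block {
        uint64_t gen;       /* variables declared but not initialized in this block */
        uint64_t kill;      /* variables definitely assigned in this block */
        int      npreds;
        int      preds[MAX_PREDS];
        uint64_t in, out;   /* solver state, zero-initialized by the caller */
    };

    /* Simplified "may be uninitialized" analysis: iterate block
       transfer functions to a fixed point. */
    static void solve_may_uninit(struct block *b, int nblocks)
    {
        int changed = 1;
        while (changed) {
            changed = 0;
            for (int i = 0; i < nblocks; i++) {
                uint64_t in = 0;
                for (int p = 0; p < b[i].npreds; p++)
                    in |= b[b[i].preds[p]].out;     /* merge: one OR per predecessor */
                uint64_t out = (in & ~b[i].kill) | b[i].gen;
                if (in != b[i].in || out != b[i].out) {
                    b[i].in  = in;
                    b[i].out = out;
                    changed  = 1;
                }
            }
        }
        /* A use of variable v in block i is then suspicious if bit v is
           set in b[i].in (statement-level precision needs a walk within
           the block, omitted here). */
    }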


The real disadvantage of clang doing this is that it requires duplicating infrastructure in the front-end. This means we have to build source-level CFGs and provide source-level dataflow, etc. The compile-time cost of this is not high, but it does take engineering effort to implement. As it turns out, this same infrastructure is needed for other interesting clients (refactoring, static analysis, etc.), so we needed it anyway.

Do you use the static analysis for other things (other warnings, early
optimizations/folding)? There are many other warnings that would be
noticeably enhanced by some dataflow in the front-end.

All optimizations are done by the optimizer; the only optimization we do in the front-end is the language-required folding of constant expressions. Clang does have a static analysis engine in development (which uses a lot of this infrastructure), and it can do things like report path-sensitive reference counting bugs, null pointer dereferences, etc. However, this is not enabled by default as you compile code; it is a stand-alone tool.
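A made-up example (not actual clang output) of the kind of path-sensitive bug such a checker reports: the analyzer considers the path on which malloc returns NULL and flags the unchecked dereference.

    #include <stdlib.h>
    #include <string.h>

    char *duplicate(const char *name)
    {
        char *copy = malloc(strlen(name) + 1);
        strcpy(copy, name);     /* NULL dereference on the allocation-failure path */
        return copy;
    }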


I personally think that it is a major problem that GCC doesn't produce
these diagnostics unless optimizations are enabled, and I continue to
think that having diagnostics change depending on what optimization
level is enabled is bad.
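A small illustration of the optimization-level dependence being described (the exact diagnostics and flag behaviour vary between GCC releases):

    /*
       gcc -Wuninitialized -c f.c        # no optimization: typically no warning
       gcc -O2 -Wuninitialized -c f.c    # warning: 'x' may be used uninitialized
    */
    int f(int flag)
    {
        int x;
        if (flag)
            x = 1;
        return x;       /* genuinely may be uninitialized when flag == 0 */
    }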

I think everybody agrees on both points. Yet the respective alternatives right now in GCC are building SSA at -O0 and moving the warnings earlier in the pipeline. The former results in a slower/bigger compiler at -O0. Both would generate many more false positives, and there have been a few patches proposing to move the warnings even later to avoid some false positives.

I guess static analysis in the front-end was discarded because of
compile-time concerns, but if you are not seeing such a cost (and
clang seems to be much faster than GCC), then it may be worth
reconsidering.

Clang's architecture is quite different from GCC's: what makes sense for clang does not necessarily transplant into GCC. Also, clang is already several times faster than GCC, so adding, say, a 10% slowdown (a random number pulled out of the air; I haven't measured the compile-time hit) to get decent warnings is acceptable for us. That said, lots of people would be happy with better dataflow warnings from GCC, so it certainly is a worthwhile project!


Engineering a compiler is really about balancing a huge set of conflicting trade-offs. There usually isn't a "right" answer; these issues depend on a lot of context. In the clang project, we're intentionally designing it to be as friendly to the end user as possible. Because GCC's front-end is so slow, we have room to do a lot of user-centric things well (e.g. the expressive diagnostic stuff, dataflow warnings, etc.) and still remain significantly faster than GCC.

-Chris

