111312 – (analyzer-run-earlier) Should the analyzer run earlier?


Summary: Should the analyzer run earlier?

Status:	UNCONFIRMED

Alias:	analyzer-run-earlier

Product:	gcc
Classification:	Unclassified
Component:	analyzer
Version:	unknown

Importance:	P3 normal
Target Milestone:	---
Assignee:	David Malcolm

URL:
Keywords:

Depends on:	100116 108028 108767
Blocks:	111095 111213

Reported:	2023-09-06 21:23 UTC by David Malcolm
Modified:	2024-02-16 21:25 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:



Description David Malcolm 2023-09-06 21:23:27 UTC

I made the analyzer run where it currently does in the pass pipeline in order to take advantage of the LTO streaming representation.

But:
  I'm having to recommend disabling optimizations for various -fanalyzer warnings:
    * -Wanalyzer-deref-before-check
    * -Wanalyzer-infinite-recursion
    * -Wanalyzer-tainted-assertion
  and eventually:
    * -Wanalyzer-infinite-loop (work-in-progress; see bug 106147)
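To illustrate why a warning such as -Wanalyzer-deref-before-check is sensitive to optimization, here is a hypothetical example (not taken from the GCC testsuite; the function name is invented). The pointer is dereferenced first and only checked for NULL afterwards. When optimizing, GCC may treat the later check as dead, since a NULL dereference above would already be undefined behavior (see -fdelete-null-pointer-checks), so by the time the analyzer runs the check that the warning is about may no longer exist.

```c
#include <stddef.h>

/* Hypothetical example of the deref-before-check pattern.  */
int read_or_default(const int *p)
{
  int value = *p;   /* p is dereferenced here...  */
  if (p == NULL)    /* ...and only checked here.  When optimizing, GCC may
                       consider this branch unreachable (a NULL dereference
                       above would already be undefined behavior) and
                       delete it before the analyzer sees it.  */
    return -1;
  return value;
}
```

At -O0 the check survives into the IR the analyzer sees, which is one reason for recommending disabling optimizations for these warnings.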

Also, various bugs are showing up where the analyzer fails to warn on clearly wrong code (presumably due to the optimizer removing code containing undefined behavior before the analyzer ever sees it; see e.g. bug 111095 and bug 111213).

Also, there's a tension between warnings and optimization: the optimizer takes advantage of undefined behavior (assuming it's not present), but we want to complain about the presence of undefined behavior.
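A classic, hypothetical illustration of this tension (function names invented for the example): an "overflow check" written in terms of signed overflow. Because signed overflow is undefined behavior, the optimizer may assume x + 1 never wraps and fold the first comparison to false, whereas a diagnostic would want to point out the undefined behavior itself. The second function is a well-defined equivalent.

```c
#include <limits.h>
#include <stdbool.h>

/* Intended as an overflow check, but signed overflow is undefined
   behavior, so the optimizer may assume x + 1 never wraps and fold
   the comparison to false.  */
bool increment_overflows_ub(int x)
{
  return x + 1 < x;
}

/* Well-defined version: x + 1 overflows exactly when x == INT_MAX.  */
bool increment_overflows(int x)
{
  return x == INT_MAX;
}
```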

Plus it's better to report things to the user in a form closer to that in which they wrote the code.

When should we run?
  * immediately after we reach "generic"
    * could give us per-argument locations at call sites
  * once we're in gimple-cfg (but no ssa?)
  * once we're in gimple-ssa

Costs/Benefits of running at different times
  * is inlining saving us?
  * do we want to use the callgraph?
  * do we want to use the loop information? (see bug 109252)
  * do we want to use ranger? (need SSA)
  * do we want to use the CFGs?
  * do we want to reuse ipa-devirt?
  * presumably we don't want to reimplement lowering of OpenMP, exceptions, etc
  * PR analyzer/100116 ("analyzer event messages for conditionals have the sense of the gimple IR rather than the source")

We're not currently analyzing the user's code, we're analyzing the user's code after the optimizer has manipulated it, taking advantage of undefined behavior.

Comment 1 David Malcolm 2023-09-11 15:54:20 UTC

Richi: IIRC we chatted about this at last year's Cauldron, and I wonder if you have any thoughts on this (and I'm very much looking forward to your "Undefined behaviour and its treatment within GCC" Cauldron talk this year).

Comment 2 Richard Biener 2023-09-12 11:21:54 UTC

I think the analyzer runs at the "correct" place as a regular IPA pass which makes it possible for it to see the whole program (with -flto).

As with any of our late diagnostic passes, there's a trade-off when
optimizing less or more.

It should be possible to use -fanalyzer -flto -O0, correct?

The main question to me is whether -fanalyzer is supposed to be a
static-analysis-only operation, in which case the produced object files
are really an unwanted artifact.  If not, and -fanalyzer should be useful
for full release builds, then I think -fanalyzer vs. -fno-analyzer
shouldn't have any effect on code generation (much like diagnostic
options).  But then one might argue that splitting the analyzer into two
phases would be appropriate: one early, for example after SSA rewrite
where most early diagnostic passes reside, and one late, where it
currently resides.  Analyses prone to false positives/negatives when run
on optimized code could then be moved to the early phase.  But then
duplicate diagnostics would have to be suppressed somehow (or the result
processed by tooling, which could even correlate early/late findings).

Comment 3 David Malcolm 2023-09-15 21:44:35 UTC

Another example can be seen here:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628759.html
in:
  gcc/testsuite/c-c++-common/analyzer/overlapping-buffers.c
where -Wanalyzer-overlapping-buffers only catches the cases that the optimizer doesn't see; it misses the others because they are optimized away as undefined behavior.
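For context, a minimal hypothetical illustration of what this warning is about (not taken from overlapping-buffers.c; the function name is invented). Calling memcpy with overlapping source and destination is undefined behavior, which is the kind of call -Wanalyzer-overlapping-buffers is meant to report; memmove is the standard function specified to handle overlap.

```c
#include <stddef.h>
#include <string.h>

/* Shift the first n bytes of buf one position to the right.
   memcpy (buf + 1, buf, n) would be undefined behavior here because
   the source and destination regions overlap; memmove is specified
   to handle overlapping regions correctly.  */
void shift_right_by_one(char *buf, size_t n)
{
  memmove(buf + 1, buf, n);
}
```

Usage: with char b[] = "abcd", calling shift_right_by_one(b, 3) leaves b as "aabc".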