93355 – Analyzer emits -Wanalysis-too-complex on intl/localealias.c due to poor call summarization

Bug 93355 - Analyzer emits -Wanalysis-too-complex on intl/localealias.c due to poor call summarization

Summary: Analyzer emits -Wanalysis-too-complex on intl/localealias.c due to poor call ...

Status:	ASSIGNED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	analyzer (show other bugs)
Version:	10.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	David Malcolm

URL:
Keywords:

Depends on:
Blocks:	analyzer-call-summaries
	Show dependency tree / graph

Reported:	2020-01-21 14:41 UTC by David Malcolm
Modified:	2021-03-12 20:11 UTC (History)
CC List:	1 user (show)

See Also:	47170
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2020-01-21 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description David Malcolm 2020-01-21 14:41:14 UTC

Bug 47170 reports a leak of a FILE * in gcc/intl/localealias.c found by cppcheck.  This appears to be a genuine leak.

-fanalyzer does not currently report this but ought to, so I'm filing this bug to track this.

Comment 1 David Malcolm 2020-02-04 14:18:20 UTC

Code is actually in intl/localealias.c

Comment 2 GCC Commits 2020-09-21 22:49:52 UTC

The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:

https://gcc.gnu.org/g:15e7b93ba4256884c90198c678ed7eded4e73464

commit r11-3340-g15e7b93ba4256884c90198c678ed7eded4e73464
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Fri Sep 18 17:34:50 2020 -0400

    analyzer: decls are not on the heap
    
    Whilst debugging the remaining state explosion in PR analyzer/93355
    I noticed that half of the states at an exploding program point had:
      'malloc': {'&buf': 'non-heap'}
    whereas the other half didn't, presumably depending on whether the path
    to each enode had used this local buffer:
      char buf[400];
    
    This patch tweaks malloc_state_machine::get_default_state to be smarter
    about this, so that we can implicitly treat pointers to decls as
    non-heap, preventing pointless differences between sm_state_map
    instances.  With that, all of the states in question have equal (empty)
    malloc sm-state - though the state explosion continues for other reasons.
    
    gcc/analyzer/ChangeLog:
            PR analyzer/93355
            * sm-malloc.cc (malloc_state_machine::get_default_state): Look at
            the base region when considering pointers.  Treat pointers to
            decls as being non-heap.

Comment 3 GCC Commits 2020-09-24 01:26:32 UTC

The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:

https://gcc.gnu.org/g:6b828f69519a50e6e2961b62ea552bf89d287199

commit r11-3415-g6b828f69519a50e6e2961b62ea552bf89d287199
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Thu Dec 19 16:15:09 2019 -0500

    analyzer: add testcases for PR 93355 (intl/localealias.c leak)
    
    PR analyzer/93355 reports a missing diagnostic about a FILE leak in
    intl/localealias.c.  This appears to be due to a issue in the
    feasibility-checking code, though there is also a state explosion.
    
    This patch adds test cases that I've been using when investigating this,
    two of them currently requiring -fno-analyzer-feasibility, and one
    currently requiring -Wno-analyzer-too-complex.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/93355
            * gcc.dg/analyzer/pr93355-localealias-feasibility.c: New test.
            * gcc.dg/analyzer/pr93355-localealias-simplified.c: New test.
            * gcc.dg/analyzer/pr93355-localealias.c: New test.

Comment 4 David Malcolm 2021-01-26 21:33:37 UTC

Current status is that there is testcase coverage for this in git, but the test requires:
    /* { dg-additional-options "-Wno-analyzer-too-complex -fno-analyzer-feasibility" } */

(a) It happens to successfully explore enough of the graph to find the leak, but hits complexity limits at 1 program point:

pr93355-localealias.c:263:41: warning: terminating analysis for this program point: callstring: [(SN: 66 -> SN: 85 in _nl_expand_alias)] before (SN: 21 stmt: 0):  _7 = *cp_107;EN: 163, EN: 170, EN: 177, EN: 370, EN: 557, EN: 794, EN: 801, EN: 808 [-Wanalyzer-too-complex]
  263 |       while (isspace ((unsigned char) cp[0]))
      |                                       ~~^~~

(b) Without -fno-analyzer-feasibility, it falsely rejects the path as infeasible at the "locale_alias_path[0] != '\0'" within _nl_expand_alias:

    |  185 |       added = 0;
    |      |       ~~~~~~~~~
    |      |             |
    |      |             (3) ...to here
    |  186 |       while (added == 0 && locale_alias_path[0] != '\0')
    |      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                         |                   |
    |      |                         |                   (5) ...to here
    |      |                         (4) following ‘true’ branch (when ‘added == 0’)...
    |      |                         (6) following ‘true’ branch...
    |
  ‘_nl_expand_alias’: event 7
    |
    |cc1:
    | (7): ...to here
    |
  ‘_nl_expand_alias’: events 8-9
    |
    |  198 |           if (start < locale_alias_path)
    |      |               ~~~~~~^~~~~~~~~~~~~~~~~~~
    |      |                     |
    |      |                     (8) this path would have been rejected as infeasible at this edge: edge from EN: 88 to EN: 90; rejected constraint: INIT_VAL((*INIT_VAL(locale_alias_path))) == (const char)0;

    where region: {start_39, value: INIT_VAL(locale_alias_path)}
    which looks like an instance of PR 96374.


(c) Also, the path it reports for the leak is rather verbose: it reports various events in _nl_expand_alias leading to the call into read_alias_file, but it really ought to just report the events in read_alias_file.  diagnostic_manager ought to prune leading interprocedural events (I think I had code for this in one of my working copies)

Comment 5 GCC Commits 2021-02-02 02:53:56 UTC

The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:

https://gcc.gnu.org/g:f2f639c4a781016ad146d44f463714fe4295cb6e

commit r11-7028-gf2f639c4a781016ad146d44f463714fe4295cb6e
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Mon Feb 1 21:52:41 2021 -0500

    analyzer: add more feasibility test cases [PR93355,PR96374]
    
    This patch adds a couple more reduced test cases derived from the
    integration test for PR analyzer/93355.  In both cases, the analyzer
    falsely rejects the buggy code paths as being infeasible due to
    PR analyzer/96374, and so the tests are marked as XFAIL for now.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/93355
            PR analyzer/96374
            * gcc.dg/analyzer/pr93355-localealias-feasibility-2.c: New test.
            * gcc.dg/analyzer/pr93355-localealias-feasibility-3.c: New test.

Comment 6 GCC Commits 2021-02-02 02:55:23 UTC

The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:

https://gcc.gnu.org/g:8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86

commit r11-7029-g8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Mon Feb 1 21:54:11 2021 -0500

    analyzer: directly explore within static functions [PR93355,PR96374]
    
    PR analyzer/93355 tracks that -fanalyzer fails to report the FILE *
    leak in read_alias_file in intl/localealias.c.
    
    One reason for the failure is that read_alias_file is marked as
    "static", and the path leading to the single call of
    read_alias_file is falsely rejected as infeasible due to
    PR analyzer/96374.  I have been attempting to fix that bug, but
    don't have a good solution yet.
    
    Previously, -fanalyzer only directly explored "static" functions
    if they were needed for call summaries, instead forcing them to
    be indirectly explored, but if we have a feasibility bug like
    above, we will fail to report any issues in a function that's
    only called by such a falsely infeasible path.
    
    It now seems wrong to me to reject directly exploring static
    functions: even if there is currently no way to call a function,
    it seems reasonable to warn about bugs within them, since
    otherwise these latent bugs are a timebomb in the code.
    
    Hence this patch reworks toplevel_function_p to directly explore
    almost all functions, working around these feasiblity issues.
    It introduces a naming convention that "__analyzer_"-prefixed
    function names don't get directly explored, since this is
    useful in the analyzer's DejaGnu-based tests.
    
    This workaround gets PR analyzer/93355 closer to working, but
    unfortunately there is a second instance of PR analyzer/96374
    within read_alias_file itself which means even with this patch
    -fanalyzer falsely rejects the path as infeasible.
    
    Still, this ought to help in other cases, and simplifies the
    implementation.
    
    gcc/analyzer/ChangeLog:
            PR analyzer/93355
            PR analyzer/96374
            * engine.cc (toplevel_function_p): Simplify so that
            we only reject functions with a "__analyzer_" prefix.
            (add_any_callbacks): Delete.
            (exploded_graph::build_initial_worklist): Update for
            dropped param of toplevel_function_p.
            (exploded_graph::build_initial_worklist): Don't bother
            looking for callbacks that are reachable from global
            initializers.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/93355
            PR analyzer/96374
            * gcc.dg/analyzer/conditionals-3.c: Add "__analyzer_"
            prefix to support subroutines where necessary.
            * gcc.dg/analyzer/data-model-1.c: Likewise.
            * gcc.dg/analyzer/feasibility-1.c (called_by_test_6a): New.
            (test_6a): New.
            * gcc.dg/analyzer/params.c: Add "__analyzer_" prefix to support
            subroutines where necessary.
            * gcc.dg/analyzer/pr96651-2.c: Likewise.
            * gcc.dg/analyzer/signal-4b.c: Likewise.
            * gcc.dg/analyzer/single-field.c: Likewise.
            * gcc.dg/analyzer/torture/conditionals-2.c: Likewise.

Comment 7 David Malcolm 2021-02-02 14:38:29 UTC

(In reply to CVS Commits from comment #6)
> The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86
> 
> commit r11-7029-g8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86

This patch probably also fixes (c) within comment #4 above.

Comment 8 David Malcolm 2021-03-04 00:27:52 UTC

(In reply to David Malcolm from comment #4)
> (a) It happens to successfully explore enough of the graph to find the leak,
> but hits complexity limits at 1 program point:
> 
> pr93355-localealias.c:263:41: warning: terminating analysis for this program
> point: callstring: [(SN: 66 -> SN: 85 in _nl_expand_alias)] before (SN: 21
> stmt: 0):  _7 = *cp_107;EN: 163, EN: 170, EN: 177, EN: 370, EN: 557, EN:
> 794, EN: 801, EN: 808 [-Wanalyzer-too-complex]
>   263 |       while (isspace ((unsigned char) cp[0]))
>       |                                       ~~^~~

If I get rid of _nl_expand_alias then the state explosion goes away.
There are 2 enodes at the callsite to read_alias_file, so the entrypoint starts with 3 enodes (both calls, plus independent analysis)
Hence there's already a 3x for that function.
Without that 3x, things seem reasonable within read_alias_file.
So (a) may be really "just" a call summarization problem.

Comment 9 David Malcolm 2021-03-05 23:54:14 UTC

Going back to the summary from comment #4:
(a) It happens to successfully explore enough of the graph to find the leak, but hits complexity limits at a program point
(b) Without -fno-analyzer-feasibility, it falsely rejects the path as infeasible
(c) is effectively fixed by r11-7029-g8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86.

The issue with (a) is that we have a non-trivial function calling a non-trivial function, and the lack of call summaries means that we get a state explosion due to expanding the callee 3 times (once directly, and twice at the single callsite, exploded in the callee).  The callee has an issue that is found standalone (since r11-7029-g8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86).  Hence the state explosion on nested calls means that we won't fail to find the standalone issue.  So although it would be good to fix the state explosion/call summary issue, I think it's lower priority than fixing the feasibility issue.  (I spent some time this week looking at call summaries, but now think this is lower priority than the feasibility issue).

For (b), I've tried several approaches, but now that we're trying multiple paths per dedupe candidate, it's hard to see where things go wrong.  I've been experimenting with better ways to visualize this to try to make debugging it less painful.

Comment 10 GCC Commits 2021-03-11 22:48:40 UTC

The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:

https://gcc.gnu.org/g:3857edb5d32dcdc11d9a2fe3ad7c156c52a1ec7f

commit r11-7640-g3857edb5d32dcdc11d9a2fe3ad7c156c52a1ec7f
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Thu Mar 11 17:46:37 2021 -0500

    analyzer: new implementation of shortest feasible path [PR96374]
    
    The analyzer builds an exploded graph of (point,state) pairs and when
    it finds a problem, records a diagnostic at the relevant exploded node.
    Once it has finished exploring the graph, the analyzer needs to generate
    the shortest feasible path through the graph to each diagnostic's node.
    This is used:
    - for rejecting diagnostics that are infeasible (due to impossible sets
      of constraints),
    - for use in determining which diagnostic to use in each deduplication
      set (the one with the shortest path), and
    - for building checker_paths for the "winning" diagnostics, giving a
      list of events
    
    Prior to this patch the analyzer simply found the shortest path to the
    node, and then checked it for feasibility, which could lead to falsely
    rejecting diagnostics: "the shortest path, if feasible" is not the same
    as "the shortest feasible path" (PR analyzer/96374).
    An example is PR analyzer/93355, where this issue causes the analyzer
    to fail to emit a leak warning for a missing fclose on an error-handling
    path in intl/localealias.c.
    
    This patch implements a new algorithm for finding the shortest feasible
    path to an exploded node: instead of simply finding the shortest path,
    the new algorithm uses a worklist to iteratively build a tree of path
    prefixes, which are feasible paths by construction, until a path to the
    target node is found.  The worklist is prioritized, so that the first
    feasible path discovered is the shortest possible feasible path.  The
    algorithm continues trying paths until the target node is reached or a
    limit is exceeded, in which case the diagnostic is treated as being
    infeasible (which could still be a false negative, but is much less
    likely to happen than before).  Iteratively building a tree of paths
    allows for work to be reused, and the tree can be dumped in .dot form
    (via a new -fdump-analyzer-feasibility option), making it much easier to
    debug compared to other approaches I tried.
    
    Doing so fixes the missing leak warning for PR analyzer/93355 and
    various other test cases.
    
    Testing:
    - I manually verified that the behavior is determistic using 50 builds
      of pr93355-localealias.c.  All dumps were identical.
    - I manually verified that it still builds with --disable-analyzer.
    - Lightly tested with valgrind; no additional issues.
    - Lightly performance tested, showing a slight speed regression to the
      analyzer relative to before the patch, but correctness for this issue
      is more important than the slight performance hit for the analyzer.
    
    gcc/ChangeLog:
            PR analyzer/96374
            * Makefile.in (ANALYZER_OBJS): Add analyzer/feasible-graph.o and
            analyzer/trimmed-graph.o.
            * doc/analyzer.texi (Analyzer Paths): Rewrite description of
            feasibility checking to reflect new implementation.
            * doc/invoke.texi (-fdump-analyzer-feasibility): Document new
            option.
            * shortest-paths.h (shortest_paths::get_shortest_distance): New.
    
    gcc/analyzer/ChangeLog:
            PR analyzer/96374
            * analyzer.opt (-param=analyzer-max-infeasible-edges=): New param.
            (fdump-analyzer-feasibility): New flag.
            * diagnostic-manager.cc: Include "analyzer/trimmed-graph.h" and
            "analyzer/feasible-graph.h".
            (epath_finder::epath_finder): Convert m_sep to a pointer and
            only create it if !flag_analyzer_feasibility.
            (epath_finder::~epath_finder): New.
            (epath_finder::m_sep): Convert to a pointer.
            (epath_finder::get_best_epath): Add param "diag_idx" and use it
            when logging.  Rather than finding the shortest path and then
            checking feasibility, instead use explore_feasible_paths unless
            !flag_analyzer_feasibility, in which case simply use the shortest
            path, and note if it is infeasible.  Update for m_sep becoming a
            pointer.
            (class feasible_worklist): New.
            (epath_finder::explore_feasible_paths): New.
            (epath_finder::process_worklist_item): New.
            (class dump_eg_with_shortest_path): New.
            (epath_finder::dump_trimmed_graph): New.
            (epath_finder::dump_feasible_graph): New.
            (saved_diagnostic::saved_diagnostic): Add "idx" param, using it
            on new field m_idx.
            (saved_diagnostic::to_json): Dump m_idx.
            (saved_diagnostic::calc_best_epath): Pass m_idx to get_best_epath.
            Remove assertion that m_problem was set when m_best_epath is NULL.
            (diagnostic_manager::add_diagnostic): Pass an index when created
            saved_diagnostic instances.
            * diagnostic-manager.h (saved_diagnostic::saved_diagnostic): Add
            "idx" param.
            (saved_diagnostic::get_index): New accessor.
            (saved_diagnostic::m_idx): New field.
            * engine.cc (exploded_node::dump_dot): Call args.dump_extra_info.
            Move code to...
            (exploded_node::dump_processed_stmts): ...this new function and...
            (exploded_node::dump_saved_diagnostics): ...this new function.
            Add index of each diagnostic.
            (exploded_edge::dump_dot):  Move bulk of code to...
            (exploded_edge::dump_dot_label): ...this new function.
            * exploded-graph.h (eg_traits::dump_args_t::dump_extra_info): New
            vfunc.
            (exploded_node::dump_processed_stmts): New decl.
            (exploded_node::dump_saved_diagnostics): New decl.
            (exploded_edge::dump_dot_label): New decl.
            * feasible-graph.cc: New file.
            * feasible-graph.h: New file.
            * trimmed-graph.cc: New file.
            * trimmed-graph.h: New file.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/96374
            * gcc.dg/analyzer/dot-output.c: Add -fdump-analyzer-feasibility
            to options.
            * gcc.dg/analyzer/feasibility-1.c (test_6): Remove xfail.
            (test_7): New.
            * gcc.dg/analyzer/pr93355-localealias-feasibility-2.c: Remove xfail.
            * gcc.dg/analyzer/pr93355-localealias-feasibility-3.c: Remove xfails.
            * gcc.dg/analyzer/pr93355-localealias-feasibility.c: Remove
            -fno-analyzer-feasibility from options.
            * gcc.dg/analyzer/pr93355-localealias.c: Likewise.
            * gcc.dg/analyzer/unknown-fns-4.c: Remove xfail.

Comment 11 David Malcolm 2021-03-11 23:18:47 UTC

The above patch fixes the feasibility issue in (b) above, and the analyzer now successfully emits a diagnostic for the leak.

The only remaining issue is (a) (see comment #9 above).

Comment 12 David Malcolm 2021-03-12 20:10:50 UTC

(In reply to David Malcolm from comment #11)
> The only remaining issue is (a) (see comment #9 above).

Updating Summary accordingly, and adding to call summaries tracker