[RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

Richard Biener richard.guenther@gmail.com
Wed Jun 5 07:26:02 GMT 2024


On Tue, Jun 4, 2024 at 10:31 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On Jun 4, 2024, at 03:43, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Jun 3, 2024 at 4:48 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >>
> >> On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote:
> >>> On Fri, May 31, 2024 at 11:23 PM Qing Zhao <qing.zhao@oracle.com>
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On May 23, 2024, at 07:46, Richard Biener
> >>>>> <richard.guenther@gmail.com> wrote:
> >>>>>
> >>>>> On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.zhao@oracle.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On May 22, 2024, at 03:38, Richard Biener
> >>>>>>> <richard.guenther@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm
> >>>>>>> <dmalcolm@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
> >>>>>>>>> Thanks for the comments and suggestions.
> >>>>>>>>>
> >>>>>>>>>> On May 15, 2024, at 10:00, David Malcolm
> >>>>>>>>>> <dmalcolm@redhat.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener
> >>>>>>>>>> wrote:
> >>>>>>>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> -Warray-bounds is an important option to enable
> >>>>>>>>>>>> linux kernal to
> >>>>>>>>>>>> keep
> >>>>>>>>>>>> the array out-of-bound errors out of the source
> >>>>>>>>>>>> tree.
> >>>>>>>>>>>>
> >>>>>>>>>>>> However, due to the false positive warnings
> >>>>>>>>>>>> reported in
> >>>>>>>>>>>> PR109071
> >>>>>>>>>>>> (-Warray-bounds false positive warnings due to code
> >>>>>>>>>>>> duplication
> >>>>>>>>>>>> from
> >>>>>>>>>>>> jump threading), -Warray-bounds=1 cannot be added
> >>>>>>>>>>>> on by
> >>>>>>>>>>>> default.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Although it's impossible to elinimate all the false
> >>>>>>>>>>>> positive
> >>>>>>>>>>>> warnings
> >>>>>>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -
> >>>>>>>>>>>> Warray-bounds
> >>>>>>>>>>>> documentation says "always out of bounds"), we
> >>>>>>>>>>>> should minimize
> >>>>>>>>>>>> the
> >>>>>>>>>>>> false positive warnings in -Warray-bounds=1.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The root reason for the false positive warnings
> >>>>>>>>>>>> reported in
> >>>>>>>>>>>> PR109071 is:
> >>>>>>>>>>>>
> >>>>>>>>>>>> When the thread jump optimization tries to reduce
> >>>>>>>>>>>> the # of
> >>>>>>>>>>>> branches
> >>>>>>>>>>>> inside the routine, sometimes it needs to duplicate
> >>>>>>>>>>>> the code
> >>>>>>>>>>>> and
> >>>>>>>>>>>> split into two conditional pathes. for example:
> >>>>>>>>>>>>
> >>>>>>>>>>>> The original code:
> >>>>>>>>>>>>
> >>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
> >>>>>>>>>>>> index)
> >>>>>>>>>>>> {
> >>>>>>>>>>>> if (index >= 4)
> >>>>>>>>>>>>  warn ();
> >>>>>>>>>>>> *ptr = 0;
> >>>>>>>>>>>> *val = sg->vals[index];
> >>>>>>>>>>>> if (index >= 4)
> >>>>>>>>>>>>  warn ();
> >>>>>>>>>>>> *ptr = *val;
> >>>>>>>>>>>>
> >>>>>>>>>>>> return;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> With the thread jump, the above becomes:
> >>>>>>>>>>>>
> >>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
> >>>>>>>>>>>> index)
> >>>>>>>>>>>> {
> >>>>>>>>>>>> if (index >= 4)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    warn ();
> >>>>>>>>>>>>    *ptr = 0;         // Code duplications since
> >>>>>>>>>>>> "warn" does
> >>>>>>>>>>>> return;
> >>>>>>>>>>>>    *val = sg->vals[index];   // same this line.
> >>>>>>>>>>>>                              // In this path,
> >>>>>>>>>>>> since it's
> >>>>>>>>>>>> under
> >>>>>>>>>>>> the condition
> >>>>>>>>>>>>                              // "index >= 4", the
> >>>>>>>>>>>> compiler
> >>>>>>>>>>>> knows
> >>>>>>>>>>>> the value
> >>>>>>>>>>>>                              // of "index" is
> >>>>>>>>>>>> larger then 4,
> >>>>>>>>>>>> therefore the
> >>>>>>>>>>>>                              // out-of-bound
> >>>>>>>>>>>> warning.
> >>>>>>>>>>>>    warn ();
> >>>>>>>>>>>>  }
> >>>>>>>>>>>> else
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    *ptr = 0;
> >>>>>>>>>>>>    *val = sg->vals[index];
> >>>>>>>>>>>>  }
> >>>>>>>>>>>> *ptr = *val;
> >>>>>>>>>>>> return;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> We can see, after the thread jump optimization, the
> >>>>>>>>>>>> # of
> >>>>>>>>>>>> branches
> >>>>>>>>>>>> inside
> >>>>>>>>>>>> the routine "sparx5_set" is reduced from 2 to 1,
> >>>>>>>>>>>> however,  due
> >>>>>>>>>>>> to
> >>>>>>>>>>>> the
> >>>>>>>>>>>> code duplication (which is needed for the
> >>>>>>>>>>>> correctness of the
> >>>>>>>>>>>> code),
> >>>>>>>>>>>> we
> >>>>>>>>>>>> got a false positive out-of-bound warning.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In order to eliminate such false positive out-of-
> >>>>>>>>>>>> bound warning,
> >>>>>>>>>>>>
> >>>>>>>>>>>> A. Add one more flag for GIMPLE: is_splitted.
> >>>>>>>>>>>> B. During the thread jump optimization, when the
> >>>>>>>>>>>> basic blocks
> >>>>>>>>>>>> are
> >>>>>>>>>>>> duplicated, mark all the STMTs inside the original
> >>>>>>>>>>>> and
> >>>>>>>>>>>> duplicated
> >>>>>>>>>>>> basic blocks as "is_splitted";
> >>>>>>>>>>>> C. Inside the array bound checker, add the
> >>>>>>>>>>>> following new
> >>>>>>>>>>>> heuristic:
> >>>>>>>>>>>>
> >>>>>>>>>>>> If
> >>>>>>>>>>>> 1. the stmt is duplicated and splitted into two
> >>>>>>>>>>>> conditional
> >>>>>>>>>>>> paths;
> >>>>>>>>>>>> +  2. the warning level < 2;
> >>>>>>>>>>>> +  3. the current block is not dominating the exit
> >>>>>>>>>>>> block
> >>>>>>>>>>>> Then not report the warning.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The false positive warnings are moved from -Warray-
> >>>>>>>>>>>> bounds=1 to
> >>>>>>>>>>>> -Warray-bounds=2 now.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Bootstrapped and regression tested on both x86 and
> >>>>>>>>>>>> aarch64.
> >>>>>>>>>>>> adjusted
> >>>>>>>>>>>> -Warray-bounds-61.c due to the false positive
> >>>>>>>>>>>> warnings.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Let me know if you have any comments and
> >>>>>>>>>>>> suggestions.
> >>>>>>>>>>>
> >>>>>>>>>>> At the last Cauldron I talked with David Malcolm
> >>>>>>>>>>> about these kind
> >>>>>>>>>>> of
> >>>>>>>>>>> issues and thought of instead of suppressing
> >>>>>>>>>>> diagnostics to
> >>>>>>>>>>> record
> >>>>>>>>>>> how a block was duplicated.  For jump threading my
> >>>>>>>>>>> idea was to
> >>>>>>>>>>> record
> >>>>>>>>>>> the condition that was proved true when entering the
> >>>>>>>>>>> path and do
> >>>>>>>>>>> this
> >>>>>>>>>>> by recording the corresponding locations
> >>>>>>>>>
> >>>>>>>>> Is only recording the location for the TRUE path  enough?
> >>>>>>>>> We might need to record the corresponding locations for
> >>>>>>>>> both TRUE and
> >>>>>>>>> FALSE paths since the VRP might be more accurate on both
> >>>>>>>>> paths.
> >>>>>>>>> Is only recording the location is enough?
> >>>>>>>>> Do we need to record the pointer to the original
> >>>>>>>>> condition stmt?
> >>>>>>>>
> >>>>>>>> Just to be clear: I don't plan to work on this myself (I
> >>>>>>>> have far too
> >>>>>>>> much already to work on...); I'm assuming Richard Biener is
> >>>>>>>> hoping/planning to implement this himself.
> >>>>>>>
> >>>>>>> While I think some of this might be an improvement to those
> >>>>>>> vast
> >>>>>>> number of "false positive" diagnostics we have from (too)
> >>>>>>> late diagnostic
> >>>>>>> passes I do not have the cycles to work on this.
> >>>>>>
> >>>>>> I can study a little bit more on how to improve the diagnostics
> >>>>>> for PR 109071 first.
> >>>>>>
> >>>>>> FYI, I just updated PR109071’s description as: Need more
> >>>>>> context for -Warray-bounds warnings due to code duplication
> >>>>>> from jump threading.
> >>>>>>
> >>>>>> As we already agreed, this is NOT a false positive. It caught a
> >>>>>> real bug in linux kernel that need to be patched in the kernel
> >>>>>> source. (See
> >>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
> >>>>>>
> >>>>>> In order to add more context to the diagnostics to help the end
> >>>>>> user locate the bug accurately in their source code, compiler
> >>>>>> needs:
> >>>>>>
> >>>>>> 1. During jump threading phase, record the following
> >>>>>> information for each duplicated STMT:
> >>>>>>       A. A pointer to the COND stmt;
> >>>>>>       B. True/false for such COND
> >>>>>> 2. During array out-of-bound checking phase, whenever locate an
> >>>>>> out-of-bound case,
> >>>>>>       A. Use a rich_location for the diagnostic;
> >>>>>>       B. Create an instance of diagnositic_path, add events to
> >>>>>> this diagnostic_path based on the COND stmt, True/False info
> >>>>>> recorded in the STMT;
> >>>>>>       C. Call richloc.set_path() to associate the path with
> >>>>>> the rich_location;
> >>>>>>       D. Then issue warning with this rich_location instead of
> >>>>>> the current regular location.
> >>>>>>
> >>>>>> Any comment and suggestion to the above?
> >>>>>
> >>>>> I was originally thinking of using location_adhoc_data to store
> >>>>> 1.A
> >>>>> and 1.B as a common bit to associate to each
> >>>>> copied stmt.  IIRC location_adhoc_data->data is the stmts
> >>>>> associated
> >>>>> lexical block so we'd need to stuff another
> >>>>> 'data' field into this struct, like a "copy history" where we
> >>>>> could
> >>>>> then even record copied-of-copy-of-X.
> >>>>> locataion_adhoc_data seems imperfectly packed right now, with a
> >>>>> 32bit
> >>>>> hole before 'data' and 32bit unused
> >>>>> at its end, so we might get away without memory use by re-
> >>>>> ordering
> >>>>> discriminator before 'data' ...
> >>>>
> >>>> Like this?
> >>>>
> >>>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> >>>> index e6e2b0897572..ee344f91333b 100644
> >>>> --- a/libcpp/include/line-map.h
> >>>> +++ b/libcpp/include/line-map.h
> >>>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
> >>>> struct GTY(()) location_adhoc_data {
> >>>>   location_t locus;
> >>>>   source_range src_range;
> >>>> -  void * GTY((skip)) data;
> >>>>   unsigned discriminator;
> >>>> +  void * GTY((skip)) data;
> >>>> +  void * GTY((skip)) copy_history;
> >>>> };
> >>>>   struct htab;
> >>>
> >>> Yes.
> >>>
> >>>> How about the copy_history? Do we need a new data structure (like
> >>>> the following, or other suggestion) for this? Where should I add
> >>>> this new data structure?
> >>>
> >>> As it needs to be managed by libcpp it should be in this very same
> >>> file.
> >>>
> >>>> struct copy_history {
> >>>>  location_t condition;
> >>>>  Bool is_true_path;
> >>>> }
> >>>
> >>> I think we want a pointer to the previous copy-of state as well in
> >>> case a stmt
> >>> was copied twice.  We'll see whether a single (condition) location
> >>> plus edge flag
> >>> is sufficient.  I'd say we should plan for an enum to indicate the
> >>> duplication
> >>> reason at least (jump threading, unswitching, unrolling come to my
> >>> mind).  For
> >>> jump threading being able to say "when <condition> is true/false" is
> >>> probably
> >>> good enough, though it might not always be easy to identify a single
> >>> condition
> >>> here given a threading path starts at an incoming edge to a CFG merge
> >>> and
> >>> will usually end with the outgoing edge of a condition that we can
> >>> statically
> >>> evaluate.  The condition controlling the path entry might not be
> >>> determined
> >>> fully by a single condition location.
> >>>
> >>> Possibly building a full "diagnostic path" object at threading time
> >>> might be
> >>> the only way to record all the facts, but that's also going to be
> >>> more
> >>> expensive.
> >>
> >> Note that a diagnostic_path represents a path through some kind of
> >> graph, whereas it sounds like you want to be storing the *nodes* in the
> >> graph, and later generating the diagnostic_path from that graph when we
> >> need it (which is trivial if the graph is actually just a tree: just
> >> follow the parent links backwards, then reverse it).
> >
> > I think we are mixing two things - one is that a single transform like jump
> > threading produces a stmt copy and when we emit a diagnostic on that
> > copied statement we want to tell the user the condition under which the
> > copy is executed.  That "condition" can be actually a sequence of
> > conditionals.  I wanted to point out that a diagnostic_path instance could
> > be used to describe such complex condition.
> >
> > But then the other thing I wanted to address with the link to a previous
> > copy_history - that's when a statement gets copied twice, for example
> > by two distinct jump threading optimizations.  Like when dumping
> > the inlining decisions for diagnostics we could dump the logical "and"
> > of the conditions of the two threadings.  Since we have a single
> > location per GIMPLE stmt we'd have to keep a "stack" of copy events
> > associated with it.  That's the linked list (I think a linked list should
> > work fine here).
> Yes, the linked list to keep the “stack” of copy events should be good enough
>  to form the sequence of conditionals event for the diagnostic_path instance.
> >
> > I realize things may get a bit "fat", but eventually we are not duplicating
> > statements that much.  I do hope we can share for example a big
> > diagnostic_path when we duplicate a basic block during threading
> > and use one instance for all stmts in such block copy (IIRC we never
> > release locations or their ad-hoc data, we just make sure to never
> > use locations with ad-hoc data pointing to BLOCKs that we released,
> > but the linemap data will still have pointers in "dead" location entries,
> > right?)
> Are you still suggesting to add two artificial stmts in the beginning and the
> end of the duplicated block to carry the copy history information for all the
> stmts in the block to save space?
>
> Compared with the approach to carry such information to each duplicated stmts (which I preferred),
> The major concerns with the approach are:
>    1. Optimizations might move stmts out of these two artificial stmts, therefore we need add
>         Some memory barrier on these two artificial stmts to prevent such movements.
>        This might prevent good optimization from happening and result in runtime performance loss;
>    2. Every time when query whether the stmt is copied and get  its copy history, we have to search backward or
>         Forward through the stmt chain to get to the artificial stmts that carry the copy history, compilation time will
>        be slower.
>
> I still think that carrying the copy history info to each duplicated stmts might be better. We can limit the length of
> the history to control the space, what do you think?

Copying to each stmt is definitely easier so I think it's better to
start with that and only resort
to other things when this fails.

Note we could, instead of putting a pointer to data into the ad-hoc
info, put in an index
and have the actual data be maintained elsewhere.  That data could
even only keep
the "last" 1000 contexts and simply discard "older" ones.  Before IPA
info can still be
transfered between functions, but after IPA where most of the
threadings happen the
extra info could be discarded once we get a function to the last pass
emitting diagnostics
we want to amend.  Doing an index lookup makes it possible to not
worry about "stale"
entries.

Richard.

>
> Qing
>
> >
> > Richard.
> >
> >>
> >> Dave
> >>
> >>> I can think of having a -fdiagnostic-try-to-explain-harder option to
> >>> clarify confusing
> >>> diagnostics people could use on-demand?
> >>>
> >>> Richard.
> >>>
> >>>> Qing
> >>>>>
> >>>>> Richard.
> >>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Qing
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Richard.
> >>>>>>>
> >>>>>>>> But feel free to ask me any questions about the
> >>>>>>>> diagnostic_path
> >>>>>>>> machinery within the diagnostics subsystem.
> >>>>>>>>
> >>>>>>>> [...snip...]
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Dave
>
>


More information about the Gcc-patches mailing list