This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

From: Jeff Law <law at redhat dot com>
To: Martin Sebor <msebor at gmail dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Fri, 9 Aug 2019 11:07:25 -0600
Subject: Re: [PATCH] integrate sprintf pass into strlen (PR 83431)
References: <fda47188-e9f7-83b4-32a0-49f85daf28b2@gmail.com> <0cd4881c-80db-6417-8c53-8a30b7e7e9e2@gmail.com> <08f1770d-a1ac-3b30-1715-8da6167c5ae0@redhat.com> <935b52f4-aa90-96a2-d051-6f9c2076dfad@gmail.com>
On 7/10/19 5:54 PM, Martin Sebor wrote:
>> So if I'm reading things correctly, it appears gimple-ssa-sprintf.c is
>> no longer a distinct pass.  Instead it co-exists with the strlen pass.
>> Right?
> 
> Yes.  strlen just calls into sprintf to handle the calls.
OK.  Just wanted to make sure I understood it's structure at the highest
level.


> 
>>> diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
>>> index a0934bcaf87..b05e4050f1d 100644
>>> --- a/gcc/gimple-ssa-sprintf.c
>>> +++ b/gcc/gimple-ssa-sprintf.c
>>> @@ -683,7 +618,7 @@ fmtresult::type_max_digits (tree type, int base)
>>>     static bool
>>>   get_int_range (tree, HOST_WIDE_INT *, HOST_WIDE_INT *, bool,
>>> HOST_WIDE_INT,
>>> -           class vr_values *vr_values);
>>> +           const class vr_values *vr_values);
>> FWIW, I think this is something *I* could do a lot better at.
>> Specifically I think we're not supposed to be writing the "class" here
>> and I'm not as good as I should be with marking things const.  Thanks
>> for cleaning up my lack of consts.
> 
> I think you did the best you could given the APIs you had to work
> with There's still plenty of room to improve const-correctness but
> it involves changing other APIs outsid strlen/sprintf.
I wasn't necessarily referring to any of the strlen/sprintf code when I
made that comment.  I was thinking more about DOM and jump threading
where I think I've got extraneous "class" all over the place.  And I
can't recall ever auditing for const-correctness.  Both are probably
worth fixing.


> 
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 9a5b0cd554a..637e228f988 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -42,7 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>>>     NEXT_PASS (pass_build_cfg);
>>>     NEXT_PASS (pass_warn_function_return);
>>>     NEXT_PASS (pass_expand_omp);
>>> -  NEXT_PASS (pass_sprintf_length, false);
>>> +  NEXT_PASS (pass_strlen, false);
>> So this is something we discussed a bit on the phone.  This is very
>> early in the pipeline -- before we've gone into SSA form.
>>
>> I'm a bit concerned that we're running strlen that early without some
>> kind of auditing of whether or not the strlen pass can safely run that
>> early.  Similarly have we done any review for issues that might arise
>> from running strlen more than once?  I guess in some small way
>> encapsulating the state into a class like my unsubmitted patch does
>> would help there.
> 
> The strlen optimization machinery only runs once.  The code avoids
> running it when the pass is invoked early and only calls into sprintf
> to do format checking.
OK.  Thanks for clarifying.  That's probably why we have the someone
unusual gating tests.


> 
>>
>> More generally I think we concluded that the placement of sprintf this
>> early was driven by the early placement of walloca.  I don't want to
>> open a huge can of worms here, but do we really need to run this early
>> in the pipeline?
> 
> We decided to run it early when optimization is disabled because
> there's a good amount of code that can be checked even without
> ranges and string lengths (especially at the conservative level
> 2 setting when we consider the largest integers and array sizes
> instead of values or string lengths).
> 
> For example, this is diagnosed for the potential buffer overflow
> at -Wformat-overflow=2 even without optimization:
> 
>   char a[8], s[4];
> 
>   void f (int i)
>   {
>     __builtin_sprintf (a, "%s = %i", s, i);
>   }
> 
>   warning: ‘%i’ directive writing between 1 and 11 bytes into a region
> of size between 2 and 5 [-Wformat-overflow=]
That does sound familiar.  But ISTM in a non-optimized case we could
still just run the late one and get the warning.  It would seem the
problem with that is the late pass is inside the pass_all_optimizations
in passes.def.

We'd probably have to close the pass_all_optimizations, do the sprintf
checking, then open a new pass_all_optimizations_something for the rest
of the pipeline currently under pass_all_optimizations.

Seems out of scope for now, but worth remembering.


> 
>> Nit: Use NULL rather than null.  I think this happens in more than one
>> place in your patch.  Similarly I think we generally use NUL rather than
>> nul when referring to a single character.
> The term is a "null pointer."  NULL is a C macro that has in C++ 11
> been superseded by nullptr.  I don't mind using NUL character but
> I also don't think it matters.  No one will be confused about what
> either means.
It's more about existing conventions and consistency in the codebase.
We've used NULL to refer to "null pointer" for decades.

> It's easy enough to add here.  But I know I've introduced other
> algorithms that recurse on SSA_NAME_DEF_STMT, and I'm quite sure
> others predate those.  To make a difference I think we need to
> review at least the one most likely to be exposed to this problem
> and introduce the same limit there.  I could probably fix the ones
> I wrote reasonably quickly, but making the change to the others
> would be much more of a project.  I looked to see how pervasive
> this might be and here is just a small sampling of things that
> jumped out at me in a quick grep search:
> 
>  *  compute_builtin_object_size (for _FORTIFY_SOURCE)
>  *  compute_objsize (for -Wstringop-overflow)
>  *  get_range_strlen
>  *  maybe_fold_{and,or}_comparisons in gimple-fold.c
>  *  -Warray-bounds (computing an offset into an object)
>  *  -Wrestrict (computing an offset into an object)
>  *  -Wreturn-local-addr (is_addr_local)
>  *  -Wuninitialized (collect_phi_def_edges)
> 
> Given how wide-spread this technique seems to be, if the recursion
> is in fact a problem it's just as likely (if not more) to come up
> in the folder or in BOS or some other place as it is here.  So if
> it needs fixing it seems it should be done as its own project and
> everywhere (or as close as we can get), and not as part of this
> integration.
So I think we can come back to the passes above and add limiters to them
as a distinct patchkit.  And I think we will ultimately want those limiters.

>>> @@ -3703,84 +4031,231 @@ fold_strstr_to_strncmp (tree rhs1, tree
>>> rhs2, gimple *stmt)
>>>       }
>>>   }
>>>   +/* Check the built-in call at GSI for validity and optimize it.
>>> +   Return true to let the caller advance *GSI to the statement
>>> +   in the CFG and false otherwise.  */
>>> +
>>> +static bool
>>> +check_and_optimize_call (gimple_stmt_iterator *gsi, const vr_values
>>> *rvals)
>> It was suggested that perhaps we should prefix this call name, but I
>> think the better solution might be to classify the pass and make this a
>> member function.  That would seem to naturally fall to me since I've got
>> a classification patch for this code from last year that I could easily
>> update after your patch.
> 
> Well, sure.  The whole pass can be a class (or a set of classes).
> It began as C code and then C++ started to slowly and organically
> creep in.  There are many other nice improvements we could make
> by putting C++ to better use.  One very simple one I'd like is
> moving local variable declarations to the point of their
> initialization.  Making the APIs const-correct would also improve
> readability.  But I've resisted making these changes because I
> know people are sensitive to too much churn.  If you think it's
> a good idea for me to make these changes let me know.  I'd be
> happy to do it, just separately from this integration.
I probably should have been clearer.  Let's go ahead and put in a prefix
for now, but the better long term plan is to classify the pass, address
const-correctness, initialization points, etc  Each of which should
probably happen independently.  I happen to have a patch which does the
classification, so that would fall to me naturally.


> 
>>> @@ -4119,7 +4504,10 @@ const pass_data pass_data_strlen =
>>>     "strlen", /* name */
>>>     OPTGROUP_NONE, /* optinfo_flags */
>>>     TV_TREE_STRLEN, /* tv_id */
>>> -  ( PROP_cfg | PROP_ssa ), /* properties_required */
>>> +  /* Normally the pass would require PROP_ssa but because it also
>>> +     runs early, with no optimization, to do sprintf format checking,
>>> +     it only requires PROP_cfg.  */
>>> +  PROP_cfg, /* properties_required */
>>>     0, /* properties_provided */
>>>     0, /* properties_destroyed */
>>>     0, /* todo_flags_start */
>>> @@ -4128,20 +4516,50 @@ const pass_data pass_data_strlen =
>> So the question I'd come back to is what are we capturing with the
>> instance that runs before we're in SSA form and can we reasonably catch
>> that stuff after going into SSA form?
>>
>> It may be that we went through this at the initial submission of the
>> sprintf patches.  I simply can't remember.
> 
> Please see my answer + example above.
So I'm not really comfortable dropping PROP_ssa.  Basically I can't
convince myself that doing so won't ultimately drop PROP_ssa from the
current function's properties.  Which in turn would turn off certain
sanity checkers in passes.def.

What I think this argues is that you want to register two distinct
passes.  They would differ in their "name" and "properties required"
fields.  You may also be able to clean up the gating with that change as
well.  They obviously share a ton of underlying code.






>>
>>
>>
>>> +
>>>   unsigned int
>>>   pass_strlen::execute (function *fun)
>>>   {
>>> +  strlen_optimize = do_optimize;
>>> +
>>>     gcc_assert (!strlen_to_stridx);
>>>     if (warn_stringop_overflow || warn_stringop_truncation)
>>>       strlen_to_stridx = new hash_map<tree, stridx_strlenloc> ();
>>> @@ -4151,10 +4569,17 @@ pass_strlen::execute (function *fun)
>>>       calculate_dominance_info (CDI_DOMINATORS);
>>>   +  bool use_scev = optimize > 0 && flag_printf_return_value;
>>> +  if (use_scev)
>>> +    {
>>> +      loop_optimizer_init (LOOPS_NORMAL);
>>> +      scev_initialize ();
>>> +    }
>>> +
>>>     /* String length optimization is implemented as a walk of the
>>> dominator
>>>        tree and a forward walk of statements within each block.  */
>>>     strlen_dom_walker walker (CDI_DOMINATORS);
>>> -  walker.walk (fun->cfg->x_entry_block_ptr);
>>> +  walker.walk (ENTRY_BLOCK_PTR_FOR_FN (fun));
>>>       ssa_ver_to_stridx.release ();
>>>     strinfo_pool.release ();
>>> @@ -4175,6 +4600,15 @@ pass_strlen::execute (function *fun)
>>>         strlen_to_stridx = NULL;
>>>       }
>>>   +  if (use_scev)
>>> +    {
>>> +      scev_finalize ();
>>> +      loop_optimizer_finalize ();
>>> +    }
>>> +
>>> +  /* Clean up object size info.  */
>>> +  fini_object_sizes ();
>>> +
>>>     return walker.m_cleanup_cfg ? TODO_cleanup_cfg : 0;
>>>   }
>> Is scev really that useful here?  If it is, fine, if not I'd rather not
>> pay the price to set it up.
> 
> It was introduced by Aldy to fix PR 85598.
Ah, it's just moved from the sprintf pass into strlen.  I see what
happened now.

Still have to review the updated patch, but wanted to close on some of
the issues we were already discussing.

jeff
Follow-Ups:
- Re: [PATCH] integrate sprintf pass into strlen (PR 83431)
  - From: Martin Sebor
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]