daily report on extending static analyzer project [GSoC]

Ankur Saini arsenic.secondary@gmail.com
Thu Aug 5 14:57:12 GMT 2021



> On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> 
> [...snip...]
>> 
>> - From observation, a typical vfunc call that isn't devirtualised by
>> the compiler's front end looks something like this 
>> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
>> where "a_ptr_5(D)" is pointer that is being used to call the virtual
>> function.
>> 
>> - We can access it's region to see what is the type of the object the
>> pointer is actually pointing to.
>> 
>> - This is then used to find a call with DECL_CONTEXT of the object
>> from the all the possible targets of that polymorphic call.
> 
> [...]
> 
>> 
>> Patch file ( prototype ) : 
>> 
> 
>> +  /* Call is possibly a polymorphic call.
>> +  
>> +     In such case, use devirtisation tools to find 
>> +     possible callees of this function call.  */
>> +  
>> +  function *fun = get_current_function ();
>> +  gcall *stmt  = const_cast<gcall *> (call);
>> +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>> +  if (e->indirect_info->polymorphic)
>> +  {
>> +    void *cache_token;
>> +    bool final;
>> +    vec <cgraph_node *> targets
>> +      = possible_polymorphic_call_targets (e, &final, &cache_token, true);
>> +    if (!targets.is_empty ())
>> +      {
>> +        tree most_propbable_taget = NULL_TREE;
>> +        if(targets.length () == 1)
>> +    	    return targets[0]->decl;
>> +    
>> +        /* From the current state, check which subclass the pointer that 
>> +           is being used to this polymorphic call points to, and use to
>> +           filter out correct function call.  */
>> +        tree t_val = gimple_call_arg (call, 0);
> 
> Maybe rename to "this_expr"?
> 
> 
>> +        const svalue *sval = get_rvalue (t_val, ctxt);
> 
> and "this_sval"?

ok

> 
> ...assuming that that's what the value is.
> 
> Probably should reject the case where there are zero arguments.

Ideally the call should always have at least one argument: the pointer used to call the function.

For example, if the function is called like this:

a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a pointer to an object of a subclass.

I saw that its GIMPLE representation is as follows:

OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
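
As a source-level analogy for that implicit first argument (a minimal sketch in plain C++, not GCC code): std::mem_fn takes the object pointer as an explicit first argument, much like a_ptr_5 in the GIMPLE call above, and the call still dispatches virtually. The subclass B and the helper names call_member_syntax and call_explicit_this are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct A
{
  virtual ~A () = default;
  virtual std::string foo (int arg) const
  {
    return "A::foo(" + std::to_string (arg) + ")";
  }
};

// Hypothetical subclass, standing in for "an object of a subclass".
struct B : A
{
  std::string foo (int arg) const override
  {
    return "B::foo(" + std::to_string (arg) + ")";
  }
};

// Ordinary member-call syntax: the object pointer is implicit.
std::string call_member_syntax (const A *a_ptr, int arg)
{
  assert (a_ptr != nullptr);
  return a_ptr->foo (arg);
}

// The same call with the object pointer passed explicitly as argument zero.
std::string call_explicit_this (const A *a_ptr, int arg)
{
  assert (a_ptr != nullptr);
  return std::mem_fn (&A::foo) (a_ptr, arg);
}
```

Both helpers reach the same overrider, because the pointer-to-member call is also resolved through the vtable.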

> 
> 
>> +
>> +        const region *reg
>> +          = [&]()->const region *
>> +              {
>> +                switch (sval->get_kind ())
>> +                  {
>> +                    case SK_INITIAL:
>> +                      {
>> +                        const initial_svalue *initial_sval
>> +                          = sval->dyn_cast_initial_svalue ();
>> +                        return initial_sval->get_region ();
>> +                      }
>> +                      break;
>> +                    case SK_REGION:
>> +                      {
>> +                        const region_svalue *region_sval 
>> +                          = sval->dyn_cast_region_svalue ();
>> +                        return region_sval->get_pointee ();
>> +                      }
>> +                      break;
>> +
>> +                    default:
>> +                      return NULL;
>> +                  }
>> +              } ();
> 
> I think the above should probably be a subroutine.
> 
> That said, it's not clear to me what it's doing, or that this is correct.


Sorry, I think I should have explained it earlier.

Let's take an example code snippet:

Derived d;
Base *base_ptr;
base_ptr = &d;
base_ptr->foo();	// where foo() is a virtual function

This generates the following GIMPLE dump:

Derived::Derived (&d);
base_ptr_6 = &d.D.3779;
_1 = base_ptr_6->_vptr.Base;
_2 = _1 + 8;
_3 = *_2;
OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);

Here, instead of trying to extract the virtual pointer from the call and see which subclass it belongs to, I found it simpler to extract the pointer that is actually used to call the function (which, from observation, is always the first argument of the call). I then used the region model at that point to figure out the type of the object it actually points to, and from that the subclass whose function is being called here. :)
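
To make the idea concrete, here is a run-time analogue in plain C++ (a sketch, not analyzer code). The definitions of Base and Derived are assumed, since the snippet above doesn't show them, and the helpers points_to_derived and call_foo are hypothetical names. typeid answers at run time the same question that the region model is used to answer statically: what is the real type of the object behind the pointer?

```cpp
#include <cassert>
#include <typeinfo>

// Assumed minimal definitions; the original classes are not shown.
struct Base
{
  virtual ~Base () = default;
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

// True if the object BASE_PTR points to is really a Derived.
bool points_to_derived (Base *base_ptr)
{
  assert (base_ptr != nullptr);
  return typeid (*base_ptr) == typeid (Derived);
}

// The call from the snippet, resolved through the vtable at run time.
int call_foo (Base *base_ptr)
{
  assert (base_ptr != nullptr);
  return base_ptr->foo ();
}
```

With "Derived d; Base *base_ptr = &d;", points_to_derived (base_ptr) holds and call_foo (base_ptr) runs Derived::foo, even though the static type of the pointer is Base *.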

Now let me try to explain how I actually implemented it. (A lot of the assumptions here are based on observation, so please correct me wherever you think I misinterpreted something or forgot about a special case.)

- Once it is confirmed that the call we are dealing with is a polymorphic call (via the cgraph edge representing the call), I used "possible_polymorphic_call_targets ()" from ipa-utils.h (defined in ipa-devirt.c) to get the possible callees of that call.

  function *fun = get_current_function ();
  gcall *stmt  = const_cast<gcall *> (call);
  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
  if (e->indirect_info->polymorphic)
  {
    void *cache_token;
    bool final;
    vec <cgraph_node *> targets
      = possible_polymorphic_call_targets (e, &final, &cache_token, true);

- Now, if the list contains more than one target, I make use of the current enode's region model to get more information about the pointer that was used to call the function.

        /* Here I extract the pointer that was used to call the function, which, from observation, is always the zeroth argument of the call.  */
        tree t_val = gimple_call_arg (call, 0);
        const svalue *sval = get_rvalue (t_val, ctxt);

- In all the examples I used, the pointer is represented either as a region_svalue or as an initial_svalue. (I think initial_svalue is the case where the pointer is taken as a parameter of the current function and the analyzer is analysing a top-level call to this function.)

Here are some examples, where I used __analyzer_describe () to show this:
 . (https://godbolt.org/z/Mqs8oM6ff)
 . (https://godbolt.org/z/z4sfTM3f5)

        /* Here I extract the region that the pointer is pointing to. As both cases return a (const region *), I used a lambda to get it. (If you want, I can turn this into a separate function to make it more readable.)  */

        const region *reg
          = [&]()->const region *
              {
                switch (sval->get_kind ())
                  {
                    case SK_INITIAL:
                      {
                        const initial_svalue *initial_sval
                          = sval->dyn_cast_initial_svalue ();
                        return initial_sval->get_region ();
                      }
                      break;
                    case SK_REGION:
                      {
                        const region_svalue *region_sval 
                          = sval->dyn_cast_region_svalue ();
                        return region_sval->get_pointee ();
                      }
                      break;

                    default:
                      return NULL;
                  }
              } ();

        gcc_assert (reg);

        /* Now that I have the region, I get the type of the object it holds and put it in ‘known_possible_subclass_type’.  */

        tree known_possible_subclass_type;
        known_possible_subclass_type = reg->get_type ();
        if (reg->get_kind () == RK_FIELD)
          {
             const field_region* field_reg = reg->dyn_cast_field_region ();
             known_possible_subclass_type 
               = DECL_CONTEXT (field_reg->get_field ());
          }

        /* After that I iterated over the entire array of possible targets to find the function whose scope ( DECL_CONTEXT (fn_decl) ) is the same as the type of the object that the pointer is actually pointing to.  */

        for (cgraph_node *x : targets)
          {
            if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
              most_propbable_taget = x->decl;
          }
        return most_propbable_taget;
      }
   }

I tested it on all of the test programs I created, and so far the analyzer correctly determines the call in every case. I am currently creating more tests (including multiple kinds of inheritance) to see how well this implementation holds up.
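
The selection steps above can also be sketched as a small stand-alone toy model, with the lambda turned into a named helper as suggested in the review. Everything here is hypothetical: toy_kind, toy_region, toy_svalue, toy_target, get_this_region and pick_target are stand-ins for the analyzer and cgraph types, with plain strings in place of trees. The sketch leaves out the RK_FIELD adjustment, and it returns a null pointer for an unhandled svalue kind instead of asserting, which is a deliberate deviation from the prototype.

```cpp
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not GCC types).
enum class toy_kind { initial, region, other };

struct toy_region
{
  std::string type_name;     // plays the role of reg->get_type ()
};

struct toy_svalue
{
  toy_kind kind;
  const toy_region *reg;     // region (initial) or pointee (region)
};

struct toy_target
{
  std::string decl_context;  // plays the role of DECL_CONTEXT (x->decl)
  std::string name;
};

// The lambda from the prototype as a named helper.  Returns nullptr for
// svalue kinds it does not know how to handle.
const toy_region *get_this_region (const toy_svalue &sval)
{
  switch (sval.kind)
    {
    case toy_kind::initial:
    case toy_kind::region:
      return sval.reg;
    default:
      return nullptr;
    }
}

// Pick the target whose context matches the type of the object that
// THIS_SVAL points to; nullptr plays the role of NULL_TREE.
const toy_target *pick_target (const std::vector<toy_target> &targets,
                               const toy_svalue &this_sval)
{
  if (targets.empty ())
    return nullptr;
  if (targets.size () == 1)
    return &targets[0];

  const toy_region *reg = get_this_region (this_sval);
  if (!reg)
    return nullptr;  // bail out rather than assert

  const toy_target *best = nullptr;
  for (const toy_target &t : targets)
    if (t.decl_context == reg->type_name)
      best = &t;     // as in the prototype, the last match wins
  return best;
}
```

In this toy, a class that inherits foo without overriding it has no target whose context equals its own name, so pick_target returns nullptr for it; that may be a case worth covering in the additional tests.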

> 
> I'm guessing that you need to see if
>  *((void **)this)
> is a vtable pointer (or something like that), and, if so, which class
> it is for.
> 
> Is there a way of getting the vtable pointer as an svalue?
> 
>> +        gcc_assert (reg);
>> +
>> +        tree known_possible_subclass_type;
>> +        known_possible_subclass_type = reg->get_type ();
>> +        if (reg->get_kind () == RK_FIELD)
>> +          {
>> +             const field_region* field_reg = reg->dyn_cast_field_region ();
>> +             known_possible_subclass_type 
>> +               = DECL_CONTEXT (field_reg->get_field ());
>> +          }
>> +
>> +        for (cgraph_node *x : targets)
>> +          {
>> +            if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>> +              most_propbable_taget = x->decl;
>> +          }
>> +        return most_propbable_taget;
>> +      }
>> +   }
>> +
>>   return NULL_TREE;
>> }
> 
> Dave
> 
> 

Thanks 
- Ankur

