[PATCH] improve validation of attribute arguments (PR c/78666)

Martin Sebor msebor@gmail.com
Mon Aug 24 16:45:26 GMT 2020

On 8/24/20 4:59 AM, Aldy Hernandez wrote:
> On 8/21/20 1:37 AM, Martin Sebor wrote:
>> On 8/20/20 3:00 PM, Aldy Hernandez wrote:
>>> Regardless, here are some random comments.
>> Thanks for the careful review!
>>>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>>>> index 37214831538..bc4f409e346 100644
>>>> --- a/gcc/c-family/c-attribs.c
>>>> +++ b/gcc/c-family/c-attribs.c
>>>> @@ -720,6 +725,124 @@ positional_argument (const_tree fntype, 
>>>> const_tree atname, tree pos,
>>>>    return pos;
>>>>  }
>>>> +/* Given a pair of NODEs for arbitrary DECLs or TYPEs, validate one or
>>>> +   two integral or string attribute arguments NEWARGS to be applied to
>>>> +   NODE[0] for the absence of conflicts with the same attribute 
>>>> arguments
>>>> +   already applied to NODE[1]. Issue a warning for conflicts and 
>>>> return
>>>> +   false.  Otherwise, when no conflicts are found, return true.  */
>>>> +
>>>> +static bool
>>>> +validate_attr_args (tree node[2], tree name, tree newargs[2])
>>> I think you're doing too much work in one function.  Also, I *really* 
>>> dislike sending pairs of objects in arrays, especially when they're 
>>> called something so abstract as "node" and "newargs".
>>> Would it be possible to make this function only validate one single 
>>> argument and call it twice?  Or do we gain something by having it do 
>>> two things at once?
>> I agree about the name "node."  The argument comes from the attribute
>> handlers: they all take something called a node as their first argument.
>> It's an array of three elements:
>>    [0] the current declaration or type
>>    [1] the previous declaration or type or null
>>    [2] the current declaration if [0] is a type
> Ah, the rest of the functions are taking a node pointer.  Your patch 
> threw me off because you use a node[2] instead of a node pointer like 
> the rest of the functions.  Perhaps you should keep to the current style 
> and pass a node *.

It takes tree node[3] and the -Warray-parameter option (being
reviewed) uses the bound to check for out-of-bounds accesses, both
callers and the callee itself.  (C, but not C++, has special syntax
for this: tree node[static 3].)

>> validate_attr_args() is called with the same node as the handlers
>> and uses both node[0] and node[1] to recursively validate the first
>> one against itself and then against the second.  It could be changed
>> to take two arguments instead of an array (the new "node" and
>> the original "node," perhaps even under some better name).  That
>> would make it different from every handler but maybe that wouldn't
>> be a problem.
>> The newargs argument is also an array, with the second element
>> being optional.  Both elements are used and validated against
>> the attribute arguments on the same declaration first and again
>> on the previous one.  The array could be split up into two
>> distinct arguments, say newarg1 and newarg2, or passed in as
>> std::pair<tree, tree>.  I'm not sure I see much of a difference
>> between the approaches.
> It looks like node[] carries all the information for the current 
> attribute and arguments, as well the same information for the previous 
> attribute.  Could your validate function just take:
> validate_attr_args (tree *node, tree name)
> That way you can save passing a pair of arguments, plus you can save 
> accumulating said arguments in the handle_* functions.
> Or is there something I'm missing here that makes this unfeasible?

If the function didn't the newargs array it would have to extract
the argument(s) from node, duplicating the work already done in
the callers.  I.e., figuring out how many arguments the attribute
expects (one or two, depending on the specific attribute), and for
handle_alloc_size_attribute, calling positional_argument (or at
a minimum default_conversion) to convert it to the expected value.
So it's feasible but doesn't seem like a good design.

>>   /* Extract the same attribute from the previous declaration or 
>> type.  */
>>   tree prevattr = NULL_TREE;
>>   if (DECL_P (node[1]))
>>     {
>>       prevattr = DECL_ATTRIBUTES (node[1]);
>>       if (!prevattr)
>>     {
>>       tree type = TREE_TYPE (node[1]);
>>       prevattr = TYPE_ATTRIBUTES (type);
>>     }
>>     }
>>   else if (TYPE_P (node[1]))
>>     prevattr = TYPE_ATTRIBUTES (node[1]);
> If all this section does is extract the attribute from a decl, it would 
> look cleaner if you abstracted out this functionality into its own 
> function.  I'm a big fan of one function to do one thing.
>>   const char* const namestr = IDENTIFIER_POINTER (name);
>>   prevattr = lookup_attribute (namestr, prevattr);
>>   if (!prevattr)
>>     return true;
> Perhaps a better name would be attribute_name_str?

Thanks for the suggestion but I think NAMESTR is good enough: it
should be clear enough from the function argument NAME that it
refers to the string representation of the NAME.  There also is
already a pre-existing use of NAMESTR elsewhere and so a precedent
for this choice.

>>   /* Extract one or both attribute arguments.  */
>>   tree prevargs[2];
>>   prevargs[0] = TREE_VALUE (TREE_VALUE (prevattr));
>>   prevargs[1] = TREE_CHAIN (TREE_VALUE (prevattr));
>>   if (prevargs[1])
>>     prevargs[1] = TREE_VALUE (prevargs[1]);
> Does this only work with 2 arguments?  What if the attribute takes 3 or 
> 4 arguments?  We should make this generic enough to handle any number of 
> arguments.

It only works with two arguments because one- and two-argument
attribute handlers are the only callers it's suitable for today.
If/when there is a use case for validating more than two arguments
in a general way the function will need to be extended.

At the moment, there are only a handful of attributes that take
three or more arguments: access, format, nonnull and optimize.
Most do their own extensive validation and the simple checking
done by this function isn't really suitable to them, in part
because they can be meaningfully applied to add attributes to
other arguments between one declaration and another.  I'm not
sure that generalizing validate_attr_args to try to also validate
these complex attributes would be an improvement over the status
quo.  Even if it could be done, it's more ambitious of a project
than what this simple change is trying to do.

>>   /* Both arguments must be equal or, for the second pair, neither must
>>      be provided to succeed.  */
>>   bool arg1eq, arg2eq;
>>   if (TREE_CODE (newargs[0]) == INTEGER_CST)
>>     {
>>       arg1eq = tree_int_cst_equal (newargs[0], prevargs[0]);
>>       if (newargs[1] && prevargs[1])
>>     arg2eq = tree_int_cst_equal (newargs[1], prevargs[1]);
>>       else
>>     arg2eq = newargs[1] == prevargs[1];
>>     }
>>   else if (TREE_CODE (newargs[0]) == STRING_CST)
>>     {
>>       const char *s0 = TREE_STRING_POINTER (newargs[0]);
>>       const char *s1 = TREE_STRING_POINTER (prevargs[0]);
>>       arg1eq = strcmp (s0, s1) == 0;
>>       if (newargs[1] && prevargs[1])
>>     {
>>       s0 = TREE_STRING_POINTER (newargs[1]);
>>       s1 = TREE_STRING_POINTER (prevargs[1]);
>>       arg2eq = strcmp (s0, s1) == 0;
>>     }
>>       else
>>     arg2eq = newargs[1] == prevargs[1];
>>     }
> It looks like you're re-inventing operand_equal_p.

operand_equal_p does a lot more than just compare integer and string
constants for equality, so it has lots more overhead.  That said, I'm
very much in favor of code reuse even if comes with a (small) cost so
I tried this:

   /* Both arguments must be equal or, for the second pair, neither must
      be provided to succeed.  */
   bool arg1eq = newargs[0] == prevargs[0];
   if (!arg1eq)
     arg1eq = operand_equal_p (newargs[0], prevargs[0]);
   bool arg2eq = newargs[1] == prevargs[1];
   if (!arg2eq && newargs[1] && prevargs[1])
     arg2eq = operand_equal_p (newargs[1], prevargs[1]);
   if (arg1eq && arg2eq)
     return true;

Some testing revealed that the code has different semantics for
strings: it compares them including all nuls embedded in them,
while I think the right semantics for attributes is to only consider
strings up and including the first nul (i.e., apply the usual string
semantics).  A test case for this corner case is as follows:

   __attribute__ ((section ("foo\0bar"))) void f (void);
   __attribute__ ((section ("foo"))) void f (void) { }

Without operand_equal_p() GCC accepts this and puts f in section
foo.  With the operand_equal_p() change above, it complains:

ignoring attribute ‘section ("foo\x00bar")’ because it conflicts with 
previous ‘section ("foor")’ [-Wattributes]

I would rather not change the behavior in this corner case just to
save a few lines of code.  If it's thought that string arguments
to attributes (some or all) should be interpreted differently than
in other contexts it seems that such a change should be made
separately and documented in the manual.

>>   /* If the two locations are different print a note pointing to
>>      the previous one.  */
>>   const location_t curloc = input_location;
>>   const location_t prevloc =
>>     DECL_P (node[1]) ? DECL_SOURCE_LOCATION (node[1]) : curloc;
>>   /* Format the attribute specification for convenience.  */
>>   char newspec[80], prevspec[80];
>>   if (newargs[1])
>>     snprintf (newspec, sizeof newspec, "%s (%s, %s)", namestr,
>>           print_generic_expr_to_str (newargs[0]),
>>           print_generic_expr_to_str (newargs[1]));
>>   else
>>     snprintf (newspec, sizeof newspec, "%s (%s)", namestr,
>>           print_generic_expr_to_str (newargs[0]));
>>   if (prevargs[1])
>>     snprintf (prevspec, sizeof prevspec, "%s (%s, %s)", namestr,
>>           print_generic_expr_to_str (prevargs[0]),
>>           print_generic_expr_to_str (prevargs[1]));
>>   else
>>     snprintf (prevspec, sizeof prevspec, "%s (%s)", namestr,
>>           print_generic_expr_to_str (prevargs[0]));
>>   if (warning_at (curloc, OPT_Wattributes,
>>           "ignoring attribute %qs because it conflicts "
>>           "with previous %qs",
>>           newspec, prevspec)
>>       && curloc != prevloc)
>>     inform (prevloc, "previous declaration here");
> I think it would look cleaner if you separated the analysis from the 
> diagnosing.  So perhaps one function that checks if the attribute is 
> valid (returns true or false), and one function to display all these 
> warnings.

It's common practice in GCC to do both in the same function:
validate operands and issue warnings for warnings for problems.
In fact, the vast majority of attribute handlers do three things
in the same function: parse, validate, and apply attributes.  It
can be helpful to split up big functions the way you suggest but
I think this one is fine the way it is.

> Speak of which, I don't know what the consensus was, but if we 
> absolutely know that a re-declaration of an attribute for a decl is 
> invalid, I think it should be a hard error.
> Is there any case where having conflicting section attributes for a decl 
> shouldn't merit an error?

I can't think of one.  The manual seems clear that an error should
be expected "for incorrect use of supported attributes."  Despite
that, there is no consistency in how these things are handled.
Some attribute handlers issue -Wattributes, others give hard errors,
some one or the other under different conditions.

My opinion is that there should be at least two distinct kinds of
diagnostics for attributes, perhaps three: one for the use of
unknown attributes, another for invalid values of arguments to
known attributes, and may also one for invalid syntax (forms,
types, or numbers of arguments) of known attributes.  At some
point I'd like to implement and propose this approach but I don't
think this change is the right time for it.

Attached is another revision of this patch with the new helper
function you suggested.  Is this version okay to commit?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-78666.diff
Type: text/x-patch
Size: 17096 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200824/4545f063/attachment-0001.bin>

More information about the Gcc-patches mailing list