This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [google][gcc-4_9] encode and compress cc1 option strings in gcov_module_info


It's 1:3 to 1:4 in the programs I tested. But it really depends on how
the options are used. I think your idea of using combined strlen works
better.
I just make the code a little clumsy but it does not cause any
performance issue.

On Tue, Oct 6, 2015 at 10:21 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, Oct 6, 2015 at 9:26 AM, Rong Xu <xur@google.com> wrote:
>> On Mon, Oct 5, 2015 at 5:33 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>        unsigned ggc_memory = gcov_read_unsigned ();
>>> +      unsigned marker = 0, len = 0, k;
>>> +      char **string_array, *saved_cc1_strings;
>>> +
>>>        for (unsigned j = 0; j < 7; j++)
>>>
>>>
>>> Do not use hard coded number. Use the enum defined in coverage.c.
>>
>> OK.
>>
>>>
>>>
>>> +        string_array[j] = xstrdup (gcov_read_string ());
>>> +
>>> +      k = 0;
>>> +      for (unsigned j = 1; j < 7; j++)
>>>
>>> Do not use hard coded number.
>>
>> OK.
>>
>>>
>>>
>>> +        {
>>> +          if (num_array[j] == 0)
>>> +            continue;
>>> +          marker += num_array[j];
>>>
>>> It is better to read if the name of variable 'marker' is changed to
>>> 'j_end' or something similar
>>>
>>> For all the substrings of 'j' kind, there should be just one marker,
>>> right? It looks like here you introduce one marker per string, not one
>>> marker per string kind.
>>
>> I don't understand what you meant here. "marker" is fixed for each j
>> substring (one option kind) -- it the end index of the sub-string
>> array. k-loop is for each string.
>>
>
> That was a wrong comment from me. Discard it.
>
>>>
>>> +          len += 3; /* [[<FLAG>  */
>>>
>>> Same here for hard coded value.
>>>
>>> +          for (; k < marker; k++)
>>> +            len += strlen (string_array[k]) + 1; /* 1 for delimter of ']'  */
>>>
>>> Why do we need one ']' per string?
>>
>> This is because the options strings can contain space '  '. I cannot
>> use space as the delimiter, neither is \0 as it is the end of the
>> string of the encoded string.
>
> Ok -- this allows you to avoid string copy during parsing.
>>
>>>
>>>
>>> +        }
>>> +      saved_cc1_strings = (char *) xmalloc (len + 1);
>>> +      saved_cc1_strings[0] = 0;
>>> +
>>> +      marker = 0;
>>> +      k = 0;
>>> +      for (unsigned j = 1; j < 7; j++)
>>>
>>> Same here for 7.
>>
>> will fix in the new patch.
>>
>>>
>>> +        {
>>> +          static const char lipo_string_flags[6] = {'Q', 'B', 'S',
>>> 'D','I', 'C'};
>>> +          if (num_array[j] == 0)
>>> +            continue;
>>> +          marker += num_array[j];
>>>
>>> Suggest changing marker to j_end
>> OK.
>>
>>>
>>> +          sprintf (saved_cc1_strings, "%s[[%c", saved_cc1_strings,
>>> +                   lipo_string_flags[j - 1]);
>>> +          for (; k < marker; k++)
>>> +            {
>>> +              sprintf (saved_cc1_strings, "%s%s]", saved_cc1_strings,
>>> +                       string_array[k]);
>>>
>>> +#define DELIMTER            "[["
>>>
>>> Why double '[' ?
>> I will change to single '['.
>>
>>>
>>> +#define DELIMTER2           "]"
>>> +#define QUOTE_PATH_FLAG     'Q'
>>> +#define BRACKET_PATH_FLAG   'B'
>>> +#define SYSTEM_PATH_FLAG    'S'
>>> +#define D_U_OPTION_FLAG     'D'
>>> +#define INCLUDE_OPTION_FLAG 'I'
>>> +#define COMMAND_ARG_FLAG    'C'
>>> +
>>> +enum lipo_cc1_string_kind {
>>> +  k_quote_paths = 0,
>>> +  k_bracket_paths,
>>> +  k_system_paths,
>>> +  k_cpp_defines,
>>> +  k_cpp_includes,
>>> +  k_lipo_cl_args,
>>> +  num_lipo_cc1_string_kind
>>> +};
>>> +
>>> +struct lipo_parsed_cc1_string {
>>> +  const char* source_filename;
>>> +  unsigned num[num_lipo_cc1_string_kind];
>>> +  char **strings[num_lipo_cc1_string_kind];
>>> +};
>>> +
>>> +struct lipo_parsed_cc1_string *
>>> +lipo_parse_saved_cc1_string (const char *src, char *str,
>>> +    bool parse_cl_args_only);
>>> +void free_parsed_string (struct lipo_parsed_cc1_string *string);
>>> +
>>>
>>> Declare above in a header file.
>>
>> OK.
>>
>>>
>>>
>>>  /* Returns true if the command-line arguments stored in the given module-infos
>>>     are incompatible.  */
>>>  bool
>>> -incompatible_cl_args (struct gcov_module_info* mod_info1,
>>> -      struct gcov_module_info* mod_info2)
>>> +incompatible_cl_args (struct lipo_parsed_cc1_string* mod_info1,
>>> +                      struct lipo_parsed_cc1_string* mod_info2)
>>>
>>> Fix formating.
>> OK.
>>>
>>>  {
>>>      {
>>> @@ -1647,7 +1679,7 @@ build_var (tree fn_decl, tree type, int counter)
>>>  /* Creates the gcov_fn_info RECORD_TYPE.  */
>>>
>>>                        NULL_TREE, get_gcov_unsigned_t ());
>>>    DECL_CHAIN (field) = fields;
>>>    fields = field;
>>>
>>> -  /* Num bracket paths  */
>>> +  /* cc1_uncompressed_strlen field */
>>>    field = build_decl (BUILTINS_LOCATION, FIELD_DECL,
>>>                        NULL_TREE, get_gcov_unsigned_t ());
>>>    DECL_CHAIN (field) = fields;
>>>    fields = field;
>>>
>>>
>>> Why do we need to store uncompressed string length? If there is need
>>> to do that, I suggest combine uncompressed length and compressed
>>> length into one 32bit integer. (16/16, or 17/15 split)
>>
>> In theory, I don't need the uncompressed length, But I would need to
>> guess the uncompressed length to allocate the buffer. If the
>> decompressing fails with insufficient space, I need to have another
>> guess and do it again.  I think it's easier just save the uncompressed
>> size to the struct. I think combine the two sizes into one
>> gcov_unsigned_t is a good idea. We don't expect the string size are
>> very big.
>
> What is the usual compression ratio? If you guess it right most of the
> time, just get rid of the uncompressed length.
>
> David
>
>>>
>>>
>>>  David
>>>
>>> On Mon, Oct 5, 2015 at 3:51 PM, Rong Xu <xur@google.com> wrote:
>>>> Hi,
>>>>
>>>> This patch is for google branch only.
>>>>
>>>> It encodes and compresses various cc1 option strings in
>>>> gcov_module_info to reduce the lipo instrumented object size. The
>>>> savings are from string compression and the reduced number of
>>>> relocations.
>>>>
>>>> More specifically, we replace the following fields in gcov_module_info
>>>>   gcov_unsigned_t num_quote_paths;
>>>>   gcov_unsigned_t num_bracket_paths;
>>>>   gcov_unsigned_t num_system_paths;
>>>>   gcov_unsigned_t num_cpp_defines;
>>>>   gcov_unsigned_t num_cpp_includes;
>>>>   gcov_unsigned_t num_cl_args;
>>>>   char *string_array[1];
>>>> with
>>>>   gcov_unsigned_t cc1_strlen;
>>>>   gcov_unsigned_t cc1_uncompressed_strlen;
>>>>   char *saved_cc1_strings;
>>>>
>>>> The new saved_cc1_strings are zlib compressed string.
>>>>
>>>> Tested with google internal benchmarks.
>>>>
>>>> Thanks,
>>>>
>>>> -Rong


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]