[PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

Bernhard Reutner-Fischer rep.dot.nop@gmail.com
Tue Dec 1 17:51:00 GMT 2015


On 1 December 2015 at 18:28, David Malcolm <dmalcolm@redhat.com> wrote:
> On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:


>> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
>> +
>> +const char*
>> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
>> +{
>> +  auto_vec <const char *> candidates;
>> +  lookup_function_fuzzy_find_candidates (fun, &candidates);
>> +
>> +  /* Determine closest match.  */
>> +  int i;
>> +  const char *name, *best = NULL;
>> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
>> +
>> +  FOR_EACH_VEC_ELT (candidates, i, name)
>> +    {
>> +      edit_distance_t dist = levenshtein_distance (fn, name);
>> +      if (dist < best_distance)
>> +     {
>> +       best_distance = dist;
>> +       best = name;
>> +     }
>> +    }
>> +  /* If more than half of the letters were misspelled, the suggestion is
>> +     likely to be meaningless.  */
>> +  if (best)
>> +    {
>> +      unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
>> +      if (best_distance > cutoff)
>> +     return NULL;
>> +    }
>> +  return best;
>> +}
>
>
> Caveat: I'm not very familiar with the Fortran FE, so take the following
> with a pinch of salt.
>
> If I'm reading things right, here, and in various other places, you're
> building a vec of const char *, and then seeing which one of those
> candidates is the best match for another const char *.
>
> You could simplify things by adding a helper function to spellcheck.h,
> akin to this one:
>
> extern tree
> find_closest_identifier (tree target, const auto_vec<tree> *candidates);

I was hoping for ipa-icf to fix that up on my behalf. I'll try to see
if it does. Short of that: yes, should do that.

>
> This would reduce the amount of duplication in the patch (and slightly
> reduce the amount of C++).

As said, we could as well use a list of candidates with NULL as record marker.
Implementation cosmetics. Steve seems to not be thrilled by the
overall idea in the first place, so unless there is clear support by
somebody else i won't pursue this any further, it's not that i'm bored
or ran out of stuff i should do.. ;)
>
> [are there IDENTIFIER nodes in the Fortran FE, or is it all const char
> *? this would avoid some strlen calls]

Right, but in the Fortran FE these are const char*.

thanks for your comments!



More information about the Gcc-patches mailing list