[Bug other/69968] New: RFC: Use Damerau-Levenshtein within spellcheck.c, rather than Levenshtein

dmalcolm at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Feb 25 22:21:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69968

            Bug ID: 69968
           Summary: RFC: Use Damerau-Levenshtein within spellcheck.c,
                    rather than Levenshtein
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

(quoting Steven Bosscher)

----------------------------------
$ cat t.c
void foo (void);

struct {
  int coordx, coordy, coordz;
  int coordx1, coordy1, coordz1;
} c;

void foo (void)
{
  c.coordx1 = c.coordy1* c.coordz;
  c.coorzd1 = c.coordy;
}

$ ./cc1 -quiet -Wall -Wextra t.c
t.c: In function 'foo':
t.c:11:4: error: 'struct <anonymous>' has no member named 'coorzd1'; did you mean 'coordx'?
   c.coorzd1 = c.coordy;
    ^
----------------------------------

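Note that z and d are swapped. The Levenshtein metric returns "coordx"
as the best match: going from "coorzd1" to "coordx" takes 2 edits (delete
the "z", then substitute "x" for "1"), and the same holds for "coordy"
and "coordz". Plain Levenshtein also counts the swapped pair as 2 edits,
so "coordz1" scores no better -- "coordx" is just the first of several
candidates with the same Levenshtein distance.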
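
To make the tie concrete, here is a minimal sketch of a unit-cost
Levenshtein distance. It is not the code in spellcheck.c; the name
lev_distance and the 63-character limit are assumptions made only to
keep the example short and free of allocation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GCC's implementation): Levenshtein distance
   with unit cost for insertion, deletion and substitution.
   Assumes strlen (t) <= 63; returns (size_t) -1 otherwise.  */
static size_t
lev_distance (const char *s, const char *t)
{
  size_t n = strlen (s), m = strlen (t);
  size_t prev[64], cur[64];

  if (m > 63)
    return (size_t) -1;

  /* Row 0: turning "" into the first j characters of t takes
     j insertions.  */
  for (size_t j = 0; j <= m; j++)
    prev[j] = j;

  for (size_t i = 1; i <= n; i++)
    {
      /* Turning the first i characters of s into "" takes
         i deletions.  */
      cur[0] = i;
      for (size_t j = 1; j <= m; j++)
        {
          size_t del = prev[j] + 1;
          size_t ins = cur[j - 1] + 1;
          size_t sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
          size_t best = del < ins ? del : ins;
          cur[j] = best < sub ? best : sub;
        }
      memcpy (prev, cur, (m + 1) * sizeof prev[0]);
    }
  return prev[m];
}
```

With this sketch, lev_distance ("coorzd1", "coordx") and
lev_distance ("coorzd1", "coordz1") both come out as 2, so the metric
cannot prefer the transposed field name.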

With Damerau-Levenshtein, which counts a swap of 2 adjacent characters
as a single edit, we'd be able to recognize this (apparently most
common) mistake. The Damerau-Levenshtein distance to "coordz1" is 1,
and there is no field in the struct with a smaller distance.
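
A possible shape for the change, sketched under assumptions: the code
below implements the "optimal string alignment" variant (restricted
Damerau-Levenshtein, where no substring is edited more than once),
which may or may not be the variant a real patch would pick. The names
osa_distance, min_size and OSA_MAX_LEN are hypothetical and do not
exist in spellcheck.c.

```c
#include <stddef.h>
#include <string.h>

#define OSA_MAX_LEN 63  /* Assumed limit, to avoid allocation.  */

static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Hypothetical helper: optimal string alignment distance, i.e.
   Levenshtein plus a unit-cost swap of two adjacent characters.
   Returns (size_t) -1 if t is longer than OSA_MAX_LEN.  */
static size_t
osa_distance (const char *s, const char *t)
{
  size_t n = strlen (s), m = strlen (t);
  /* Storage for rows i-2, i-1 and i of the table.  */
  size_t rows[3][OSA_MAX_LEN + 1];

  if (m > OSA_MAX_LEN)
    return (size_t) -1;

  size_t *prev2 = rows[0], *prev = rows[1], *cur = rows[2];

  for (size_t j = 0; j <= m; j++)
    prev[j] = j;

  for (size_t i = 1; i <= n; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= m; j++)
        {
          size_t cost = s[i - 1] != t[j - 1];
          size_t best = min_size (prev[j] + 1, cur[j - 1] + 1);
          best = min_size (best, prev[j - 1] + cost);
          /* Transposition: the last two characters of each prefix
             are the same pair in opposite order.  prev2 is only read
             once i > 1, by which point it holds row i-2.  */
          if (i > 1 && j > 1
              && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1])
            best = min_size (best, prev2[j - 2] + 1);
          cur[j] = best;
        }
      /* Rotate the row pointers for the next iteration.  */
      size_t *tmp = prev2;
      prev2 = prev;
      prev = cur;
      cur = tmp;
    }
  return prev[m];
}
```

For the test case above, osa_distance ("coorzd1", "coordz1") is 1,
while the other five fields stay at distance 2, so "coordz1" would
become the unique best suggestion.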

