This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 1/2] Implement Levenshtein distance


Hello,

Le 30/10/2015 13:47, David Malcolm a Ãcrit :
This patch adds an implementation of Levenshtein distance to gcc,
along with unit testing of the algorithm.

I noticed two nits while looking at it.


diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
new file mode 100644
index 0000000..532df58
--- /dev/null
+++ b/gcc/spellcheck.c
@@ -0,0 +1,136 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "spellcheck.h"
+
+/* The Levenshtein distance is an "edit-distance": the minimal
+   number of one-character insertions, removals or substitutions
+   that are needed to change one string into another.
+
+   This implementation uses the Wagner-Fischer algorithm.  */
+
You forgot to explain that m and n are the lengths of s and t respectively. You may want to just use a more descriptive name for them.

+static edit_distance_t
+levenshtein_distance (const char *s, int m,
+		      const char *t, int n)
+{
+  const bool debug = false;
+
+  if (debug)
+    {
+      printf ("s: \"%s\" (m=%i)\n", s, m);
+      printf ("t: \"%s\" (n=%i)\n", t, n);
+    }
+
+  if (m == 0)
+    return n;
+  if (n == 0)
+    return m;
+
+  /* We effectively build a matrix where each (i, j) contains the
+     Levenshtein distance between the prefix strings s[0:i]
+     and t[0:j].
The code seems to use s[0:j] and t[0:i] instead, doesn't it?

Thanks
Mikael


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]