### !!!WARNING!!!

**Once again, WordPress managed to silently change my code and make it unusable. And this time, it does it too well: I can’t fix it. This (code killing) is the only WP feature which works good, but I must admit that it is definitely perfect.**

**For this reason, please ignore the code examples below, and download this archive, which contains my Stored Functions, a Test Case for my Functions, and Arjen Lentz’s Stored Function.**

**—**

I found a Levenshtein Distance function, implemented as SQL Stored Function for MySQL and MariaDB, written by Arjen Lentz. That post also contains a short but interesting discussion about the advantages and disadvantages of Stored Functions over UDFs.

For those who don’t know, the **Levenshtein Distance** (or **Edit Distance**) reads two strings and returns the minimum number of *atomic edits* that must be done to the first string to make it equal to the second string (or vice versa, as you like). An atomic edit can be an *addition*, a *deletion* or a *substitution* of a single character.

Example:

`String 1: abcde_1`

String 2: bcdef_2

Dinstance: 3.

The required atomic edits are: delete a, add f, change 1 to 2.

I tried to improve the function. Well, my changes are trivial (because Arjen Lentz is a genius, as you can see by taking a look to `OQGRAPH`

code, and I’m not), but I wish to share them, as Lentz asked in his post. I also wrote a `levenshtein_ratio`

Function, and you can find it below.

### My Levenshtein

Here is my “fork”:

```
DROP FUNCTION IF EXISTS `levenshtein`;
CREATE FUNCTION `levenshtein`(`s1` VARCHAR(255) CHARACTER SET utf8, `s2` VARCHAR(255) CHARACTER SET utf8)
RETURNS TINYINT UNSIGNED
NO SQL
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp TINYINT UNSIGNED;
-- max strlen=255 for this function
DECLARE cv0, cv1 VARBINARY(256);
-- if any param is NULL return NULL
-- (consistent with builtin functions)
IF (s1 + s2) IS NULL THEN
RETURN NULL;
END IF;
SET s1_len = CHAR_LENGTH(s1),
s2_len = CHAR_LENGTH(s2),
cv1 = 0x00,
j = 1,
i = 1,
c = 0;
-- if any string is empty,
-- distance is the length of the other one
IF (s1 = s2) THEN
RETURN 0;
ELSEIF (s1_len = 0) THEN
RETURN s2_len;
ELSEIF (s2_len = 0) THEN
RETURN s1_len;
END IF;
WHILE (j <= s2_len) DO
SET cv1 = CONCAT(cv1, CHAR(j)),
j = j + 1;
END WHILE;
WHILE (i <= s1_len) DO
SET c = i,
cv0 = CHAR(i),
j = 1;
WHILE (j <= c_temp) THEN
SET c = c_temp;
END IF;
SET c_temp = ORD(SUBSTRING(cv1, j+1, 1)) + 1;
IF (c > c_temp) THEN
SET c = c_temp;
END IF;
SET cv0 = CONCAT(cv0, CHAR(c)),
j = j + 1;
END WHILE;
SET cv1 = cv0,
i = i + 1;
END WHILE;
RETURN c;
END;
```

What are the differences? Well, excluding the trivial ones:

- If one param is
`NULL`

, my Function returns`NULL`

. This is the behavior of My and Maria’s builtin functions, and is logically correct: if you don’t know one of the values, you don’t know the Levenshtein Distance. - Since params have a limit of 255 characters, I turned all numeric variable types and Function return type to
`TINYINT UNSIGNED`

. From my tests, this seems to me correct;**please, let me know if I’m wrong**. - I dropped 2 vars:
`s1_chars`

and`cost`

. They were there just to add readability, but I think that the code is still readable as before.

### Levenshtein Ratio

Usually you don’t care about the Levenshtein Distance as an absolute value. If the first string is ‘ace’ and the second string is ‘see’, it’s very unlikely that one of them is misspelled. But if the first string is ‘no woman no cry’ and the second string is ‘no womn no cr’, the second string is probably misspelled. But in both cases, the Levenshtein Distance is 2. What can help us is the ratio between the length of the longest string and the Levenshtein Distance. In the first case the ratio is 67, while in the second case it’s 13. The first value means that 2/3 of the characters are different, the second value tells us that the difference is little.

That’s why I wrote the `levenshtein_ratio()`

Stored Function:

```
DROP FUNCTION IF EXISTS `levenshtein_ratio`;
CREATE FUNCTION `levenshtein_ratio`(`s1` VARCHAR(255) CHARACTER SET utf8, `s2` VARCHAR(255) CHARACTER SET utf8)
RETURNS TINYINT UNSIGNED
DETERMINISTIC
NO SQL
COMMENT 'Levenshtein ratio between strings'
BEGIN
DECLARE s1_len TINYINT UNSIGNED DEFAULT CHAR_LENGTH(s1);
DECLARE s2_len TINYINT UNSIGNED DEFAULT CHAR_LENGTH(s2);
RETURN ((levenshtein(s1, s2) / IF(s1_len > s2_len, s1_len, s2_len)) * 100);
END;
```

Like Arjen Lentz, if you optimize one of my Functions, I hope you’ll lets me know.

*Enjoy!*