The Levenshtein distance is used for measuring the “distance” or similarity of two character strings. Other similarity algorithms can be supplied to the code that does the matching.
To exercise the code, call the lookupserver method “matches(message, max_candidates=15, min_similarity=50)”, or run the class file “Levenshtein.py” directly with
python Levenshtein.py "The first string." "The second string"
(remember to quote the two parameters)
The following things should be noted:
Only the first MAX_LEN characters are considered. Long strings that differ only at the end will therefore seem to match better than they should. A penalty is applied if strings are shortened.
The calculation can stop early as soon as it realises that the supplied minimum required similarity cannot be reached. Strings of widely different lengths make this shortcut possible. This follows from the definition of the Levenshtein distance: the distance is at least as large as the difference in string length. Similarities lower than your supplied minimum (or the default) should therefore not be considered authoritative.
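The length-difference shortcut can be illustrated with a minimal sketch. This is not the code in Levenshtein.py: the function names and the similarity formula (100 * (1 - distance / length of the longer string)) are assumptions made for the example, and the MAX_LEN truncation and its penalty are left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str, min_similarity: float = 50.0) -> float:
    """Similarity as a percentage (hypothetical formula, see above)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    # The distance can never be smaller than the difference in length,
    # so this is the best similarity the pair could possibly reach.
    lower_bound = abs(len(a) - len(b))
    best_possible = 100.0 * (1 - lower_bound / longest)
    if best_possible < min_similarity:
        # Shortcut: skip the full calculation. The value returned is
        # only an upper bound, not the true similarity.
        return best_possible
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

In this sketch, comparing "xy" with "abcdefghij" at a minimum of 50 returns the bound of about 20 without running the full calculation, although the true distance is 10 and the true similarity is therefore 0. This is why a result below the minimum only says that the pair did not qualify.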