Package nltk_lite :: Module utilities

Module utilities

Classes
  SortedDict
A very rudimentary sorted dictionary, whose main purpose is to allow dictionaries to be displayed in a consistent order in regression tests.
  OrderedDict
This implementation of a dictionary keeps track of the order in which keys were inserted.
  MinimalSet
Find contexts where more than one possible target value can appear.
  Counter
A counter that auto-increments each time its value is read.
  Trie
A Trie is like a dictionary in that it maps keys to values.
Functions
  pr(data, start=0, end=None)
Pretty print a sequence of data items.
  print_string(s, width=70)
Pretty print a string, breaking lines on whitespace.
  _edit_dist_init(len1, len2)
  _edit_dist_step(lev, i, j, c1, c2)
  edit_dist(s1, s2)
Calculate the Levenshtein edit-distance between two strings.
  re_show(regexp, string)
Search string for substrings matching regexp and wrap the matches with braces. Returns a string.
  filestring(f)
  breadth_first(tree, children=<built-in function iter>, depth=-1)
Traverse the nodes of a tree in breadth-first order.
  guess_encoding(data)
Given a byte string, attempt to decode it.
Function Details

pr(data, start=0, end=None)

Pretty print a sequence of data items

Parameters:
  • data (sequence or iterator) - the data stream to print
  • start (int) - the start position
  • end (int) - the end position
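
The slicing behaviour described by start and end can be sketched as follows. This is an illustration in modern Python, not the module's own source; the name pr_sketch is hypothetical, and it returns the formatted text instead of printing it so the result is easy to inspect.

```python
from itertools import islice
from pprint import pformat


def pr_sketch(data, start=0, end=None):
    """Format the items of `data` from position `start` up to `end`.

    islice works on iterators as well as sequences, so the data
    stream does not need to support indexing.
    """
    return pformat(list(islice(data, start, end)))


# Works on an iterator, not only on a list.
print(pr_sketch(iter(range(100)), start=2, end=5))
```

Using islice rather than data[start:end] is a design choice made here so that iterators are accepted, as the parameter description allows.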

print_string(s, width=70)

Pretty print a string, breaking lines on whitespace

Parameters:
  • s (string) - the string to print, consisting of words and spaces
  • width (int) - the display width
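
Breaking lines on whitespace at a given display width can be approximated with the standard textwrap module. The sketch below is an assumption about the intended behaviour, not the module's implementation; wrap_string_sketch is a hypothetical name.

```python
import textwrap


def wrap_string_sketch(s, width=70):
    """Return `s` with line breaks inserted at whitespace so that
    no line is longer than `width` characters (where possible)."""
    return textwrap.fill(s, width=width)


print(wrap_string_sketch("the quick brown fox", width=10))
```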

edit_dist(s1, s2)

Calculate the Levenshtein edit-distance between two strings. The edit distance is the minimum number of single-character substitutions, insertions, or deletions needed to transform s1 into s2. For example, transforming "rain" to "shine" requires three steps, consisting of two substitutions and one insertion: "rain" -> "sain" -> "shin" -> "shine". These operations could have been done in other orders, but at least three steps are needed.

Parameters:
  • s1, s2 - The strings to be analysed
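
The definition above corresponds to the standard dynamic-programming formulation of Levenshtein distance. The following is a minimal sketch of that textbook algorithm, not necessarily how _edit_dist_init and _edit_dist_step are written; edit_dist_sketch is a hypothetical name.

```python
def edit_dist_sketch(s1, s2):
    """Levenshtein distance between s1 and s2, keeping one row of the table."""
    # prev[j] is the distance between the empty prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        # Turning s1[:i] into the empty string takes i deletions.
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute c1 with c2 (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_dist_sketch("rain", "shine"))  # the example from the description
```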

re_show(regexp, string)

Search string for substrings matching regexp and wrap the matches with braces. This is convenient for learning about regular expressions.

Parameters:
  • regexp - The regular expression.
  • string - The string being matched.
Returns: string
A string with braces surrounding the matched substrings.
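
Wrapping each match in braces can be sketched with re.sub. This is an illustration of the described behaviour under the assumption that non-overlapping matches are marked left to right; re_show_sketch is a hypothetical name, not the module's function.

```python
import re


def re_show_sketch(regexp, string):
    """Return `string` with every match of `regexp` wrapped in braces."""
    return re.sub(regexp, lambda m: "{" + m.group(0) + "}", string)


# Mark every run of vowels.
print(re_show_sketch(r"[aeiou]+", "regular expressions"))
```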

breadth_first(tree, children=<built-in function iter>, depth=-1)

Traverse the nodes of a tree in breadth-first order. The first argument should be the tree root; children should be a function taking as argument a tree node and returning an iterator of the node's children.
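
A breadth-first traversal driven by a children function can be sketched with a queue. The sketch leaves out the depth parameter, since its meaning is not documented here. The node layout (label, list of child nodes) and the name breadth_first_sketch are assumptions made for the example.

```python
from collections import deque


def breadth_first_sketch(root, children):
    """Yield nodes level by level, starting from `root`.

    `children` takes a node and returns an iterator over its children.
    """
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(children(node))


# Hypothetical tree: each node is (label, list_of_child_nodes).
tree = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
labels = [label for label, _ in breadth_first_sketch(tree, lambda n: iter(n[1]))]
print(labels)
```

An explicit children function is passed in the example because the default, iter, only suits trees whose nodes are themselves iterable over their children.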

guess_encoding(data)

Given a byte string, attempt to decode it. Tries the standard 'UTF8' and 'latin-1' encodings, plus several gathered from locale information.

The calling program *must* first call
    locale.setlocale(locale.LC_ALL, '')

If successful it returns 
    (decoded_unicode, successful_encoding)
If unsuccessful it raises a ``UnicodeError``
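
The try-each-encoding strategy can be sketched as below. This is written for modern Python and is not the module's source: the candidate list, its order, and the name guess_encoding_sketch are assumptions. It asks the locale module for a single preferred encoding without changing the locale, so it does not depend on the setlocale call required above.

```python
import locale


def guess_encoding_sketch(data):
    """Try a list of encodings and return (decoded_text, encoding_used)."""
    candidates = ["utf-8"]
    # Assumed source of locale information; False means "do not call setlocale".
    preferred = locale.getpreferredencoding(False)
    if preferred:
        candidates.append(preferred)
    # latin-1 maps every byte to a character, so it goes last as a catch-all.
    candidates.append("latin-1")
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeError, LookupError):
            continue
    # Kept to mirror the documented contract; with latin-1 in the list
    # this line is not expected to be reached for byte strings.
    raise UnicodeError("could not decode data with any candidate encoding")


print(guess_encoding_sketch("café".encode("utf-8")))
```

Because latin-1 accepts any byte sequence, the order of the candidates matters: stricter encodings such as UTF-8 have to be tried first.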