Package nltk_lite :: Package corpora :: Module toolbox

Module toolbox

Module for reading, writing and manipulating Toolbox databases.

Classes
  StandardFormat
Class for reading and processing standard format marker files and strings.
  ToolboxData
Functions
  parse_corpus(file_name, key=None, **kwargs)
Return an element tree resulting from parsing the toolbox datafile.
Returns: ElementTree._ElementInterface
  to_sfm_string(tree, encoding=None, errors='strict', unicode_fields=None)
Return a string with a standard format representation of the toolbox data in tree (tree can be a toolbox database or a single record).
Returns: string
  _parse_record(s)
Deprecated: use StandardFormat.fields()
Returns: iterator over list(string)
  raw(files='rotokas.dic', include_header=True, head_field_marker=None)
Deprecated: use StandardFormat.fields()
Returns: iterator over list(string)
  dictionary(files='rotokas.dic', include_header=True)
Deprecated: use ToolboxData.parse()
Returns: iterator over dict
  _dict_list_entry(entry)
  dict_list(files='rotokas.dic', include_header=True)
Deprecated: use ToolboxData.parse()
Returns: iterator over dict
  demo()
Variables
  _is_value = re.compile(r'\S')
Function Details

parse_corpus(file_name, key=None, **kwargs)

Return an element tree resulting from parsing the toolbox datafile.

A convenience function that creates a ToolboxData object, then opens and parses the toolbox data file. The data file is assumed to be in the toolbox subdirectory of the directory where NLTK looks for corpora; see corpora.get_basedir().

Parameters:
  • file_name (string) - Name of file in toolbox corpus directory
  • key (string) - marker at the start of each record
  • kwargs (keyword arguments dictionary) - Keyword arguments passed to ToolboxData.parse()
Returns: ElementTree._ElementInterface
contents of toolbox data divided into header and records
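
The "header and records" shape of the result can be illustrated with a minimal sketch that uses only the standard library's xml.etree.ElementTree. This is not the module's implementation: the helper name sketch_parse, the element tags toolbox_data, header and record, and the sample data are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in standard format: each field starts with a backslash marker.
SAMPLE = r"""\_sh v3.0  400  Rotokas Dictionary
\_DateStampHasFourDigitYear

\lx kaa
\ps V
\ge gag

\lx kaakasi
\ps N
\ge heat
"""

# A field line: backslash, marker, optional whitespace, value.
FIELD = re.compile(r"^\\(\S+)[ \t]*(.*)$")

def sketch_parse(text, key):
    """Group fields into a header element and one record element per `key` marker."""
    root = ET.Element("toolbox_data")        # assumed tag name
    current = ET.SubElement(root, "header")  # fields before the first key land here
    for line in text.splitlines():
        match = FIELD.match(line)
        if match is None:
            continue  # this sketch skips blank lines and continuation lines
        marker, value = match.groups()
        if marker == key:
            current = ET.SubElement(root, "record")  # a key marker starts a new record
        ET.SubElement(current, marker).text = value
    return root

tree = sketch_parse(SAMPLE, key="lx")
headwords = [record.findtext("lx") for record in tree.findall("record")]
```

Here key plays the role described for the key parameter: the marker that starts each record. Everything before the first occurrence of that marker is collected as the header.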

to_sfm_string(tree, encoding=None, errors='strict', unicode_fields=None)

Return a string with a standard format representation of the toolbox data in tree (tree can be a toolbox database or a single record).

Parameters:
  • tree (ElementTree._ElementInterface) - flat representation of toolbox data (whole database or single record)
  • encoding (string) - Name of an encoding to use.
  • errors (string) - Error handling scheme for the codec. Takes the same values as the errors argument of the built-in string encode() method.
  • unicode_fields (string) -
Returns: string
string using standard format markup
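
As a rough illustration of what a standard format representation of a flat tree looks like, the sketch below serializes either a single record or a database of records using only the standard library. It is an assumption-laden sketch, not the module's implementation: the helper names sketch_record_to_sfm and sketch_to_sfm, the element tags, and the blank line between records are hypothetical choices, and unicode_fields is not modelled.

```python
import xml.etree.ElementTree as ET

def sketch_record_to_sfm(record):
    """Write each child of a flat record as a '\\marker value' line."""
    lines = []
    for field in record:
        value = field.text or ""
        # rstrip() drops the trailing space left by an empty field value
        lines.append(("\\" + field.tag + " " + value).rstrip())
    return "\n".join(lines) + "\n"

def sketch_to_sfm(tree):
    """Accept a whole database (children are records) or a single record."""
    if any(len(child) for child in tree):
        # Children have children of their own: treat `tree` as a database
        # and separate the serialized records with a blank line.
        return "\n".join(sketch_record_to_sfm(child) for child in tree)
    return sketch_record_to_sfm(tree)

# Hypothetical single record
record = ET.Element("record")
ET.SubElement(record, "lx").text = "kaa"
ET.SubElement(record, "ps").text = "V"
sfm = sketch_to_sfm(record)

# Hypothetical database holding two records
database = ET.Element("toolbox_data")
database.append(record)
second = ET.SubElement(database, "record")
ET.SubElement(second, "lx").text = "kaakasi"
sfm_database = sketch_to_sfm(database)

# An encoding and error scheme are applied as with the string encode() method
encoded = sfm.encode("utf-8", "strict")
```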

_parse_record(s)

Deprecated: use StandardFormat.fields()

Parameters:
  • s (string) - toolbox record as a string
Returns: iterator over list(string)

raw(files='rotokas.dic', include_header=True, head_field_marker=None)

Deprecated: use StandardFormat.fields()

Parameters:
  • files (string or tuple(string)) - One or more toolbox files to be processed
  • include_header (boolean) - flag that determines whether to treat the header as a record (the default in the signature is True)
  • head_field_marker (string) - explicitly sets which marker to use as the head field when parsing the file (by default, it is determined automatically from the first field of the first record)
Returns: iterator over list(string)
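
The interplay of include_header and head_field_marker can be sketched in plain Python. This is an illustration under stated assumptions, not the deprecated function's actual code: the generator name sketch_records is hypothetical, records are yielded as lists of (marker, value) tuples for readability, and the automatic choice of head marker assumes that header markers begin with an underscore.

```python
import re

# Hypothetical sample in standard format
SAMPLE = r"""\_sh v3.0  400  Rotokas Dictionary
\_DateStampHasFourDigitYear

\lx kaa
\ps V

\lx kaakasi
\ps N
"""

FIELD = re.compile(r"^\\(\S+)[ \t]*(.*)$")

def sketch_records(text, include_header=True, head_field_marker=None):
    """Yield one list of (marker, value) tuples per record."""
    fields = [m.groups() for m in map(FIELD.match, text.splitlines()) if m]
    if head_field_marker is None:
        # Assumption: header markers start with "_", so the first marker
        # that does not is taken as the head field of the first record.
        head_field_marker = next(
            (marker for marker, _ in fields if not marker.startswith("_")), None
        )
    record, is_header = [], True
    for marker, value in fields:
        if marker == head_field_marker:
            # A head field closes the previous group of fields, if any.
            if record and (include_header or not is_header):
                yield record
            record, is_header = [], False
        record.append((marker, value))
    if record and (include_header or not is_header):
        yield record

with_header = list(sketch_records(SAMPLE))
without_header = list(sketch_records(SAMPLE, include_header=False))
```

With include_header=True the header fields come back as the first item; with include_header=False only the groups that start with the head field are yielded.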

dictionary(files='rotokas.dic', include_header=True)

Deprecated: use ToolboxData.parse()

Parameters:
  • files (string or tuple(string)) - One or more toolbox files to be processed
  • include_header (boolean) - whether to treat the header as an entry
Returns: iterator over dict

dict_list(files='rotokas.dic', include_header=True)

Deprecated: use ToolboxData.parse()

Parameters:
  • files (string or tuple(string)) - One or more toolbox files to be processed
  • include_header (boolean) - whether to treat the header as an entry
Returns: iterator over dict