Package nltk_lite :: Package chunk :: Module convert

Module convert

Functions
tree
tagstr2tree(s, chunk_node='NP', top_node='S')
Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree.
Tree
conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')
Convert a CoNLL IOB string into a tree.
list of tuple
tree2conlltags(t)
Convert a tree to the CoNLL IOB tag format.
string
tree2conllstr(t)
Convert a tree to the CoNLL IOB string format.
 
_ieer_read_text(s, top_node)
Tree
ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')
Convert a string of chunked tagged text in the IEER named entity format into a chunk structure.
 
demo()
Variables
  _LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')
  _IEER_DOC_RE = re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>....
  _IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+...
Function Details

tagstr2tree(s, chunk_node='NP', top_node='S')

Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.

Parameters:
  • s (string) - The string to be converted
  • chunk_node (string) - The label to use for chunk nodes
  • top_node (string) - The label to use for the root of the tree
Returns: tree
A tree corresponding to the string representation.
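
The bracketed format described above can be illustrated with a minimal, library-free sketch. It is not the module's implementation: parse_bracketed is a hypothetical name, the tree is represented as plain (label, children) tuples rather than a Tree object, and splitting each word on its last slash is an assumption.

```python
import re

def parse_bracketed(s, chunk_node="NP", top_node="S"):
    """Hypothetical stand-in for tagstr2tree; returns (label, children) tuples."""
    root = []
    current = root  # list that receives the next token
    for token in re.findall(r"\[|\]|[^\[\]\s]+", s):
        if token == "[":
            if current is not root:
                raise ValueError("nested brackets are not supported in this sketch")
            current = []
            root.append((chunk_node, current))
        elif token == "]":
            if current is root:
                raise ValueError("unmatched closing bracket")
            current = root
        else:
            # Assumption: the tag follows the LAST slash in the word.
            text, sep, tag = token.rpartition("/")
            if sep:
                current.append((text, tag))
            else:
                # No slash at all: the tag is None, as the description states.
                current.append((token, None))
    if current is not root:
        raise ValueError("unclosed bracket")
    return (top_node, root)

parsed = parse_bracketed("[ the/DT cat/NN ] sat/VBD on")
```

Here the bracketed words end up under a single NP node, while "sat" and "on" stay directly under the root, and "on" receives a tag of None.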

conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')

Convert a CoNLL IOB string into a tree. Uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).

Parameters:
  • s (string) - The CoNLL string to be converted.
  • chunk_types (tuple) - The chunk types to be converted.
  • top_node (string) - The node label to use for the root.
Returns: Tree
A chunk structure for a single sentence encoded in the given CoNLL 2000 style string.
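
To make the conversion concrete, here is a hedged sketch of how a CoNLL IOB string might be turned into a chunk structure. The line pattern is the one listed for _LINE_RE in this module; everything else (the name parse_conll, the tuple-based tree, and the handling of stray I tags) is an assumption made for illustration, not the module's actual code.

```python
import re

# Pattern as listed for _LINE_RE in this module.
LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')

def parse_conll(s, chunk_types=("NP", "PP", "VP"), top_node="S"):
    """Hypothetical stand-in for conllstr2tree; returns (label, children) tuples."""
    root = []
    chunk = None        # token list of the currently open chunk, if any
    chunk_label = None  # type of the currently open chunk
    for line in s.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("malformed line: %r" % line)
        word, tag, state, chunk_type = match.groups()
        # Chunk types that were not requested are treated as outside (O).
        if chunk_types is not None and chunk_type not in chunk_types:
            state = "O"
        # Assumption: an I tag that does not continue the open chunk starts a new one.
        if state == "I" and (chunk is None or chunk_type != chunk_label):
            state = "B"
        if state in "BO":
            chunk = None
        if state == "B":
            chunk = []
            chunk_label = chunk_type
            root.append((chunk_type, chunk))
        (root if chunk is None else chunk).append((word, tag))
    return (top_node, root)

sample = "the DT B-NP\ncat NN I-NP\nsat VBD B-VP\non IN B-PP\nthe DT B-NP\nmat NN I-NP\n. . O"
np_only = parse_conll(sample, chunk_types=("NP",))
```

With chunk_types=("NP",), the VP and PP tokens in the sample are left unchunked under the root, which mirrors the role the chunk_types parameter plays in the description above.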

tree2conlltags(t)

Convert a tree to the CoNLL IOB tag format.

Parameters:
  • t (Tree) - The tree to be converted.
Returns: list of tuple
A list of 3-tuples containing word, tag and IOB tag.
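
As an illustration of the 3-tuple output, the sketch below flattens a one-level chunk structure into (word, tag, IOB tag) triples. The function name to_conll_tags and the tuple-based tree are hypothetical, and the B-/I- prefix scheme is assumed from the CoNLL 2000 convention rather than taken from the module's source.

```python
def to_conll_tags(tree):
    """Hypothetical stand-in for tree2conlltags on (label, children) tuples."""
    _label, children = tree
    triples = []
    for child in children:
        if isinstance(child[1], list):
            # A chunk: (chunk_label, [(word, tag), ...])
            chunk_label, tokens = child
            for index, (word, tag) in enumerate(tokens):
                prefix = "B-" if index == 0 else "I-"
                triples.append((word, tag, prefix + chunk_label))
        else:
            # An unchunked token: (word, tag)
            word, tag = child
            triples.append((word, tag, "O"))
    return triples

example_tree = ("S", [("NP", [("the", "DT"), ("cat", "NN")]), ("sat", "VBD")])
example_triples = to_conll_tags(example_tree)
```

The first token of each chunk is tagged B-, later tokens in the same chunk are tagged I-, and tokens outside any chunk are tagged O.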

tree2conllstr(t)

Convert a tree to the CoNLL IOB string format.

Parameters:
  • t (Tree) - The tree to be converted.
Returns: string
A multiline string where each line contains a word, tag and IOB tag.
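
A rough sketch of the string format follows. It assumes that the three fields on each line are separated by single spaces, which this page does not state explicitly; to_conll_str is a hypothetical helper that works on ready-made triples instead of a Tree. Each produced line is checked against the pattern listed for _LINE_RE in this module.

```python
import re

# Pattern as listed for _LINE_RE in this module.
LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')

def to_conll_str(triples):
    """Join (word, tag, iob_tag) triples into one line per token (assumed layout)."""
    return "\n".join(" ".join(triple) for triple in triples)

triples = [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O")]
conll_text = to_conll_str(triples)

# Every line should be readable again with the module's line pattern.
round_trip = [LINE_RE.match(line).groups() for line in conll_text.splitlines()]
```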

ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')

Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks are of several types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.

Returns: Tree
A chunk structure containing the chunked tagged text that is encoded in the given IEER style string.
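
The sketch below shows one way the chunk markup could be read with the pattern listed for _IEER_TYPE_RE. Several things here are assumptions rather than facts from this page: the sample markup (begin tags such as <b_enamex type="PERSON"> closed by <e_enamex>), the name parse_ieer_text, the tuple-based tree, and the choice to keep words as plain strings.

```python
import re

# Pattern as listed for _IEER_TYPE_RE in this module.
TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

def parse_ieer_text(s, top_node="S"):
    """Hypothetical reader for IEER-style body text; returns (label, children) tuples."""
    root = (top_node, [])
    stack = [root]  # innermost open chunk is last
    for piece in re.findall(r"<[^>]+>|[^\s<]+", s):
        if piece.startswith("<b_"):
            match = TYPE_RE.match(piece)
            if match is None:
                raise ValueError("begin tag without a type: %r" % piece)
            chunk = (match.group("type"), [])
            stack[-1][1].append(chunk)
            stack.append(chunk)
        elif piece.startswith("<e_"):
            if len(stack) == 1:
                raise ValueError("end tag without a matching begin tag")
            stack.pop()
        else:
            stack[-1][1].append(piece)
    if len(stack) != 1:
        raise ValueError("unclosed chunk")
    return root

# Assumed sample markup, invented for this illustration.
ieer_sample = '<b_enamex type="PERSON">John Smith<e_enamex> visited <b_enamex type="LOCATION">Paris<e_enamex> .'
ieer_tree = parse_ieer_text(ieer_sample)
```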

Variables Details

_IEER_DOC_RE

Value:
re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?(<DO\
CTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?(<DATE_TIME>\s*(?P<date_tim\
e>.+?)\s*</DATE_TIME>\s*)?<BODY>\s*(<HEADLINE>\s*(?P<headline>.+?)\s*<\
/HEADLINE>\s*)?<TEXT>(?P<text>.*?)</TEXT>\s*</BODY>\s*</DOC>\s*')
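
For orientation, the following sketch joins the wrapped value above into a single pattern and applies it to a small document. The sample document is invented for illustration and is not taken from the IEER corpus; only the pattern itself comes from this page.

```python
import re

# The value above with the line-continuation backslashes removed.
IEER_DOC_RE = re.compile(
    r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?'
    r'(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?'
    r'(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?'
    r'<BODY>\s*(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?'
    r'<TEXT>(?P<text>.*?)</TEXT>\s*</BODY>\s*</DOC>\s*'
)

# Invented sample document; DOCTYPE and DATE_TIME are omitted on purpose.
sample_doc = (
    "<DOC>\n"
    "<DOCNO> X-1 </DOCNO>\n"
    "<BODY>\n"
    "<HEADLINE> Hello </HEADLINE>\n"
    "<TEXT>\nSome text.\n</TEXT>\n"
    "</BODY>\n"
    "</DOC>\n"
)

doc_match = IEER_DOC_RE.match(sample_doc)
fields = doc_match.groupdict()
```

The named groups docno, doctype, date_time and headline are optional in the pattern, so a missing element yields None for that group, while text captures everything between the TEXT tags.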

_IEER_TYPE_RE

Value:
re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')