Package nltk_lite :: Package contrib :: Module ieer_rels

Module ieer_rels

Code for extracting triples of the form (subj, filler, obj) from the ieer corpus, after the latter has been converted to chunk format. subj and obj are a pair of Named Entities, and filler is the string of words occurring between subj and obj (with no intervening NEs). Subsequent processing can try to identify interesting relations expressed in filler.

Functions

string or None
check_words(s)
Filter out strings which introduce unwanted noise.

bool
check_type(tree, type=None)
Given a Named Entity (represented as a Tree), check whether it has the required type (i.e., check the tree's root node).

_tuple2tag(item)

list
ne_fillers(t, stype=None, otype=None)
Search through a chunk structure, looking for relational triples.

_expand(type)

generator
relextract(stype, otype, corpus='ieer', pattern=None, rcontext=None)
Extract a relation by filtering the results of ne_fillers.

_shorten(type)

_show(item, tags=None)

show_tuple(t)
Utility function for displaying tuples in succinct format.

demo()

Variables
  ne_types = {'conll2002': ['LOC', 'PER', 'ORG'], 'conll2002-esp...
  short2long = {'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER':...
  long2short = {'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSO...
  corpora = {'ieer':(d [key] for key in ['text', 'headline'] for...
Function Details

check_words(s)

Filter out strings which introduce unwanted noise.

Parameters:
  • s (string) - The string to be filtered
Returns: string or None

check_type(tree, type=None)

Given a Named Entity (represented as a Tree), check whether it has the required type (i.e., check the tree's root node).

Parameters:
  • tree (Tree) - The candidate Named Entity
Returns: bool
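
The root-node check described above can be sketched as follows. This is a minimal illustration, not the module's code: SimpleTree is a hypothetical stand-in for the Tree class that keeps its label in a node attribute, and treating type=None as "accept any entity" is an assumption.

```python
class SimpleTree(list):
    """Hypothetical stand-in for a chunk Tree: a labelled list of leaves."""

    def __init__(self, node, children):
        super().__init__(children)
        self.node = node  # label of the root node, e.g. 'PERSON'


def check_type_sketch(tree, type=None):
    """Return True if the root label of `tree` equals `type`.

    Assumption: type=None means that every entity type is accepted.
    """
    if type is None:
        return True
    return tree.node == type


person = SimpleTree('PERSON', ['Jane', 'Doe'])  # made-up entity
print(check_type_sketch(person, 'PERSON'))        # True
print(check_type_sketch(person, 'ORGANIZATION'))  # False
print(check_type_sketch(person))                  # True
```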

ne_fillers(t, stype=None, otype=None)

Search through a chunk structure, looking for relational triples. These consist of

  • a Named Entity (i.e., a subtree), called the 'subject' of the triple,
  • a string of words (i.e., leaves), called the 'filler' of the triple,
  • another Named Entity, called the 'object' of the triple.

To help in data analysis, we also identify a fourth item, rcon, i.e., a few words of right context immediately following the second Named Entity.

Apart from the first and last, every Named Entity can occur as both the subject and the object of a triple.

The parameters stype and otype can be used to restrict the Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').

Parameters:
  • t (Tree) - a chunk structured portion of the ieer corpus.
  • stype (string or None) - the type of the subject Named Entity (by default, all types are admissible).
  • otype (string or None) - the type of the object Named Entity (by default, all types are admissible).
Returns: list
a list of 4-tuples (subj, filler, obj, rcon).
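
The search described above can be sketched on a simplified data model. Everything here is an assumption made for illustration: the chunk structure is flattened to a list in which plain words are strings and Named Entities are (type, words) tuples, the window parameter is hypothetical, and the right context is cut off at the next Named Entity. The actual function operates on Tree objects.

```python
def ne_fillers_sketch(chunks, stype=None, otype=None, window=3):
    """Collect (subj, filler, obj, rcon) 4-tuples from a flat chunk list.

    `chunks` mixes plain word strings with Named Entities, modelled here
    as (type, words) tuples. This flat model is an assumption.
    """
    # Positions of all Named Entities in the sequence.
    ne_positions = [i for i, item in enumerate(chunks)
                    if isinstance(item, tuple)]
    results = []
    # Pair each Named Entity with the next one, so that every entity except
    # the first and last serves once as object and once as subject.
    for left, right in zip(ne_positions, ne_positions[1:]):
        subj, obj = chunks[left], chunks[right]
        if stype is not None and subj[0] != stype:
            continue
        if otype is not None and obj[0] != otype:
            continue
        # The entities are adjacent in ne_positions, so only words intervene.
        filler = ' '.join(chunks[left + 1:right])
        # Right context: up to `window` words after obj, stopping early
        # if another Named Entity is reached.
        rcon_words = []
        for item in chunks[right + 1:right + 1 + window]:
            if isinstance(item, tuple):
                break
            rcon_words.append(item)
        results.append((subj, filler, obj, ' '.join(rcon_words)))
    return results


# Made-up example sentence, not taken from the ieer corpus.
sentence = [
    ('PERSON', ['Jane', 'Doe']), ',', 'president', 'of',
    ('ORGANIZATION', ['Acme', 'Corp']), 'in',
    ('LOCATION', ['Boston']), ',', 'said', 'today',
]
for item in ne_fillers_sketch(sentence):
    print(item)
```

In this example the ORGANIZATION entity is the object of the first tuple and the subject of the second.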

relextract(stype, otype, corpus='ieer', pattern=None, rcontext=None)

Extract a relation by filtering the results of ne_fillers.

Parameters:
  • corpus (string) - the name of the corpus to process, given as a key of the corpora variable (default 'ieer').
  • stype (string) - the type of the subject Named Entity.
  • otype (string) - the type of the object Named Entity.
  • pattern (SRE_Pattern) - a regular expression for filtering the fillers of retrieved triples.
  • rcontext (bool) - if True, a few words of right context are added to the output triples.
Returns: generator
generates 3-tuples (subj, filler, obj), or 4-tuples (subj, filler, obj, rcontext) if rcontext is True.
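
The filtering step can be sketched as a generator over candidate 4-tuples of the kind ne_fillers is documented to return. This is an illustration under stated assumptions, not the module's implementation: relextract_sketch and its candidates argument are hypothetical, entities are shown as plain strings for brevity, and applying the pattern with match() is a guess.

```python
import re


def relextract_sketch(candidates, pattern=None, rcontext=False):
    """Yield the candidates whose filler matches `pattern`.

    `candidates` is an iterable of (subj, filler, obj, rcon) 4-tuples.
    Assumption: the pattern is anchored at the start of the filler (match()).
    """
    for subj, filler, obj, rcon in candidates:
        if pattern is not None and not pattern.match(filler):
            continue
        if rcontext:
            yield (subj, filler, obj, rcon)
        else:
            yield (subj, filler, obj)


# Made-up candidates, not taken from the ieer corpus.
candidates = [
    ('Jane Doe', ', president of', 'Acme Corp', 'in'),
    ('Acme Corp', 'in', 'Boston', ', said today'),
]
IN = re.compile(r'.*\bin\b')  # fillers containing the word 'in'
print(list(relextract_sketch(candidates, pattern=IN)))
# [('Acme Corp', 'in', 'Boston')]
```

Because the result is a generator, it can be consumed only once; wrap it in list() if the tuples are needed again.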

show_tuple(t)

Utility function for displaying tuples in succinct format.

Parameters:
  • t (tuple) - a (subj, filler, obj) tuple (possibly with right context as a fourth item).

Variables Details

ne_types

Value:
{'conll2002': ['LOC', 'PER', 'ORG'],
 'conll2002-esp': ['LOC', 'PER', 'ORG'],
 'conll2002-ned': ['LOC', 'PER', 'ORG'],
 'ieer': ['LOCATION',
          'ORGANIZATION',
          'PERSON',
          'DURATION',
          'DATE',
...

short2long

Value:
{'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER': 'PERSON'}

long2short

Value:
{'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSON': 'PER'}
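
The two tables short2long and long2short are inverses of each other. Whether the module derives one from the other is not stated; the sketch below only shows one way to build the inverse and check that the pair stays consistent.

```python
short2long = {'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER': 'PERSON'}

# Invert the mapping: each long name points back to its short name.
long2short = {long_name: short_name
              for short_name, long_name in short2long.items()}

print(long2short)
# {'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSON': 'PER'}

# Round trip: converting short -> long -> short returns the original tag.
print(all(long2short[short2long[tag]] == tag for tag in short2long))  # True
```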

corpora

Value:
{'ieer': (d[key] for key in ['text', 'headline'] for d in ieer.dictionary()),
 'conll2002': (tree for tree in conll2002.ne_chunked()),
 'conll2002-ned': (tree for tree in conll2002.ne_chunked(files=['ned.train'])),
 'conll2002-esp': (tree for tree in conll2002.ne_chunked(files=['esp.train']))}