Package nltk_lite :: Package tag :: Module unigram :: Class Affix

Class Affix


     object --+            
              |            
yaml.YAMLObject --+        
                  |        
               TagI --+    
                      |    
      SequentialBackoff --+
                          |
                         Affix
Known Subclasses:
contrib.marshal.MarshalAffix

A unigram tagger that assigns tags to tokens based on their leading or trailing substrings (note that these substrings are not necessarily "true" morphological affixes). Before tag.Affix can be used, it must be trained on a tagged corpus. From this training data it finds the most likely tag for each affix, and it then assigns that most frequent tag to every word carrying the affix. If tag.Affix encounters a prefix or suffix for which it has no data, it assigns the tag None.
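The behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use nltk_lite; the function names `train_affix_table` and `tag_word` are hypothetical, and the exact length comparison (`>=`) is an assumption rather than something this page confirms.

```python
from collections import Counter, defaultdict

def train_affix_table(tagged_sents, length, minlength):
    """Map each affix to its most frequent tag (illustrative only)."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            if len(word) >= minlength:  # assumption: short words are skipped
                # positive length -> prefix, negative length -> suffix
                affix = word[:length] if length > 0 else word[length:]
                counts[affix][tag] += 1
    return {affix: tags.most_common(1)[0][0] for affix, tags in counts.items()}

def tag_word(table, word, length, minlength):
    """Return the tag learned for the word's affix, or None if unseen."""
    if len(word) < minlength:
        return None
    affix = word[:length] if length > 0 else word[length:]
    return table.get(affix)

corpus = [[("running", "VBG"), ("jumping", "VBG"), ("king", "NN")]]
table = train_affix_table(corpus, length=-3, minlength=5)
seen = tag_word(table, "walking", -3, 5)    # suffix "ing" was seen in training
unseen = tag_word(table, "walked", -3, 5)   # suffix "ked" was never seen
```

In the real class, a None result would presumably be handed to the backoff tagger, if one was supplied to the constructor.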

Nested Classes

Inherited from yaml.YAMLObject: __metaclass__, yaml_dumper, yaml_loader

Instance Methods
 
__init__(self, length, minlength, cutoff=1, backoff=None)
Construct a new affix stochastic tagger.
 
_get_affix(self, token)
 
train(self, tagged_corpus, verbose=True)
Train tag.Affix using the given training data.
 
tag_one(self, token, history=None)
 
size(self)
 
__repr__(self)
repr(x)

Inherited from SequentialBackoff: tag, tag_sents

Inherited from SequentialBackoff (private): _backoff_tag_one

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Methods

Inherited from yaml.YAMLObject: from_yaml, to_yaml

Class Variables

Inherited from yaml.YAMLObject: yaml_flow_style, yaml_tag

Properties

Inherited from object: __class__

Method Details

__init__(self, length, minlength, cutoff=1, backoff=None)
(Constructor)


Construct a new affix stochastic tagger. The new tagger should be trained, using the train() method, before it is used to tag data.

Parameters:
  • length (number) - The length of the affix to be considered during training and tagging (positive for prefixes, negative for suffixes)
  • minlength (number) - The minimum length for a word to be considered during training and tagging. It must be longer than the affix length.
Overrides: object.__init__
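
As a rough illustration of how the sign of length selects a prefix or a suffix, the following sketch uses plain Python slicing. The helper name `get_affix` is hypothetical and is not the library's `_get_affix`; returning None for words shorter than minlength is an assumption about how short words are skipped.

```python
def get_affix(token, length, minlength):
    """Return the leading (length > 0) or trailing (length < 0) substring.

    Words shorter than minlength yield None (assumed behaviour).
    """
    if len(token) < minlength:
        return None
    return token[:length] if length > 0 else token[length:]

prefix = get_affix("unhappy", 2, 3)    # first two characters
suffix = get_affix("walking", -3, 4)   # last three characters
too_short = get_affix("at", -3, 4)     # shorter than minlength
```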

train(self, tagged_corpus, verbose=True)


Train tag.Affix using the given training data. If this method is called multiple times, then the training data will be combined.

Parameters:
  • tagged_corpus (list or iter(list)) - A tagged corpus. Each item should be a list of tagged tokens, where each tagged token consists of text and a tag.
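
The corpus shape and the effect of repeated training calls can be sketched as follows. The class `AffixCounts` is a hypothetical stand-in for the tagger's internal statistics, not part of nltk_lite, and it assumes each tagged token is a (text, tag) pair.

```python
from collections import Counter, defaultdict

class AffixCounts:
    """Hypothetical stand-in showing how repeated train() calls accumulate."""

    def __init__(self, length):
        self.length = length
        self.counts = defaultdict(Counter)  # affix -> Counter of tags

    def train(self, tagged_corpus):
        # tagged_corpus: an iterable of sentences, each a list of (text, tag)
        for sentence in tagged_corpus:
            for text, tag in sentence:
                affix = text[:self.length] if self.length > 0 else text[self.length:]
                self.counts[affix][tag] += 1

    def best_tag(self, affix):
        tags = self.counts.get(affix)
        return tags.most_common(1)[0][0] if tags else None

stats = AffixCounts(length=-2)
stats.train([[("walked", "VBD"), ("bed", "NN")]])      # first call
stats.train([[("talked", "VBD"), ("jumped", "VBD")]])  # second call adds to the same counts
```

After both calls the counts for the suffix "ed" reflect all four tokens, which is the sense in which the training data is combined.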

__repr__(self)
(Representation operator)


repr(x)

Overrides: object.__repr__
(inherited documentation)