Package nltk_lite :: Package contrib :: Package classify :: Module cosine :: Class Cosine

Class Cosine


   ClassifyI --+    
               |    
AbstractClassify --+
                   |
                  Cosine


The Cosine classifier uses the cosine distance algorithm to compute
the distance between a sample document and each of the specified classes.
It must be trained with representative examples of each class. From these
examples the classifier calculates the most probable classification of the
sample. The measure is the cosine of the angle between the class vector C
and the sample vector S:

                  C . S
D(C|S) = -------------------------
           sqrt(C^2) * sqrt(S^2)
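
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the module's own code: it assumes C and S are sparse count vectors stored as dictionaries (collections.Counter stands in for the nltk_lite frequency distribution), and the function name cosine_similarity is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(c, s):
    """Return (C . S) / (sqrt(C^2) * sqrt(S^2)) for two sparse count vectors."""
    # Dot product over the features present in c; absent features count as 0.
    dot = sum(count * s.get(feature, 0) for feature, count in c.items())
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    norm_s = math.sqrt(sum(v * v for v in s.values()))
    if norm_c == 0 or norm_s == 0:
        # Assumption: an empty vector is treated as having no similarity.
        return 0.0
    return dot / (norm_c * norm_s)


class_counts = Counter("the cat sat on the mat".split())
sample_counts = Counter("the cat".split())
score = cosine_similarity(class_counts, sample_counts)
```

With these inputs the dot product is 3 and the product of the norms is 4, so the score is 0.75. A value near 1 means the two vectors point in nearly the same direction.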

Internal data structures:
_feature_detector:
    holds the feature detector function
_classes:
    holds the list of classes supplied during training
_cls_freq_dist:
    holds a dictionary of frequency distributions
    (the structure defined in probability.py in nltk_lite);
    the dictionary is indexed by class name and feature type,
    and each frequency distribution is indexed by feature value

Instance Methods
 
__init__(self, feature_detector)
 
train(self, gold)
Train the classifier using representative examples of each class; creates frequency distributions for these classes.
 
get_class_dict(self, sample)
Returns: dictionary mapping each class to its probability
 
_cosine(self, sample)
Returns: dictionary mapping each class to its probability
 
__repr__(self)

Inherited from AbstractClassify: classes, get_class, get_class_list, get_class_probs, get_class_tuples

Method Details

__init__(self, feature_detector)
(Constructor)

Parameters:
  • feature_detector - feature detector function, which takes a sample of the object to be classified (e.g. a string or a list of words) and returns a list of tuples (feature_type_name, list of values of this feature type)
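
As an illustration of the return shape described above, here is a hypothetical feature detector. The name word_features and the single feature type "word" are assumptions made for this sketch; they do not come from the module.

```python
def word_features(sample):
    """Hypothetical feature detector with a single feature type, "word".

    Accepts a string or a list of words and returns a list of
    (feature_type_name, list_of_values) tuples.
    """
    if isinstance(sample, str):
        tokens = sample.split()
    else:
        tokens = list(sample)
    # Lowercasing is an illustrative choice, not something the module requires.
    return [("word", [token.lower() for token in tokens])]


features = word_features("The cat sat")
```

A detector with several feature types would return one tuple per type in the same list.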

train(self, gold)

Train the classifier using representative examples of each class; creates frequency distributions for these classes.

Parameters:
  • gold - dictionary mapping class names to representative examples
Overrides: ClassifyI.train
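
The training step might look roughly like the sketch below. It makes several assumptions that the documentation does not confirm: each class name in gold maps to a list of example strings, the distributions are keyed by a (class_name, feature_type) tuple, and collections.Counter stands in for the nltk_lite frequency distribution. The helper names are hypothetical.

```python
from collections import Counter, defaultdict


def word_features(sample):
    # Hypothetical feature detector: one feature type, "word".
    return [("word", sample.lower().split())]


def build_class_distributions(gold, feature_detector):
    """Count feature values per (class name, feature type) pair."""
    distributions = defaultdict(Counter)
    for class_name, examples in gold.items():
        for example in examples:
            for feature_type, values in feature_detector(example):
                # Counter.update adds one count per occurrence of each value.
                distributions[(class_name, feature_type)].update(values)
    return dict(distributions)


gold = {
    "animals": ["the cat sat", "the dog ran"],
    "food": ["the soup is hot"],
}
dists = build_class_distributions(gold, word_features)
```

Here dists[("animals", "word")] counts "the" twice, once for each example of that class.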

get_class_dict(self, sample)

Parameters:
  • sample (any) - sample to be classified
Returns:
Dictionary mapping each class to its probability

_cosine(self, sample)

Parameters:
  • sample - sample to be classified
Returns:
Dictionary mapping each class to its probability

Uses the sample to create a frequency distribution, then computes the cosine distance between each class distribution and the sample's distribution.
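
The scoring step described above can be sketched as follows. This is a simplified illustration under stated assumptions: a single feature type, collections.Counter in place of the nltk_lite frequency distribution, and hypothetical names (cosine_similarity, score_classes). The values returned are raw cosine scores; how the module turns them into the probabilities mentioned above is not shown here.

```python
import math
from collections import Counter


def cosine_similarity(c, s):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * s.get(feature, 0) for feature, count in c.items())
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    norm_s = math.sqrt(sum(v * v for v in s.values()))
    if norm_c == 0 or norm_s == 0:
        return 0.0  # assumption: empty vectors score zero
    return dot / (norm_c * norm_s)


def score_classes(class_counts, sample_counts):
    """Map each class name to its cosine score against the sample."""
    return {
        name: cosine_similarity(counts, sample_counts)
        for name, counts in class_counts.items()
    }


# Hypothetical trained distributions, one per class.
class_counts = {
    "animals": Counter({"cat": 2, "dog": 1}),
    "food": Counter({"soup": 2, "bread": 1}),
}
sample_counts = Counter("cat dog cat".split())

scores = score_classes(class_counts, sample_counts)
best = max(scores, key=scores.get)  # class with the highest score
```

The sample shares no features with "food", so that class scores zero, and "animals" is selected as the best match.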