Package Bio :: Package SeqIO :: Module StockholmIO :: Class StockholmIterator

Class StockholmIterator

      Interfaces.SequenceIterator --+    
                                    |    
Interfaces.InterlacedSequenceIterator --+
                                        |
                                       StockholmIterator

Loads a Stockholm file from PFAM into SeqRecord objects.

The entire file is loaded, and any sequence can be accessed using the [index] notation.
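The load-everything-then-index behaviour described above can be pictured with a small stand-in. This is only a sketch: `EagerRecords` is a hypothetical class, not part of Bio.SeqIO, and it treats each non-empty line as one "record" instead of parsing Stockholm data.

```python
import io


class EagerRecords:
    """Hypothetical stand-in: reads everything up front, then serves items by index."""

    def __init__(self, handle):
        # Assumption for this sketch: one record per non-empty line.
        self._records = [line.strip() for line in handle if line.strip()]

    def __len__(self):
        # Number of records held in memory.
        return len(self._records)

    def __getitem__(self, i):
        # Random access is cheap because the whole input was already read.
        return self._records[i]


handle = io.StringIO("first\nsecond\nthird\n")
records = EagerRecords(handle)
```

The trade-off is memory: nothing can be indexed until the whole input has been consumed.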

This parser detects whether the Stockholm file follows the PFAM conventions for sequence-specific meta-data (lines starting with #=GS and #=GR) and, if so, populates the SeqRecord fields accordingly.
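As a rough illustration of the two line types mentioned above, the following sketch collects #=GS and #=GR lines into dictionaries keyed by sequence identifier. The function name `parse_sequence_metadata` and the data layout are assumptions made for this example; they are not taken from the module's source.

```python
def parse_sequence_metadata(lines):
    """Collect #=GS and #=GR lines into nested dicts keyed by sequence identifier."""
    gs = {}  # identifier -> feature code -> list of free-text values
    gr = {}  # identifier -> feature code -> per-column annotation string
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#=GS "):
            # Layout: #=GS <seqname> <feature> <free text>
            parts = line[5:].strip().split(None, 2)
            if len(parts) == 3:
                seq_id, feature, text = parts
                gs.setdefault(seq_id, {}).setdefault(feature, []).append(text)
        elif line.startswith("#=GR "):
            # Layout: #=GR <seqname> <feature> <one character per column>
            parts = line[5:].strip().split(None, 2)
            if len(parts) == 3:
                seq_id, feature, text = parts
                per_seq = gr.setdefault(seq_id, {})
                # Append, so that a feature split over several blocks is rejoined.
                per_seq[feature] = per_seq.get(feature, "") + text.strip()
    return gs, gr
```

Lines that are neither #=GS nor #=GR (the header, sequences, the closing `//`) are simply skipped by this sketch.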

Any annotation which does not follow the PFAM conventions is currently ignored.

If an accession is provided for an entry in the meta-data, it will NOT be used as the record.id (it is recorded in the record's annotations instead). This is because some files contain several (sub)sequences from different parts of the same accession, distinguished only by their start-end positions.
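To make the start-end convention concrete, here is a minimal sketch of splitting an identifier into its name and coordinates. `split_identifier` is a hypothetical helper written for this example; the class's own `_identifier_split` may differ in detail.

```python
def split_identifier(identifier):
    """Return (name, start, end) as strings; start and end are None without a suffix."""
    name, slash, suffix = identifier.rpartition("/")
    if slash and suffix.count("-") == 1:
        start, end = suffix.split("-")
        if start.isdigit() and end.isdigit():
            return name, start, end
    # No recognisable /start-end suffix: treat the whole string as the name.
    return identifier, None, None
```

Two sub-sequences such as `Q9PN73_CAMJE/149-220` and a second range from the same entry share one name but differ in coordinates, which is why the full identifier rather than the accession is the safer choice for a unique record id.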

Wrap-around alignments are not supported - each sequence must be on a single line. However, interlaced sequences should work.
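One way to picture the interlaced case is the sketch below, which assumes that "interlaced" means the same identifier reappears in successive blocks and that the pieces should be joined in order. `read_interlaced_sequences` is a hypothetical helper, not the parser's actual code.

```python
def read_interlaced_sequences(lines):
    """Join the pieces of each sequence across the blocks of an interlaced alignment."""
    order = []   # identifiers in order of first appearance
    chunks = {}  # identifier -> list of sequence pieces
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and mark-up lines carry no sequence
        if line == "//":
            break  # end of the alignment
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip lines lacking an identifier or a sequence
        seq_id, piece = parts
        if seq_id not in chunks:
            order.append(seq_id)
            chunks[seq_id] = []
        chunks[seq_id].append(piece.replace(" ", ""))
    return [(seq_id, "".join(chunks[seq_id])) for seq_id in order]
```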

For more information on the file format, please see:
http://www.bioperl.org/wiki/Stockholm_multiple_alignment_format
http://www.cgb.ki.se/cgb/groups/sonnhammer/Stockholm.html

For consistency with BioPerl and EMBOSS we call this the "stockholm" format.

Instance Methods

__init__(self, handle, alphabet=SingleLetterAlphabet())
Create a StockholmIterator object (which returns SeqRecord objects).

ParseAlignment(self)

__len__(self)
Return the number of records.

_identifier_split(self, identifier)
Returns a (name, start, end) tuple of strings from an identifier.

_get_meta_data(self, identifier, meta_dict)
Takes an identifier and returns a dict of all meta-data matching it.

_populate_meta_data(self, identifier, record)
Adds meta-data to a SeqRecord's annotations dictionary.

__getitem__(self, i)
Provides random access to the SeqRecords.

Inherited from Interfaces.InterlacedSequenceIterator: __iter__, move_start, next

Class Variables
  pfam_gr_mapping = {'AS': 'active_site', 'IN': 'intron', 'LI': ...
  pfam_gs_mapping = {'LO': 'look', 'OC': 'organism_classificatio...

Method Details

__init__(self, handle, alphabet=SingleLetterAlphabet())
(Constructor)

Create a StockholmIterator object (which returns SeqRecord objects).

handle - input file
alphabet - optional alphabet

Overrides: Interfaces.SequenceIterator.__init__

__len__(self)
(Length operator)

Return the number of records.

This method should be replaced by any derived class to do something useful.

Overrides: Interfaces.InterlacedSequenceIterator.__len__
(inherited documentation)

_get_meta_data(self, identifier, meta_dict)

Takes an identifier and returns a dict of all meta-data matching it.

For example, given "Q9PN73_CAMJE/149-220" it will return all matches to this or to "Q9PN73_CAMJE", which is the identifier without its /start-end suffix.

In the example below, the suffix is required to match the AC, but must be removed to match the OS and OC meta-data.

    # STOCKHOLM 1.0
    #=GS Q9PN73_CAMJE/149-220 AC Q9PN73
    ...
    Q9PN73_CAMJE/149-220 NKA...
    ...
    #=GS Q9PN73_CAMJE OS Campylobacter jejuni
    #=GS Q9PN73_CAMJE OC Bacteria

This function returns an empty dictionary if no data is found.
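The lookup described above might be sketched as follows, assuming `meta_dict` maps each identifier to a dict of feature codes. `get_meta_data` here is a standalone illustration, not the method's actual source, and letting the bare-name entry win when the same feature appears under both keys is an arbitrary choice of this sketch.

```python
def get_meta_data(identifier, meta_dict):
    """Merge the meta-data stored under the full identifier and under its bare name."""
    name = identifier.split("/", 1)[0]  # drop any /start-end suffix
    keys = [identifier] if name == identifier else [identifier, name]
    answer = {}
    for key in keys:
        # Missing keys contribute nothing, so the result may be empty.
        answer.update(meta_dict.get(key, {}))
    return answer
```

With the example file above, the AC entry is found under the suffixed identifier while OS and OC are found under the bare name, and all three end up in one dictionary.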

_populate_meta_data(self, identifier, record)

Adds meta-data to a SeqRecord's annotations dictionary.

This function applies the PFAM conventions.

__getitem__(self, i)
(Indexing operator)

Provides random access to the SeqRecords.

Overrides: Interfaces.InterlacedSequenceIterator.__getitem__

Class Variable Details

pfam_gr_mapping

Value:
{'AS': 'active_site',
 'IN': 'intron',
 'LI': 'ligand_binding',
 'PP': 'posterior_probability',
 'SA': 'surface_accessibility',
 'SS': 'secondary_structure',
 'TM': 'transmembrane'}

pfam_gs_mapping

Value:
{'LO': 'look', 'OC': 'organism_classification', 'OS': 'organism'}
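As an illustration of how a mapping like this could be applied, the sketch below translates two-letter #=GS feature codes into annotation keys. The helper `annotations_from_gs`, the input layout (feature code to list of strings) and the choice to join multiple values with ", " are all assumptions of this example. Codes absent from the mapping, such as AC, are outside this sketch and are simply skipped.

```python
# Copied from the pfam_gs_mapping value shown above.
PFAM_GS_MAPPING = {"LO": "look", "OC": "organism_classification", "OS": "organism"}


def annotations_from_gs(gs_features, mapping=PFAM_GS_MAPPING):
    """Translate #=GS feature codes into readable annotation keys."""
    annotations = {}
    for code, values in gs_features.items():
        if code in mapping:
            # Assumption: several lines for one feature are joined with ", ".
            annotations[mapping[code]] = ", ".join(values)
    return annotations
```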