Package GenBank

Code to work with GenBank formatted files.

Classes:
Iterator              Iterate through a file of GenBank entries
Dictionary            Access a GenBank file using a dictionary interface.
ErrorFeatureParser    Catch errors caused during parsing.
FeatureParser         Parse GenBank data into Seq and SeqFeature objects.
RecordParser          Parse GenBank data into a Record object.
NCBIDictionary        Access GenBank using a dictionary interface.

_BaseGenBankConsumer  A base class for GenBank consumers that implements
                      some helpful functions that are common between
                      consumers.
_FeatureConsumer      Create SeqFeature objects from info generated by
                      the Scanner.
_RecordConsumer       Create a GenBank record object from Scanner info.
_PrintingConsumer     A debugging consumer.

ParserFailureError    Exception indicating a failure in the parser (i.e.
                      scanner or consumer).
LocationParserError   Exception indicating a problem with the spark based
                      location parser.

Functions:
index_file            Get a GenBank file ready to be used as a Dictionary.
search_for            Do a query against GenBank.
download_many         Download many GenBank records.

Classes
  Dictionary
Access a GenBank file using a dictionary-like interface.
  Iterator
Iterator interface to move over a file of GenBank entries one at a time.
  ParserFailureError
Failure caused by some kind of problem in the parser.
  LocationParserError
Could not properly parse out a location from a GenBank file.
  FeatureParser
Parse GenBank files into Seq + Feature objects.
  RecordParser
Parse GenBank files into Record objects.
  _BaseGenBankConsumer
Abstract GenBank consumer providing useful general functions.
  _FeatureConsumer
Create a SeqRecord object with Features to return.
  _RecordConsumer
Create a GenBank Record object from scanner generated information.
  NCBIDictionary
Access GenBank using a read-only dictionary interface.
Functions
 
_strip_and_combine(line_list)
Combine multiple lines of content separated by spaces.
 
index_file(filename, indexname, rec2key=None, use_berkeley=0)
Index a GenBank file to prepare it for use as a dictionary.
 
search_for(search, database='nucleotide', reldate=None, mindate=None, maxdate=None, start_id=0, max_ids=50000000)
search_for(search[, reldate][, mindate][, maxdate] [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids
 
download_many(ids, database='nucleotide')
Download many records from GenBank. Returns a handle of results.
Variables
  GENBANK_INDENT = 12
  GENBANK_SPACER = ' '
  FEATURE_KEY_INDENT = 5
  FEATURE_QUALIFIER_INDENT = 21
  FEATURE_KEY_SPACER = ' '
  FEATURE_QUALIFIER_SPACER = ' '
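The indent constants above correspond to column positions in a GenBank flat file. The following is a minimal sketch of how they could be used to lay out lines; the helper names header_line, feature_key_line and qualifier_line are hypothetical and not part of this package, and the sketch assumes each spacer is simply a run of spaces as wide as the matching indent.

```python
# Values copied from the Variables list above.
GENBANK_INDENT = 12
FEATURE_KEY_INDENT = 5
FEATURE_QUALIFIER_INDENT = 21


def header_line(keyword, text):
    """Pad a header keyword to GENBANK_INDENT columns, then add the text."""
    return keyword.ljust(GENBANK_INDENT) + text


def feature_key_line(key, location):
    """Indent a feature key and start its location at the qualifier column."""
    prefix = " " * FEATURE_KEY_INDENT + key
    return prefix.ljust(FEATURE_QUALIFIER_INDENT) + location


def qualifier_line(name, value):
    """Indent a /name="value" qualifier to the qualifier column."""
    return " " * FEATURE_QUALIFIER_INDENT + '/%s="%s"' % (name, value)


# Hypothetical example values, for illustration only.
print(header_line("DEFINITION", "Example record."))
print(feature_key_line("CDS", "1..206"))
print(qualifier_line("gene", "abc"))
```

With these widths, header text starts at character offset 12 and both feature locations and qualifiers start at offset 21.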
Function Details

_strip_and_combine(line_list)

Combine multiple lines of content separated by spaces.

This function is used by the EventGenerator callback function to combine multiple lines of information. The lines are first stripped to remove whitespace, and then combined so that they are separated by a single space. This is a simple-minded way to combine lines, but it should work for most cases.
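The behaviour described above can be sketched in a few lines. This is a re-implementation written from the description, not the package's source code; the public name strip_and_combine and the sample lines are chosen here for illustration.

```python
def strip_and_combine(line_list):
    """Strip each line of surrounding whitespace, then join with spaces."""
    stripped = [line.strip() for line in line_list]
    return " ".join(stripped)


# Two continuation lines as they might appear in a multi-line field.
lines = ["  Homo sapiens  \n", "   (human)\n"]
combined = strip_and_combine(lines)
print(combined)
```

Because each line is stripped before joining, leading indentation and trailing newlines do not leak into the combined text.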

index_file(filename, indexname, rec2key=None, use_berkeley=0)

Index a GenBank file to prepare it for use as a dictionary. DEPRECATED

Arguments:
filename - The name of the GenBank file to be indexed.
indexname - The name of the index to create.
rec2key - A reference to a function object which, when called with a SeqRecord object, will return a key to be used for the record. If no function is specified then the records will be indexed by the 'id' attribute of the SeqRecord (the versioned GenBank id).
use_berkeley - Specifies whether to use the BerkeleyDB indexer, which uses the bsddb3 wrappers around the embedded database Berkeley DB. By default, the standard flat file (non-Berkeley) indexes are used.
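As an illustration of the rec2key argument, here is a sketch of a callback that keys records by accession without the version suffix. The function name accession_without_version is hypothetical, and a SimpleNamespace stands in for a SeqRecord, since only the 'id' attribute is read. Because index_file is deprecated, the call that would receive the callback is not shown.

```python
from types import SimpleNamespace


def accession_without_version(record):
    """Hypothetical rec2key callback: drop the '.N' version from record.id."""
    return record.id.split(".", 1)[0]


# Stand-in for a SeqRecord; a real record would come from the parser.
fake_record = SimpleNamespace(id="U49845.1")
key = accession_without_version(fake_record)
print(key)
```

A callback like this trades uniqueness across versions for simpler lookups, so it only suits files that hold a single version of each record.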

search_for(search, database='nucleotide', reldate=None, mindate=None, maxdate=None, start_id=0, max_ids=50000000)

search_for(search[, reldate][, mindate][, maxdate] [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids

Search GenBank and return a list of the GenBank identifiers (gi's) that match the criteria.

search - The search string used to search the database.
database - Valid values are 'nucleotide', 'protein', 'popset' and 'genome'.
reldate - The number of days prior to the current date to which the search is restricted.
mindate, maxdate - The dates to restrict the search to, e.g. 2002/01/01.
start_id - The number to begin retrieval on.
max_ids - The maximum number of ids to retrieve.

batchsize, delay and callback_fn are old parameters for compatibility -- do not set them.
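The sketch below illustrates two of the conventions in this description without contacting GenBank: the year/month/day date format, and how start_id and max_ids might select a window from the matching identifiers. Both helpers are hypothetical and not part of this package, and treating start_id as a zero-based offset into the result list is an assumption.

```python
from datetime import datetime


def is_search_date(text):
    """Return True if text parses as a year/month/day date like 2002/01/01."""
    try:
        datetime.strptime(text, "%Y/%m/%d")
    except ValueError:
        return False
    return True


def select_window(ids, start_id=0, max_ids=50000000):
    """Assumed semantics: skip start_id ids, then keep at most max_ids."""
    return ids[start_id:start_id + max_ids]


# Made-up identifiers, for illustration only.
all_ids = ["1001", "1002", "1003", "1004", "1005"]
print(is_search_date("2002/01/01"))
print(select_window(all_ids, start_id=1, max_ids=3))
```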

download_many(ids, database='nucleotide')

Download many records from GenBank. ids is a list of gis or accessions.

callback_fn, broken_fn, delay, faildelay, batchsize and parser are old parameters kept for compatibility. They should not be used.

Returns: handle of results