Package SeqIO
source code
Sequence input/output designed to look similar to the bioperl design.
Input
=====
The main function is Bio.SeqIO.parse(...) which takes an input file handle,
and format string. This returns an iterator giving SeqRecord objects.
from Bio import SeqIO
handle = open("example.fasta", "rU")
for record in SeqIO.parse(handle, "fasta") :
print record
Note that the parse() function will all invoke the relevant parser for
the format with its default settings. You may want more control, in which case
you need to create a format specific sequence iterator directly.
For non-interlaced files (e.g. Fasta, GenBank, EMBL) with multiple records
using a sequence iterator can save you a lot of memory (RAM). There is less
benefit for interlaced file formats (e.g. most multiple alignment file formats).
However, an iterator only lets you access the records one by one.
If you want random access to the records by number, turn this into a list:
from Bio import SeqIO
handle = open("example.fasta", "rU")
records = list(SeqIO.parse(handle, "fasta"))
print records[0]
If you want random access to the records by a key such as the record id, turn
the iterator into a dictionary:
from Bio import SeqIO
handle = open("example.fasta", "rU")
record_dict = SeqIO.to_dict(SeqIO.parse(handle, "format"))
print record["gi:12345678"]
Input - Alignments
==================
Currently an alignment class cannot be created from SeqRecord objects.
Instead, use the to_alignment(...) function, like so:
from Bio import SeqIO
handle = open("example.aln", "rU")
alignment = SeqIO.to_alignment(SeqIO.parse(handle, "clustal"))
This function may be removed in future once alignments can be created
directly from SeqRecord objects.
Output
======
Use the function Bio.SeqIO.write(...), which takes a complete set of SeqRecord
objects (either as a list, or an iterator), an output file handle and of course
the file format.
from Bio import SeqIO
records = ...
handle = open("example.faa", "w")
SeqIO.write(records, handle, "fasta")
handle.close()
In general, you are expected to call this function once (with all your records)
and then close the file handle.
Output - Advanced
=================
The effect of calling write() multiple times on a single file will vary
depending on the file format, and is best avoided unless you have a strong reason
to do so.
Trying this for certain alignment formats (e.g. phylip, clustal, stockholm) would
have the effect of concatenating several multiple sequence alignments together.
Such files are created by the PHYLIP suite of programs for bootstrap analysis.
For sequential files formats (e.g. fasta, genbank) each "record block" holds a
single sequence. For these files it would probably be safe to call write()
multiple times.
File Formats
============
When specifying formats, use lowercase strings.
Old Files
=========
The modules Bio.SeqIO.FASTA and Bio.SeqIO.generic are depreciated and may be
removed.
|
write(sequences,
handle,
format)
Write complete set of sequences to a file |
source code
|
|
|
parse(handle,
format)
Turns a sequence file into a iterator returning SeqRecords |
source code
|
|
|
to_dict(sequences,
key_function=None)
Turns a sequence iterator or list into a dictionary |
source code
|
|
|
|
Write complete set of sequences to a file
sequences - A list (or iterator) of SeqRecord objects handle - File
handle object to write to format - What format to use.
You should close the handle after calling this function.
There is no return value.
|
Turns a sequence file into a iterator returning SeqRecords
handle - handle to the file. format - string describing the file
format.
If you have the file name in a string 'filename', use:
from Bio import SeqIO my_iterator =
SeqIO.parse(open(filename,"rU"), format)
If you have a string 'data' containing the file contents, use:
from Bio import SeqIO from StringIO import StringIO my_iterator =
SeqIO.parse(StringIO(data), format)
Note that file will be parsed with default settings, which may result
in a generic alphabet or other non-ideal settings. For more control, you
must use the format specific iterator directly...
|
Turns a sequence iterator or list into a dictionary
sequences - An iterator that returns SeqRecord objects,
or simply a list of SeqRecord objects.
key_function - Optional function which when given a SeqRecord
returns a unique string for the dictionary key.
e.g. key_function = lambda rec : rec.name
or, key_function = lambda rec : rec.description.split()[0]
If key_function is ommitted then record.id is used, on the
assumption that the records objects returned are SeqRecords
with a unique id field.
If there are duplicate keys, an error is raised.
Example usage:
from Bio import SeqIO
filename = "example.fasta"
d = SeqIO.to_dict(SeqIO.parse(open(faa_filename, "rU")),
key_function = lambda rec : rec.description.split()[0])
print len(d)
print d.keys()[0:10]
key = d.keys()[0]
print d[key]
|
to_alignment(sequences,
alphabet=Alphabet(),
strict=True)
| source code
|
Returns a multiple sequence alignment
sequences -An iterator that returns SeqRecord objects,
or simply a list of SeqRecord objects.
All the record sequences must be the same length.
alphabet - Optional alphabet. Stongly recommended.
strict - Optional, defaults to True. Should error checking
be done?
|
_FormatToIterator
- Value:
{"fasta": FastaIO.FastaIterator, "genbank": InsdcIO.GenBankIterator, "
genbank-cds": InsdcIO.GenBankCdsFeatureIterator, "embl": InsdcIO.EmblI
terator, "embl-cds": InsdcIO.EmblCdsFeatureIterator, "clustal": Clusta
lIO.ClustalIterator, "phylip": PhylipIO.PhylipIterator, "nexus": Nexus
IO.NexusIterator, "stockholm": StockholmIO.StockholmIterator, "swiss":
SwissIO.SwissIterator,}
|
|
_FormatToWriter
- Value:
{"fasta": FastaIO.FastaWriter, "phylip": PhylipIO.PhylipWriter, "stock
holm": StockholmIO.StockholmWriter, "clustal": ClustalIO.ClustalWriter
,}
|
|