Class StockholmIterator
Interfaces.AlignmentIterator --+
|
StockholmIterator
Loads a Stockholm file from PFAM into MultipleSeqAlignment
objects.
The file may contain multiple concatenated alignments, which are
loaded and returned incrementally.
This parser will detect if the Stockholm file follows the PFAM
conventions for sequence specific meta-data (lines starting #=GS and
#=GR) and populates the SeqRecord fields accordingly.
Any annotation which does not follow the PFAM conventions is currently
ignored.
If an accession is provided for an entry in the meta-data, IT WILL NOT
be used as the record.id (it will be recorded in the record's
annotations). This is because some files have (sub) sequences from
different parts of the same accession (differentiated by different
start-end positions).
Wrap-around alignments are not supported - each sequence must be on a
single line. However, interlaced sequences should work.
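To illustrate what "interlaced" means here, the following minimal sketch joins sequence chunks that share an identifier across successive blocks. It is not the parser's own code; the function name join_interlaced and the sample lines are hypothetical.

```python
def join_interlaced(lines):
    """Concatenate sequence chunks per identifier, in first-seen order."""
    sequences = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, header/markup lines (starting with #) and
        # the // terminator that ends an alignment.
        if not line or line.startswith("#") or line == "//":
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # not an "identifier sequence" line
        seq_id, chunk = parts
        sequences[seq_id] = sequences.get(seq_id, "") + chunk.replace(" ", "")
    return sequences


example = [
    "# STOCKHOLM 1.0",
    "seqA ACGU",
    "seqB AC-U",
    "",
    "seqA GGCC",
    "seqB GG-C",
    "//",
]
joined = join_interlaced(example)
```

Each identifier appears once per block, and the chunks are appended in the order they are read.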
For more information on the file format, please see:
http://www.bioperl.org/wiki/Stockholm_multiple_alignment_format
http://www.cgb.ki.se/cgb/groups/sonnhammer/Stockholm.html
For consistency with BioPerl and EMBOSS we call this the
"stockholm" format.
next(self)
Return the next alignment in the file.
This method should be replaced by any derived class to do something
useful.
- Overrides: Interfaces.AlignmentIterator.next (inherited documentation)
_get_meta_data(self, identifier, meta_dict)
Takes an identifier and returns a dict of all meta-data matching it.
For example, given "Q9PN73_CAMJE/149-220" it will return all
matches to this or to "Q9PN73_CAMJE", which is the identifier without
its /start-end suffix.
In the example below, the suffix is required to match the AC, but must
be removed to match the OS and OC meta-data:
# STOCKHOLM 1.0
#=GS Q9PN73_CAMJE/149-220 AC Q9PN73
...
Q9PN73_CAMJE/149-220 NKA...
...
#=GS Q9PN73_CAMJE OS Campylobacter jejuni
#=GS Q9PN73_CAMJE OC Bacteria
This function will return an empty dictionary if no data is found.
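The lookup described above can be sketched roughly as follows. This is a standalone illustration, not the method's actual source: the helper strip_start_end, the free function get_meta_data, and the shape of meta_dict (identifier mapped to a dict of feature codes mapped to lists of values) are assumptions.

```python
def strip_start_end(identifier):
    """Return the identifier without a trailing /start-end suffix, if any."""
    if "/" in identifier:
        name, start_end = identifier.rsplit("/", 1)
        if start_end.count("-") == 1:
            start, end = start_end.split("-")
            if start.isdigit() and end.isdigit():
                return name
    return identifier


def get_meta_data(identifier, meta_dict):
    """Merge meta-data stored under the full identifier and its bare name."""
    keys = [identifier]
    name = strip_start_end(identifier)
    if name != identifier:
        keys.append(name)
    answer = {}
    for key in keys:
        # A missing key contributes nothing, so the result may be empty.
        answer.update(meta_dict.get(key, {}))
    return answer


# Hypothetical #=GS data mirroring the example above.
gs_data = {
    "Q9PN73_CAMJE/149-220": {"AC": ["Q9PN73"]},
    "Q9PN73_CAMJE": {"OS": ["Campylobacter jejuni"], "OC": ["Bacteria"]},
}
found = get_meta_data("Q9PN73_CAMJE/149-220", gs_data)
```

With this data, the full identifier supplies the AC entry and the suffix-free name supplies OS and OC.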
_populate_meta_data(self, identifier, record)
Adds meta-data to a SeqRecord's annotations dictionary.
This function applies the PFAM conventions.
pfam_gr_mapping
- Value:
{'AS': 'active_site',
'IN': 'intron',
'LI': 'ligand_binding',
'PP': 'posterior_probability',
'SA': 'surface_accessibility',
'SS': 'secondary_structure',
'TM': 'transmembrane'}
pfam_gs_mapping
- Value:
{'LO': 'look', 'OC': 'organism_classification', 'OS': 'organism'}
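As a rough illustration of how such a mapping might be used, the sketch below translates two-letter feature codes into descriptive keys. The function translate_features and the sample input are hypothetical, and skipping unmapped codes is an assumption based on the statement that non-PFAM annotation is ignored.

```python
# Mapping copied from the class attribute documented above.
pfam_gs_mapping = {
    "LO": "look",
    "OC": "organism_classification",
    "OS": "organism",
}


def translate_features(features, mapping):
    """Rename known feature codes; drop codes that are not in the mapping."""
    return {
        mapping[code]: value
        for code, value in features.items()
        if code in mapping
    }


# Hypothetical per-sequence features, including one unmapped code.
raw = {"OS": "Campylobacter jejuni", "OC": "Bacteria", "XX": "unmapped"}
annotations = translate_features(raw, pfam_gs_mapping)
```

The same helper would work with pfam_gr_mapping, since it only relies on the mapping being a plain dictionary.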