[BiO BB] Re: Extracting location from a genbank flatfile

Peter Rice pmr at ebi.ac.uk
Thu Apr 10 05:50:04 EDT 2003


govind mk wrote:
> I am stuck with a rather simple problem.
> I would like to extract locations of specific features
> (e.g. CDS) from a Genbank flat file.
> 
> I tried using Bioperl but couldn't manage to get the 
> exact locations for complicated representations of
> locations such as
> complement(join(295405..295443,295492..295529))
> as Bioperl modules return the minimum start and
> maximum stop.

You can use EMBOSS (the European Molecular Biology Open Software Suite) 
http://www.uk.embnet.org/Software/EMBOSS/

EMBOSS is an open source (GPL/LGPL) package of sequence analysis 
libraries and programs.

Among other features, EMBOSS can read EMBL/Genbank, SwissProt and PIR 
feature tables and convert to/from GFF without losing information 
(although this does require adding some extra GFF tags to retain 
information about complex feature locations). The internals are similar 
to those of the ARTEMIS feature table editor from the Sanger Institute.
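
To show what is lost when only the minimum start and maximum stop are kept, here is a minimal standalone Python sketch. It is independent of both EMBOSS and Bioperl, and parse_location and its helpers are hypothetical names. It expands a location string into its individual segments. It assumes the location text has already been pulled out of the feature table, and it handles only complement, join, order, simple ranges (with < and > partial markers) and single bases. Remote references of the form accession:start..end and between-base sites such as 102^103 are rejected with ValueError.

```python
import re
from typing import List, Optional, Tuple

# (start, end, strand): 1-based inclusive coordinates, start <= end
Segment = Tuple[int, int, str]

_RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")  # e.g. 100..200, <1..>200
_BASE = re.compile(r"(\d+)")                # a single base, e.g. 467


def _unwrap(loc: str, op: str) -> Optional[str]:
    """Return the argument of op(...) if loc is exactly one op(...) call."""
    prefix = op + "("
    if not (loc.startswith(prefix) and loc.endswith(")")):
        return None
    depth = 0
    for i in range(len(op), len(loc)):
        if loc[i] == "(":
            depth += 1
        elif loc[i] == ")":
            depth -= 1
            if depth == 0:
                # The opening parenthesis must close at the very last character.
                return loc[len(prefix):-1] if i == len(loc) - 1 else None
    return None


def _split_top_level(s: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def parse_location(loc: str, strand: str = "+") -> List[Segment]:
    """Expand a GenBank/EMBL location string into its individual segments."""
    loc = "".join(loc.split())  # locations may be wrapped across lines
    inner = _unwrap(loc, "complement")
    if inner is not None:
        flipped = "-" if strand == "+" else "+"
        # complement() reverses the reading order of the segments it contains
        return list(reversed(parse_location(inner, flipped)))
    for op in ("join", "order"):
        inner = _unwrap(loc, op)
        if inner is not None:
            segments: List[Segment] = []
            for part in _split_top_level(inner):
                segments.extend(parse_location(part, strand))
            return segments
    m = _RANGE.fullmatch(loc)
    if m:
        return [(int(m.group(1)), int(m.group(2)), strand)]
    m = _BASE.fullmatch(loc)
    if m:
        return [(int(m.group(1)), int(m.group(1)), strand)]
    raise ValueError(f"unsupported location: {loc!r}")


if __name__ == "__main__":
    for seg in parse_location("complement(join(295405..295443,295492..295529))"):
        print(seg)
```

Run on the location from the question, it prints (295492, 295529, '-') followed by (295405, 295443, '-'). The segments come back in the order they are read on the minus strand, each kept with start <= end. Returning them in reading order rather than ascending order is a design choice; sort by start if genomic order is more convenient.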

I am currently extending the feature table internals of EMBOSS for the 
next release, to allow deletion/insertion of sequence ranges, and would 
be interested in any feedback, especially on things that are hard to do 
with existing tools.

regards,

Peter Rice
European Bioinformatics Institute.
