[BiO BB] cDNA Library Subtraction - Bioinformatics

Fri Dec 12 23:22:43 EST 2003

Using current state-of-the-art bioinformatics tools and software, what is
the preferred method of *identifying EST sequences* for the subtraction
procedure of a cDNA library?

To reduce the abundant messages (transcripts) that dominate cDNA libraries,
I hope to identify the longest, most abundant, and annotatable (e.g., based
on SwissProt) ESTs.  I would like expert opinions on how to do this most
effectively.  I have several thousand ESTs and, at least for this first
round, would like to identify the 96 clones that are the most
abundant/longest/annotatable.
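One way to make "most abundant/longest/annotatable" concrete is to summarize each EST cluster and sort on those three criteria, then take the top 96. The Python sketch below assumes that clustering and the SwissProt search have already been done. The `Cluster` record, its field names, and the ranking order (abundance first, then annotation, then length) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical per-cluster summary; field names are illustrative only.
    name: str
    n_ests: int        # number of ESTs in the cluster (proxy for abundance)
    longest_len: int   # length (bp) of the longest clone in the cluster
    annotated: bool    # True if the cluster has a SwissProt hit under some E-value cutoff


def pick_clones(clusters, n=96):
    """Return up to n clusters ranked by abundance, then annotation, then length."""
    ranked = sorted(
        clusters,
        # Tuples compare element by element; True sorts above False,
        # so among equally abundant clusters, annotated ones come first.
        key=lambda c: (c.n_ests, c.annotated, c.longest_len),
        reverse=True,
    )
    return ranked[:n]


clusters = [
    Cluster("c1", 40, 1200, True),
    Cluster("c2", 40, 1500, False),
    Cluster("c3", 12, 2000, True),
]
print([c.name for c in pick_clones(clusters, n=2)])
```

Annotation is treated here as a tie-breaker. If only annotatable clones are wanted, a simpler alternative is to filter on `annotated` before sorting.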

Approaches I have considered:

1. Running the entire dataset through CAP3 to produce contigs, then taking
the consensus sequence for each contig and running blastx (the consensus is
a nucleotide sequence) against SwissProt to see if it is annotatable.
2. Running an all-against-all BLAST search using the ESTs as both the
query and the database.  Additionally, the ESTs could be searched against
SwissProt (with blastx, since a single BLAST database cannot mix nucleotide
and protein sequences).  This would indicate not only which sequences have
similar/identical matches within the EST set, but also whether they have a
homolog in SwissProt.
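For approach 2, the all-against-all hits can be turned into clusters, and cluster size then serves as a rough abundance estimate. The Python sketch below makes several assumptions. It expects the 12-column tabular BLAST output (`blastall -m 8`, or `-outfmt 6` in later BLAST+), whose third and fourth columns are percent identity and alignment length. The 95% identity and 100 bp cutoffs are arbitrary placeholders. It uses single-linkage clustering via union-find, which is one common choice but can chain related genes or chimeric clones into one cluster.

```python
import io


def read_hits(handle, min_identity=95.0, min_length=100):
    """Yield (query, subject) pairs from 12-column BLAST tabular output passing the cutoffs."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, aln_len = float(fields[2]), int(fields[3])
        # Self-hits are ignored; they say nothing about redundancy.
        if query != subject and identity >= min_identity and aln_len >= min_length:
            yield query, subject


def cluster_ests(est_ids, pairs):
    """Single-linkage clustering: ESTs joined by any passing hit share a cluster."""
    parent = {est: est for est in est_ids}  # singletons start in their own cluster

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters

    clusters = {}
    for est in list(parent):
        clusters.setdefault(find(est), []).append(est)
    # Largest clusters (most abundant transcripts) first.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical example: est1-est2-est3 overlap strongly; est4/est5 hit below the cutoff.
blast_tab = (
    "est1\test2\t99.5\t450\t2\t0\t1\t450\t1\t450\t1e-200\t800\n"
    "est2\test3\t97.0\t300\t9\t0\t10\t309\t5\t304\t1e-150\t500\n"
    "est4\test5\t80.0\t300\t60\t2\t1\t300\t1\t300\t1e-40\t150\n"
)
ids = ["est1", "est2", "est3", "est4", "est5"]
print(cluster_ests(ids, read_hits(io.StringIO(blast_tab))))
```

The SwissProt search would then be run separately on one representative per cluster (for example, the longest clone), rather than mixing protein and nucleotide sequences in one database.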

Does anything exist in BioPerl that performs the necessary sequence
analysis for subtraction of a cDNA library?

BTW, if these are not the correct listservs/bulletin boards for such a
query, please let me know the preferred location.

Thank you and Happy Holidays!

Tristan Fiedler

-- 
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626