[BiO BB] RE: BiO_Bulletin_Board digest, Vol 1 #586 - 1 msg

Sucheta Tripathi tsucheta at hotmail.com
Mon Dec 15 06:29:39 EST 2003


I think there are several ways to do EST analysis if you have a large 
number of sequences.

I am not sure whether your sequences have already been quality-trimmed and 
cleaned of vector and primer sequences. If not, that should be done first.
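
As a rough illustration of end trimming, here is a minimal Python sketch. 
It assumes ambiguous base calls appear as N; the 5% threshold, the 10-base 
window and the function name are arbitrary placeholders, not values from any 
particular pipeline. It does not screen for vector or primer sequence, which 
needs a separate step.

```python
def trim_ambiguous_ends(seq, max_fraction=0.05, window=10):
    """Trim bases from the ends until at most max_fraction of the read is N.

    Hypothetical helper for illustration only; real pipelines usually trim
    on base-call quality scores rather than on N counts.
    """
    def n_count(s):
        return sum(1 for base in s if base in "Nn")

    start, end = 0, len(seq)
    while end > start and n_count(seq[start:end]) / (end - start) > max_fraction:
        # Compare how many Ns sit in a small window at each end and
        # remove one base from the worse end (ties go to the left end).
        left = n_count(seq[start:start + window])
        right = n_count(seq[max(start, end - window):end])
        if left >= right:
            start += 1
        else:
            end -= 1
    # Drop any Ns still sitting directly on either end.
    return seq[start:end].strip("Nn")


if __name__ == "__main__":
    read = "NNNN" + "ACGT" * 10 + "NNN"
    print(trim_ambiguous_ends(read))
```

A read that is entirely N comes back as an empty string, so the caller 
should discard reads that end up too short.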

The next step would be to cluster them, followed by assembly. There are lots 
of tools available, and many people develop their own pipelines, as we have 
with d2cluster and cap3. Alternatively, you can obtain a copy of tgicl, 
developed by TIGR. It is quite handy and does both clustering and assembly: 
the clustering is based on megablast and the assembly on cap3.
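
To make the cluster-then-assemble idea concrete, here is a toy sketch in 
Python. It is not how d2cluster, megablast or tgicl work: it simply links 
two ESTs when they share a few exact k-mers (single linkage) and ignores 
reverse complements. The function name and the k and min_shared values are 
made up for illustration. Each resulting cluster would then be handed to an 
assembler such as cap3.

```python
from collections import defaultdict


def cluster_by_shared_kmers(seqs, k=12, min_shared=3):
    """Group sequences that share at least min_shared distinct k-mers.

    seqs maps a sequence name to its bases. Returns a list of sets of names.
    Toy single-linkage clustering, forward strand only.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Index which sequences contain each k-mer.
    kmer_owners = defaultdict(set)
    for name, seq in seqs.items():
        s = seq.upper()
        for i in range(len(s) - k + 1):
            kmer_owners[s[i:i + k]].add(name)

    # Count distinct shared k-mers for every pair of sequences.
    shared = defaultdict(int)
    for owners in kmer_owners.values():
        names = sorted(owners)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                shared[(a, b)] += 1

    # Merge pairs that share enough k-mers.
    for (a, b), count in shared.items():
        if count >= min_shared:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return list(clusters.values())
```

The pairwise counting is quadratic in the number of sequences that share a 
k-mer, so this is only meant to show the idea on small inputs.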

For running BLAST, you can get the BLAST documentation from its FTP site, 
set up a local BLAST installation, and run all of your sequences in a 
single batch.

You can't run BLAST against swissprot and EST sequences in one go, since you 
can't format protein and nucleotide sequences into a single database. You 
need a separate database, and a separate run, for each.
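
A sketch of what that could look like with the legacy NCBI tools (formatdb 
and blastall), building the command lines in Python. The file names are 
hypothetical and the e-value cutoff is only an example. The point is that 
the protein and nucleotide databases are formatted separately, and the 
BLAST program depends on the query and database types.

```python
# Which blastall program fits a (query type, database type) pair.
# tblastx (translated query vs. translated database) is left out here.
PROGRAMS = {
    ("nucl", "prot"): "blastx",
    ("nucl", "nucl"): "blastn",
    ("prot", "prot"): "blastp",
    ("prot", "nucl"): "tblastn",
}


def formatdb_cmd(fasta, db_type):
    """Command to format one FASTA file as a BLAST database.

    formatdb takes -p T for protein and -p F for nucleotide input,
    so each database holds exactly one sequence type.
    """
    return ["formatdb", "-i", fasta, "-p", "T" if db_type == "prot" else "F"]


def blastall_cmd(query, query_type, db, db_type, out, evalue="1e-5"):
    """Command to search a multi-FASTA query file against one database."""
    program = PROGRAMS[(query_type, db_type)]
    # -m 8 asks blastall for tabular output, which is easy to parse later.
    return ["blastall", "-p", program, "-d", db, "-i", query,
            "-o", out, "-e", evalue, "-m", "8"]


if __name__ == "__main__":
    # Hypothetical file names: all ESTs in one multi-FASTA query file.
    for db, db_type in [("swissprot.fasta", "prot"), ("ests_db.fasta", "nucl")]:
        print(" ".join(formatdb_cmd(db, db_type)))
        print(" ".join(blastall_cmd("my_ests.fasta", "nucl", db, db_type,
                                    db + ".hits.tsv")))
```

Each list can be passed to subprocess.run once the tools are installed. 
Because the query file holds all of the ESTs, a single blastall call per 
database covers every sequence.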

Hope this helps

**********************************************
Sucheta Tripathy,
Senior Research
Virginia Bioinformatics Institute,
Virginia Polytechnic & State Institute,
Virginia, USA, 540-443-1763(H)

>From: bio_bulletin_board-request at bioinformatics.org
>Reply-To: bio_bulletin_board at bioinformatics.org
>To: bio_bulletin_board at bioinformatics.org
>Subject: BiO_Bulletin_Board digest, Vol 1 #586 - 1 msg
>Date: Sun, 14 Dec 2003 12:01:10 -0500 (EST)
>Date: Sat, 13 Dec 2003 16:27:16 -0600 (CST)
>From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
>To: bio_bulletin_board at bioinformatics.org
>Cc: t.fiedler at umiami.edu
>Subject: Re: [BiO BB] cDNA Library Subtraction - Bioinformatics
>Reply-To: bio_bulletin_board at bioinformatics.org
> > Using the current state-of-the-art bioinformatics tools/software, what is
> > the preferred method of *identifying EST sequences* for the subtraction
> > procedure of a cDNA library?
>This is an interesting question to me, since the answer is so clearly a
>protocol combining a variety of existing tools rather than a single
>general tool.  BioPerl is an excellent framework for scripting such
> > In order to decrease the abundant messages which dominate cDNA 
> > I hope to identify the longest, most abundant, and annotatable (based on
> > e.g. swissprot) ESTs.  I would like to get expert opinions on how to 
> > effectively go about it.  I have several thousand ESTs and would, for at
> > least this first round, like to identify 96 clones which are the most
> > abundant/longest/annotatable.
>We have a pipeline to perform almost exactly the opposite analysis (find
>novel genes with no obvious homologs), some parts of which might be
>useful to you:
>1) Perform quality analysis on each EST
>    - Trim low quality reads from both ends until the sequence is
>      at most K percent ambiguous bases.  K varies depending on the
>      experiment.
>    - Look for the primer / linker site at each end of the sequence and
>      remove it if found.  Leaving these in makes for *great*
>      anchors for spurious assemblies of contigs.
>    - BLAST against E. coli as well as popular phage sequences to look for
>      obvious contamination.
>    - BLAST against the human chromosomes to look for contamination.
>2) Assemble contigs
>    - We use phrap, mostly because we have some expertise with it.
>      There are other options available.  My opinion is that it's better
>      to use a tool that is well understood at your lab than to try to
>      learn an unknown that may or may not be better.
>3) (optional) go through the contigs and break up those that do
>    not have good support across their entire length.  This is currently
>    a real pain, but hopefully we'll have an automated system trained
>    "real soon now."
>4) BLASTX contigs vs PIR-NREF (again, a local favorite).  Anything that
>    can be annotated this way, remove from further steps
>5) TBLASTX contigs vs. NCBI NT.
>6) Further manual analysis using HMMER and other tools.
> > Does anything exist in bioperl which performs the necessary sequence
> > analysis for subtraction of a cDNA library?
>All of these piece-parts can be scripted using BioPerl, but I'm not aware
>of any single general tool that does exactly what you're looking for.
>I am very interested to hear about how other shops do their EST analysis
>these days.
>-Chris Dwan
>  University of Minnesota

