[BiO BB] Clustering EST sequences

Scott A. Halpine shalpine at ecomplexsystems.com
Tue Apr 1 15:13:37 EST 2003


I don't know of any conversion utilities but you can certainly write a quick conversion in Perl. I'm not familiar with the specific layouts but it sounds like you simply need to properly truncate each row of data. There shouldn't be a problem if your field partition is white space (or any other specific delimiter for that matter). 
If you don't get a better offer, send me a small data file of what you need converted, the field delimiter used, and an example of what it needs converted into. I should be able to write you a Perl routine and send it back to you. 
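
As a rough illustration of the truncation idea, here is a minimal sketch, written in Python rather than Perl purely for illustration. It assumes the only problem is the extra whitespace-separated fields after the name on each ">" line, and that the same cleanup applies to the quality file. The function names and the sample header are hypothetical, and I have not checked the result against TGICL itself.

```python
def simplify_header(line):
    """Return the line with any '>' header cut down to its first field."""
    line = line.rstrip("\r\n")
    if line.startswith(">"):
        # Keep only the sequence name; drop everything after the first
        # run of whitespace (lengths, trim positions, chemistry, etc.).
        fields = line[1:].split()
        return ">" + (fields[0] if fields else "")
    # Sequence or quality rows pass through unchanged.
    return line


def simplify_fasta(lines):
    """Yield cleaned lines for a multi-FASTA (or quality) file."""
    for line in lines:
        yield simplify_header(line)


def convert_file(src_path, dst_path):
    """Rewrite src_path into dst_path with simplified header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for cleaned in simplify_fasta(src):
            dst.write(cleaned + "\n")


if __name__ == "__main__":
    # Hypothetical sample record; real headers may carry different fields.
    sample = [">est001.ab1 650 0 650 ABI\n", "ACGTACGTTT\n"]
    for cleaned in simplify_fasta(sample):
        print(cleaned)
```

Calling convert_file once on the sequence file and once on the quality file should keep the names in the two outputs in step, since both are truncated by the same rule.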
Scott A. Halpine
Ecologic Complex Systems, LLC
4640 Forbes Blvd, Suite 200
Lanham, MD 20706-4885
Phone: 301-918-3283
Fax: 301-429-8762

  ----- Original Message ----- 
  From: Bossers, A. 
  To: bio_bulletin_board at bioinformatics.org 
  Cc: biodevelopers at bioinformatics.org 
  Sent: Tuesday, April 01, 2003 6:29 AM
  Subject: [BiO BB] Clustering EST sequences


  Dear All, 

  I have a very basic problem and wonder how others have solved it. 

  I want to make a unigene collection from a large EST database. We have chromatogram files in ABI format and I use Linux on the Intel platform.

  I have phred and phrap running, but since phrap was originally designed for genomic sequences we get lots of misassemblies on poly-A or poly-T stretches.

  Therefore I installed the TIGR TGICL package, which is designed for EST databases and also runs very well on multi-node machines.

  However, it uses multi-FASTA files (and corresponding, optional, quality files) as input. 
  I wanted to use the phred package to generate the required FASTA and qual files. This runs fine, but the >name line of the FASTA file contains additional info separated by spaces. These files are not accepted by TGICL.

  Is there an easy Unix (Linux) utility to convert these multi-FASTA sequence and quality files into simple >name {CRT} seq files so they can be used as input for TGICL? Or is a conversion utility available to convert/extract phred's phd files into FASTA sequence and FASTA quality files?

  Any help would be appreciated, 

          Alex 

