From golharam at umdnj.edu Thu Mar 2 01:01:40 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 02 Mar 2006 01:01:40 -0500
Subject: [BiO BB] Obtaining Genomic Sequence
Message-ID: <031001c63dbe$c6fdf310$e6028a0a@GOLHARMOBILE1>

If I have an accession # for a gene that I know occurs on a particular
chromosome, how can I get the full genomic sequence for that gene
containing all the exons and introns?

Ryan

From chirag_nepal at yahoo.com Wed Mar 1 22:11:16 2006
From: chirag_nepal at yahoo.com (chirag nepal)
Date: Wed, 1 Mar 2006 19:11:16 -0800 (PST)
Subject: [BiO BB] help
Message-ID: <20060302031116.17099.qmail@web31614.mail.mud.yahoo.com>

Hi,

I keep getting the messages below. Can anyone help me sort this out?

Version 2.2.13 [Nov-27-2005]
Started database file "ecoli.nt"
NOTE: CoreLib [002.003] FileOpen("C:\Documents and Settings\chirag\FORMATDB.INI","r") failed
NOTE: CoreLib [002.003] FileOpen("C:\WINDOWS\FORMATDB.INI","r") failed
NOTE: [000.000] No number of link bits used found in config file. Ignoring
NOTE: [000.000] No number of membership bits used found in config file. Ignoring
Formatted 400 sequences in volume 0

Looking forward to hearing from you. Thank you in advance for your help.

With regards,
Chirag Nepal
M.Sc in Computer Science
Bioinformatics
Inha University
253 Younghyun-Dong Nam-gu
Incheon 402-751
Korea

From kousik_bioinfo at yahoo.com Thu Mar 2 00:32:41 2006
From: kousik_bioinfo at yahoo.com (kousik kundu)
Date: Wed, 1 Mar 2006 21:32:41 -0800 (PST)
Subject: [BiO BB] help me as soon as possible
Message-ID: <20060302053241.91420.qmail@web36903.mail.mud.yahoo.com>

Hi!

I'm into a short project (domain problem).
The aim is to find the unique conserved region among the conserved
regions of genomes across evolution, from very primitive species to
highly advanced species, using the internet and public databases as the
working platform. The results would include:

- DNA seq. of the unique conserved region, and protein seq. if available
- SNPs identified in the unique conserved region

What would be the likely approach to this problem with respect to the
tools? Should I go for a similarity search or identity? Multiple or
global alignment? And which level of the genome (transcriptome/mRNA)
should I work with?

I'll be very thankful for your guidance.

From nguyenxuanhungbk at yahoo.com Thu Mar 2 03:48:20 2006
From: nguyenxuanhungbk at yahoo.com (Nguyen Hung)
Date: Thu, 2 Mar 2006 00:48:20 -0800 (PST)
Subject: [BiO BB] Obtaining Genomic Sequence
In-Reply-To: <031001c63dbe$c6fdf310$e6028a0a@GOLHARMOBILE1>
Message-ID: <20060302084820.75912.qmail@web35412.mail.mud.yahoo.com>

In this case you can use this link: http://www.ncbi.nlm.nih.gov and find
the full genomic sequence by putting the accession number in the search
box.

Ryan Golhar wrote:
> If I have an accession # for a gene that I know occurs on a particular
> chromosome, how can I get the full genomic sequence for that gene
> containing all the exons and introns?
>
> Ryan

Nguyen Xuan Hung
Institute of Biological and Food Technology
HaNoi University of Technology
No.1 Dai Co Viet
VietNam

From schlitt at ebi.ac.uk Thu Mar 2 04:54:06 2006
From: schlitt at ebi.ac.uk (Thomas Schlitt)
Date: Thu, 2 Mar 2006 09:54:06 +0000 (GMT)
Subject: [BiO BB] Obtaining Genomic Sequence
In-Reply-To: <031001c63dbe$c6fdf310$e6028a0a@GOLHARMOBILE1>
Message-ID:

Dear Ryan,

did you have a look at Ensembl, http://www.ensembl.org? You can export
genomic data in many different formats. You can start your search, for
example, with a BLAST search: the Ensembl BLAST pages allow you to enter
an accession number.

If your genome isn't in Ensembl, have a look at www.biomart.org; this
page provides links to many databases that have implemented BioMart to
access their data. EBI GeneReviews offers a browser for a number of
additional genomes that are not covered by Ensembl.

Good luck!

Thomas

On Thu, 2 Mar 2006, Ryan Golhar wrote:

> If I have an accession # for a gene that I know occurs on a particular
> chromosome, how can I get the full genomic sequence for that gene
> containing all the exons and introns?
>
> Ryan

_____________________________________________________________
Thomas Schlitt, PhD
British Antarctic Survey
High Cross, Madingley Road
Cambridge CB3 0ET, UK
Tel. ++44-1223-221656
tsc at bas.ac.uk

From er.sukhdeepsingh at gmail.com Thu Mar 2 07:44:20 2006
From: er.sukhdeepsingh at gmail.com (Sukhdeep Singh)
Date: Thu, 2 Mar 2006 18:14:20 +0530
Subject: [BiO BB] (no subject)
Message-ID: <40fbb41e0603020444s22a78ec7xbc036c1f72383ebb@mail.gmail.com>

er.sukhdeepsingh at gmail.com

From Sudhindra.Gadagkar at notes.udayton.edu Thu Mar 2 10:01:29 2006
From: Sudhindra.Gadagkar at notes.udayton.edu (Sudhindra.Gadagkar at notes.udayton.edu)
Date: Thu, 2 Mar 2006 10:01:29 -0500
Subject: [BiO BB] Obtaining Genomic Sequence
In-Reply-To: <031001c63dbe$c6fdf310$e6028a0a@GOLHARMOBILE1>
Message-ID:

Ryan,

You may already know some of these steps, but let me list them anyway,
with an example: accession number NM_000518, for the mRNA of HBB, the
human beta hemoglobin gene. (This is a special kind of accession number
called a RefSeq number; see
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#AccessionB for
definitions.)

1. Enter your accession number in the NCBI homepage
   (http://www.ncbi.nlm.nih.gov/) and hit your "Enter" key.
2. In the next page (the Entrez page) click on "Gene".
3. It will take you to a brief summary of the gene.
4. Click on the gene name (HBB in this case).
5. By default it takes you to a "Full Report" on the gene in the next
   page.
6. Change the "Full Report" (which is in a drop-down box on top) to
   "Gene Table".
7. And you have it: schematics, hyperlinked exons, introns and all.

Hope this helps.

Sudhindra

----------------------------------------------------------------------------
Sudhindra R. Gadagkar, Ph.D.
Department of Biology
University of Dayton
300 College Park
Dayton, OH 45469-2320
Ph: (937) 229-2410
Fax: (937) 229-2021
Email: gadagkar at notes.udayton.edu
----------------------------------------------------------------------------

"Ryan Golhar"
Sent by: bio_bulletin_board-bounces+sudhindra.gadagkar=notes.udayton.edu at bioinformatics.org
03/02/2006 01:01 AM
Please respond to golharam at umdnj.edu; Please respond to "The general forum at Bioinformatics.Org"

To: "'The general forum at Bioinformatics.Org'"
cc:
Subject: [BiO BB] Obtaining Genomic Sequence

If I have an accession # for a gene that I know occurs on a particular
chromosome, how can I get the full genomic sequence for that gene
containing all the exons and introns?

Ryan

From golharam at umdnj.edu Thu Mar 2 13:45:28 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 02 Mar 2006 13:45:28 -0500
Subject: [BiO BB] Obtaining Genomic Sequence
In-Reply-To: <031001c63dbe$c6fdf310$e6028a0a@GOLHARMOBILE1>
Message-ID: <034a01c63e29$7a9a4670$e6028a0a@GOLHARMOBILE1>

Thanks for all your responses, but maybe I need to clarify what I'm
trying to do. I have a list of accession #'s for genes. I want to get
the full-length genomic sequences (including exons and introns).

I can get the chromosome and genomic coordinates using NCBI's eutils
efetch method; however, I'm not sure how to retrieve part of a sequence.
I don't see this covered in NCBI's eutils documentation and was
wondering if anyone here knew.

In other words, say I have a gene that occurs on human chromosome 1.
The accession # of chr1 is NC_000001, and I have the genomic
coordinates, say 100 to 5000 on the positive strand.
Without retrieving the entire chromosome, how can I retrieve just bases
100 to 5000?

I can't do this by hand, as I have 1500+ genes, so I'm hoping to script
it in Perl...

From hjm at tacgi.com Thu Mar 2 16:46:12 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 2 Mar 2006 13:46:12 -0800
Subject: [BiO BB] Obtaining Genomic Sequence
In-Reply-To: <034a01c63e29$7a9a4670$e6028a0a@GOLHARMOBILE1>
References: <034a01c63e29$7a9a4670$e6028a0a@GOLHARMOBILE1>
Message-ID: <200603021346.12834.hjm@tacgi.com>

There was a thread on this less than a month ago on the bioperl list.
Check their archives around Feb 14th for a thread titled:

[Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs

There are example scripts and docs available for this kind of activity
now via their wiki.

hjm

On Thursday 02 March 2006 10:45, Ryan Golhar wrote:
> Thanks for all your responses, but maybe I need to clarify what I'm
> trying to do. I have a list of accession #'s for genes. I want to get
> the full-length genomic sequences (including exons and introns).
>
> I can get the chromosome and genomic coordinates using NCBI's eutils
> efetch method, however I'm not sure how to retrieve part of a sequence.
> I don't see information on NCBI's eutils documentation and was wondering
> if anyone here knew.
>
> In other words, say I have a gene that occurs on human chromosome 1.
> The accession # of chr1 is NC_000001, and I have the genomic
> coordinates, say 100 to 5000 on the positive strand. Without retrieving
> the entire chromosome, how can I retrieve just bases 100 to 5000?
>
> I can't do this by hand, as I have 1500+ genes, so I'm hoping to perl
> script it...

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com

From forward at hongyu.org Thu Mar 2 21:33:21 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Thu, 2 Mar 2006 18:33:21 -0800 (PST)
Subject: [BiO BB] API to NCBI
Message-ID: <33136.67.17.255.178.1141353201.squirrel@hongyu.org>

Does anyone have a handy script to retrieve the peptide sequence
translation for any given NCBI mRNA GI number?

Thanks!

--
Hongyu Zhang, Ph.D.
Computational Biologist
Ceres Inc.
1535 Rancho Conejo Blvd
Thousand Oaks, CA 91320
Phone: (805)376-6504 ext 1204

From christoph.gille at charite.de Fri Mar 3 06:07:08 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 3 Mar 2006 12:07:08 +0100 (CET)
Subject: [BiO BB] API to NCBI
In-Reply-To: <33136.67.17.255.178.1141353201.squirrel@hongyu.org>
References: <33136.67.17.255.178.1141353201.squirrel@hongyu.org>
Message-ID: <44304.141.42.56.114.1141384028.squirrel@webmail.charite.de>

If you have a text list containing sequence IDs you can use STRAP
http://www.charite.de/bioinf/strap/

Go to the "File" menu, open the SRS dialog and paste the text file into
the large text field. Highlight the NCBI-type IDs and press the NCBI
fetch button. Voilà.

From ulimard at yahoo.com.br Fri Mar 3 15:05:13 2006
From: ulimard at yahoo.com.br (Ulisses Dias)
Date: Fri, 3 Mar 2006 17:05:13 -0300 (ART)
Subject: [BiO BB] Function of a Protein
Message-ID: <20060303200513.10076.qmail@web50510.mail.yahoo.com>

Hi all,

Does anyone know of software or a paper on how to predict the functions
of a protein given its molecular structure?

Thanks for any help,
Ulisses Dias

From martin_jambon at emailuser.net Fri Mar 3 15:26:22 2006
From: martin_jambon at emailuser.net (Martin Jambon)
Date: Fri, 3 Mar 2006 12:26:22 -0800 (PST)
Subject: [BiO BB] Function of a Protein
In-Reply-To: <20060303200513.10076.qmail@web50510.mail.yahoo.com>
References: <20060303200513.10076.qmail@web50510.mail.yahoo.com>
Message-ID:

On Fri, 3 Mar 2006, Ulisses Dias wrote:

> Hi all,
>
> Does anyone know of software or a paper on how to predict the functions
> of a protein given its molecular structure?
See these pages at Wikiomics (bioinformatics wiki):

http://wikiomics.org/wiki/Protein_function_prediction
http://wikiomics.org/wiki/Searching_for_3D_functional_sites_in_a_protein_structure

As usual, this is a wiki, so everyone is welcome to enhance it.

Cheers,
Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Visit http://wikiomics.org, the Bioinformatics Howto Wiki

From idoerg at burnham.org Fri Mar 3 15:54:21 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri, 03 Mar 2006 12:54:21 -0800
Subject: [BiO BB] Function of a Protein
In-Reply-To: <20060303200513.10076.qmail@web50510.mail.yahoo.com>
References: <20060303200513.10076.qmail@web50510.mail.yahoo.com>
Message-ID: <4408ACFD.5060805@burnham.org>

A table of some methods for predicting functional sites and functions:
http://martin.jambon.free.fr/search-protein-3D-sites.html

A review on function prediction, including a section on structure to
function:
http://iddo-friedberg.org/afp_postref_ce_forpub_mjb250206-1.pdf

There are chapters on this subject in the following books:

Structural Bioinformatics (Methods of Biochemical Analysis) (Hardcover)
by Philip E. Bourne (Editor), Helge Weissig (Editor)
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12647396&query_hl=9&itool=pubmed_docsum

The Ten Most Wanted Solutions in Protein Bioinformatics (Chapman &
Hall/Crc Mathematical Biology and Medicine) (Hardcover)
by Anna Tramontano
http://www.amazon.com/gp/product/1584884916/qid=1141419179/sr=1-2/ref=sr_1_2/102-8189745-1844955?s=books&v=glance&n=283155

HTH,
Iddo

Ulisses Dias wrote:

> Hi all,
>
> Does anyone know of software or a paper on how to predict the functions
> of a protein given its molecular structure?
>
> Thanks for any help,
> Ulisses Dias

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org

From CHRISTOPHER_FRENZ at NYMC.EDU Sat Mar 4 11:50:39 2006
From: CHRISTOPHER_FRENZ at NYMC.EDU (Frenz, Christopher)
Date: Sat, 4 Mar 2006 11:50:39 -0500
Subject: [BiO BB] Function of a Protein
References: <20060303200513.10076.qmail@web50510.mail.yahoo.com>
Message-ID: <70C50B8807B54A429AC206E83A3BA6BCAD8C6E@mail.nymc.edu>

It is not structure based, but a useful tool for predicting protein
functions is SMART (http://smart.embl-heidelberg.de/).

Chris

From golharam at umdnj.edu Mon Mar 6 00:14:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 06 Mar 2006 00:14:38 -0500
Subject: [BiO BB] API to NCBI
In-Reply-To: <33136.67.17.255.178.1141353201.squirrel@hongyu.org>
Message-ID: <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1>

If you have the mRNA from start to stop codon, you can use transeq from
EMBOSS to translate the sequence for you. It's a very straightforward
tool. I use it in a number of scripts to do exactly this.

From nagesh.chakka at anu.edu.au Mon Mar 6 00:23:52 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Mon, 6 Mar 2006 16:23:52 +1100 (EST)
Subject: [BiO BB] Advice on using Modeller
In-Reply-To: <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1>
References: <33136.67.17.255.178.1141353201.squirrel@hongyu.org> <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1>
Message-ID: <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au>

Hi,

I am trying to use Modeller to obtain a model using the structure 1XU0.
I have created the three files needed for Modeller:

Atom file
http://www.rcsb.org/pdb/navbarsearch.do?newSearch=yes&isAuthorSearch=no&radioset=All&inputQuickSearch=1xu0&image.x=0&image.y=0&image=Search

Alignment file

>P1;1XU0
structureX:1XU0:125:A:226:A::::
IGGYMLGNAVGRMSYQFNNPMESRYYNDYYNQMPNRVYRPMYRGEEYVSEDRFVRDCYNM
SVTEYIIKPAEGKNNSELNQLDTTVKSQIIREMCITEYRRGS
*
>P1;XlPrP2
sequence:XlPrP2:1::134:::::
IGGYMLGNAVGRMNHHFDNPMESRYYNDYYNQMPDRVYRPMYRSEEYVSEDRFVTDCYNM
SVTEYIIKPSEGKNGSDVNQLDTVVKSKIIREMCITEYRRGS

Top file

INCLUDE
SET OUTPUT_CONTROL = 1 1 1 1 1
SET ALNFILE = 'inputAlignment_files/prp2StructAli.txt'
SET KNOWNS = '1XU0'
SET SEQUENCE = 'XlPrP2'
SET ATOM_FILES_DIRECTORY = '/home/nagesh/modelling/structure'
SET STARTING_MODEL = 1
SET ENDING_MODEL = 4
CALL ROUTINE = 'model'

In doing so, the final model XlPrP2.B99990004 does not contain all the
atoms for a particular residue. For example, the following is the part
of the model for one amino acid:

ATOM      1  N   ILE     1      14.910   4.779   7.839  1.00 41.08       1SG   2
ATOM      2  CA  ILE     1      13.848   4.061   8.581  1.00 41.08       1SG   3
ATOM      3  CB  ILE     1      12.918   3.371   7.630  1.00 41.08       1SG   4
ATOM      4  CG2 ILE     1      11.924   2.529   8.447  1.00 41.08       1SG   5
ATOM      5  CG1 ILE     1      12.242   4.416   6.730  1.00 41.08       1SG   6
ATOM      6  CD1 ILE     1      11.473   3.813   5.557  1.00 41.08       1SG   7
ATOM      7  C   ILE     1      14.422   3.050   9.510  1.00 41.08       1SG   8
ATOM      8  O   ILE     1      15.261   2.235   9.128  1.00 41.08       1SG   9

Atoms from the structure file:

ATOM      1  N   ILE A 125      15.197  -5.772  -0.208  1.00  0.00           N
ATOM      2  CA  ILE A 125      14.432  -5.479   0.993  1.00  0.00           C
ATOM      3  C   ILE A 125      14.890  -6.453   2.086  1.00  0.00           C
ATOM      4  O   ILE A 125      14.874  -7.667   1.884  1.00  0.00           O
ATOM      5  CB  ILE A 125      12.922  -5.586   0.661  1.00  0.00           C
ATOM      6  CG1 ILE A 125      12.526  -4.523  -0.390  1.00  0.00           C
ATOM      7  CG2 ILE A 125      12.073  -5.416   1.930  1.00  0.00           C
ATOM      8  CD1 ILE A 125      11.131  -4.703  -1.000  1.00  0.00           C
ATOM      9  H   ILE A 125      14.783  -6.386  -0.892  1.00  0.00           H
ATOM     10  HA  ILE A 125      14.656  -4.459   1.305  1.00  0.00           H
ATOM     11  HB  ILE A 125      12.723  -6.575   0.246  1.00  0.00           H
ATOM     12 1HG1 ILE A 125      12.591  -3.543   0.075  1.00  0.00           H
ATOM     13 2HG1 ILE A 125      13.228  -4.540  -1.223  1.00  0.00           H
ATOM     14 1HG2 ILE A 125      12.338  -4.494   2.447  1.00  0.00           H
ATOM     15 2HG2 ILE A 125      11.014  -5.400   1.691  1.00  0.00           H
ATOM     16 3HG2 ILE A 125      12.237  -6.268   2.586  1.00  0.00           H
ATOM     17 1HD1 ILE A 125      10.989  -3.972  -1.797  1.00  0.00           H
ATOM     18 2HD1 ILE A 125      11.032  -5.707  -1.414  1.00  0.00           H
ATOM     19 3HD1 ILE A 125      10.362  -4.542  -0.250  1.00  0.00           H

What is the reason for the loss of the atoms, and how can I fix this?

Thank you very much in advance,
Nagesh

From yvan.strahm at gmail.com Mon Mar 6 00:29:31 2006
From: yvan.strahm at gmail.com (Yvan)
Date: Sun, 05 Mar 2006 21:29:31 -0800
Subject: [BiO BB] Advice on using Modeller
In-Reply-To: <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au>
References: <33136.67.17.255.178.1141353201.squirrel@hongyu.org> <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1> <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au>
Message-ID: <440BC8BB.2080606@gmail.com>

Hello Nagesh,

You should direct your Modeller questions to this list:

modeller_usage at salilab.org

Cheers
yvan

From narcis at fiserlab.org Mon Mar 6 08:05:16 2006
From: narcis at fiserlab.org (Narcis Fernandez-Fuentes)
Date: Mon, 06 Mar 2006 08:05:16 -0500
Subject: [BiO BB] Advice on using Modeller
In-Reply-To: <440BC8BB.2080606@gmail.com>
References: <33136.67.17.255.178.1141353201.squirrel@hongyu.org> <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1> <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au> <440BC8BB.2080606@gmail.com>
Message-ID: <440C338C.1010609@fiserlab.org>

Using Modeller you will only get a model with all non-hydrogen atoms.
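
The point about non-hydrogen atoms can be checked against the two ILE excerpts posted in this thread. The following is only a minimal sketch: the atom names are copied from those excerpts, while the helper names `is_hydrogen` and `heavy_atoms` and the naming heuristic are assumptions made for illustration, not part of Modeller.

```python
# Atom names for the ILE residue, copied from the two PDB excerpts in this thread.
TEMPLATE_ILE = ["N", "CA", "C", "O", "CB", "CG1", "CG2", "CD1",
                "H", "HA", "HB", "1HG1", "2HG1", "1HG2", "2HG2", "3HG2",
                "1HD1", "2HD1", "3HD1"]
MODEL_ILE = ["N", "CA", "CB", "CG2", "CG1", "CD1", "C", "O"]


def is_hydrogen(atom_name):
    """Heuristic (an assumption of this sketch): in protein ATOM records a
    hydrogen name starts with H once any leading digit is removed,
    e.g. H, HA or 1HG1."""
    return atom_name.strip().lstrip("0123456789").startswith("H")


def heavy_atoms(atom_names):
    """Return the non-hydrogen atom names, keeping their original order."""
    return [name for name in atom_names if not is_hydrogen(name)]


if __name__ == "__main__":
    heavy = heavy_atoms(TEMPLATE_ILE)
    print(len(TEMPLATE_ILE), "template atoms,", len(heavy), "of them heavy")
    print("same heavy atoms as the model:", sorted(heavy) == sorted(MODEL_ILE))
```

Once hydrogens are set aside, the template and the model list the same eight atoms for this residue, so nothing other than hydrogens is missing from the model.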
Yvan wrote:
> Hello Nagesh,
>
> You should direct your Modeller questions to this list:
>
> modeller_usage at salilab.org
>
> Cheers
> yvan

--
Narcis Fernandez-Fuentes, PhD
Seaver Center for Bioinformatics
Albert Einstein College of Medicine
1300 Morris Park Ave, Bronx, NY 10461, USA
phone: (718)430-3233 fax: (718) 430-8565
mailto:narcis at fiserlab.org (http://www.fiserlab.org)

From nagesh.chakka at anu.edu.au Mon Mar 6 08:35:44 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Tue, 7 Mar 2006 00:35:44 +1100 (EST)
Subject: [BiO BB] Advice on using Modeller
In-Reply-To: <440C338C.1010609@fiserlab.org>
References: <33136.67.17.255.178.1141353201.squirrel@hongyu.org> <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1> <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au> <440BC8BB.2080606@gmail.com> <440C338C.1010609@fiserlab.org>
Message-ID: <1227.59.93.68.149.1141652144.squirrel@sqmail.anu.edu.au>

Hi Narcis,

Thanks for the information. I must admit that I was not aware of this
(I just re-read the documentation and found it). Is there any other
approach which addresses this issue?

Thanks
Nagesh

> Using Modeller you will only get a model with all non-hydrogen atoms.

From boris.steipe at utoronto.ca Mon Mar 6 09:05:39 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 6 Mar 2006 09:05:39 -0500
Subject: [BiO BB] Advice on using Modeller
In-Reply-To: <1227.59.93.68.149.1141652144.squirrel@sqmail.anu.edu.au>
References: <33136.67.17.255.178.1141353201.squirrel@hongyu.org> <008601c640dc$dea20260$2f01a8c0@GOLHARMOBILE1> <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au> <440BC8BB.2080606@gmail.com> <440C338C.1010609@fiserlab.org> <1227.59.93.68.149.1141652144.squirrel@sqmail.anu.edu.au>
Message-ID: <7B4F94F2-7C71-499C-88CA-7351E3B9F927@utoronto.ca>

Crystallographers have written many programs that will attempt to guess
proton coordinates (e.g. HGEN in the CCP4 suite). Web servers exist too,
for example http://swift.cmbi.kun.nl/WIWWWI/

However, the combination of limited resolution and coordinate errors in
the template, model coordinate uncertainty, and H placement uncertainty
will make the detailed interpretation of hydrogen atom geometries
untenable.

HTH
B.

On 6 Mar 2006, at 08:35, Nagesh Chakka wrote:

> Hi Narcis,
> Thanks for the information. I must admit that I was not aware of this
> (I just re-read the documentation and found it). Is there any other
> approach which addresses this issue?
> Thanks
> Nagesh
>
>> Using Modeller you will only get a model with all non-hydrogen atoms.

From bioinfosm at gmail.com Sat Mar 4 17:58:23 2006
From: bioinfosm at gmail.com (Samantha Fox)
Date: Sat, 4 Mar 2006 17:58:23 -0500
Subject: [BiO BB] Updating NCBI databases
In-Reply-To: <003101c62d33$0ac480b0$e6028a0a@GOLHARMOBILE1>
References: <726450810602070916g2f525b20rae9b09f7183e4209@mail.gmail.com> <003101c62d33$0ac480b0$e6028a0a@GOLHARMOBILE1>
Message-ID: <726450810603041458n3ceaf29ycd0407be4f6e4ee6@mail.gmail.com>

Ryan,

Thanks for the note! Yeah, I can't think of other options for now, but I
will let you know if I figure something out.

~S

On 2/8/06, Ryan Golhar wrote:
>
> I think the easiest way is to do a 'ps -ef | grep blast'.
> If it returns a process (other than itself) in the list, then you know
> someone is running blast.
>
> The other (simpler) option is to maintain a separate BLAST database
> that you can copy into place during a standard maintenance window. We
> normally reboot our server once a month when system patches are
> applied. During this outage period, the new database could be copied
> into place.
>
> Let me know what you end up doing. I'm curious about the solution you
> decide on. I might want to use that here.
>
> Ryan
>
> -----Original Message-----
> From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
> [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org]
> On Behalf Of Samantha Fox
> Sent: Tuesday, February 07, 2006 12:16 PM
> To: The general forum at Bioinformatics.Org
> Subject: [BiO BB] Updating NCBI databases
>
> Hi,
> I had a question regarding regular updates of BLAST databases. Is there
> a standard way to move the updated databases to the user section,
> making sure that the current copy is not already in use?
>
> Suppose I update the database monthly, but a user might have a big
> BLAST job running when my cron script starts the update. This can lead
> to errors. How can I prevent that?
>
> Thanks,
> ~S

From nguyenxuanhungbk at yahoo.com Mon Mar 6 00:58:50 2006
From: nguyenxuanhungbk at yahoo.com (Nguyen Hung)
Date: Sun, 5 Mar 2006 21:58:50 -0800 (PST)
Subject: [BiO BB] Advice on using Modeller
In-Reply-To: <1895.59.93.66.75.1141622632.squirrel@sqmail.anu.edu.au>
Message-ID: <20060306055850.73897.qmail@web35403.mail.mud.yahoo.com>

Dear Nagesh,

You can run it again, and also try other sequences, to check this
problem.

Good luck,

Nagesh Chakka wrote:
> Hi,
> I am trying to use Modeller to obtain a model using the structure 1XU0.
> [...]
> What is the reason for the loss of the atoms and how can I fix this.
> Thank you very much in advance
> Nagesh

Nguyen Xuan Hung
Institute of Biological and Food Technology
HaNoi University of Technology
No.1 Dai Co Viet
VietNam

From zfu at cs.ucr.edu Wed Mar 8 15:09:36 2006
From: zfu at cs.ucr.edu (Zheng Fu)
Date: Wed, 8 Mar 2006 12:09:36 -0800 (PST)
Subject: [BiO BB] Where can I get the matching information between RefSeq mRNA and RefSeq Protein?
In-Reply-To: <20050817153937.20319.qmail@web53505.mail.yahoo.com>
Message-ID:

Most RefSeq mRNAs have accession numbers starting with "NM_", while
RefSeq proteins have accession numbers starting with "NP_". How can I
get the correspondence between each "NM_" and its product "NP_" for all
the genes of H. sapiens and M. musculus? Thanks.

From basu at pharm.sunysb.edu Wed Mar 8 14:09:55 2006
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Wed, 08 Mar 2006 14:09:55 -0500
Subject: [BiO BB] Where can I get the matching information between RefSeq mRNA and RefSeq Protein?
In-Reply-To: References: Message-ID: <440F2C03.2080105@pharm.sunysb.edu> Zheng Fu wrote: > Most of RefSeq mRNA have accession No. starting with "NM_", while RefSeq > Proteins' accession No. starting with "NP_". How can I get a > correspondence information between "NM_" and its product "NP_" for all the > genes of H. Sapiens and M. Musculus? Thanks. > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board Download this file from here: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz Extract columns 4 and 6 for NP_ and NM_ correspondence. Use column 1 for filtering out your desired species. -sidd From zfu at cs.ucr.edu Wed Mar 8 16:17:19 2006 From: zfu at cs.ucr.edu (Zheng Fu) Date: Wed, 8 Mar 2006 13:17:19 -0800 (PST) Subject: [BiO BB] Where can I get the matching information between RefSeq mRNA and RefSeq Protein? In-Reply-To: <440F2C03.2080105@pharm.sunysb.edu> Message-ID: Thank you very much. On Wed, 8 Mar 2006, Siddhartha Basu wrote: > Zheng Fu wrote: > > Most of RefSeq mRNA have accession No. starting with "NM_", while RefSeq > > Proteins' accession No. starting with "NP_". How can I get a > > correspondence information between "NM_" and its product "NP_" for all the > > genes of H. Sapiens and M. Musculus? Thanks. > > > > _______________________________________________ > > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > Download this file from here: > ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz > > Extract columns 4 and 6 for NP_ and NM_ correspondence. Use column 1 for > filtering out your desired species. 
> > -sidd > > > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From smanjari at gmail.com Thu Mar 9 01:55:05 2006 From: smanjari at gmail.com (manjari g) Date: Thu, 9 Mar 2006 12:25:05 +0530 Subject: [BiO BB] sgi altix blastpgp Message-ID: I am research student at Bioinformatics Centre, University of Pune, Pune (INDIA). We have 16 processor SGI-Altix in the department. I have downloaded HTC-Blast propack 3 from SGI website. Normal blast works fine. however for blastpgp (psi-blast) i want to store PSSM files using -Q option. I am unable to create separate outfiles in the manner it does using out_dir option. Please help me. With regards Sunitha Manjari Bioinformatics Centre Unviersity of Pune Pune -INDIA From marty.gollery at gmail.com Thu Mar 9 13:29:54 2006 From: marty.gollery at gmail.com (Martin Gollery) Date: Thu, 9 Mar 2006 10:29:54 -0800 Subject: [BiO BB] sgi altix blastpgp In-Reply-To: References: Message-ID: I don't know what the trouble is, but why are you using -Q anyway? If you want to use the PSSM for anything, like an RPS-BLAST database, for example, you should use the -C option instead. Marty On 3/8/06, manjari g wrote: > > I am research student at Bioinformatics Centre, University of Pune, > Pune (INDIA). We have 16 processor SGI-Altix in the department. I have > downloaded HTC-Blast propack 3 from SGI website. Normal blast works > fine. however for blastpgp (psi-blast) i want to store PSSM files > using -Q option. I am unable to create separate outfiles in the manner > it does using out_dir option. Please help me. 
> With regards > Sunitha Manjari > Bioinformatics Centre > Unviersity of Pune > Pune -INDIA > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 ----------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From delete at elfdata.com Sat Mar 11 09:35:21 2006 From: delete at elfdata.com (Theodore H. Smith) Date: Sat, 11 Mar 2006 14:35:21 +0000 Subject: [BiO BB] Testing a smith-waterman algorithm? Message-ID: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> Hi people, I've successfully designed, written and compiled a program that uses the smith-waterman algorithm. Nothing new there, but it's for an interesting project, and before the project is complete, perhaps some questions asked to bioinformaticians can help bring me up to your level. The next stage after compiling, is testing my algorithm. I now must write some tests for my code. This is where I am seeing that I'm unsure if I even understand Smith- Waterman properly! I understand Levenshtein OK (similar to Needleman- Wunsch), but Smith-Waterman I'm a bit unclear on. Mostly I'm wondering exactly how does local matching help us, over global matching. I got a lay person's description of why it helps, but I'm more interested in getting an exact feel for it. Does it make sense to use English words as an example here, instead of protein sequences? That would help me understand this a bit better, as I have a better feel for English than proteins (unlike many of you). Would then the main advantage be, for searching for short sequences within long ones, without being unfairly penalised by the non- matching ends of the long sequence? 
For example: "extrapolate" could match "extra", far better in Smith- Waterman than it could using Levenshtein, because we aren't being penalised so badly by the "polate" part. Or perhaps: "specialisation" would match "lisation" far better using local than global, because we aren't being penalised by the "specia" part so much. Or even: "disestablishmentarianism" would match "establishment" far better using local than global, because we aren't being penalised by "dis" or "arianism". Is that how local searches like Smith-Waterman benefit us? What about when we are searching for two long sequences of which only a small part will match? Let's say "disestabishmentarianism" against "reestablishmentSomeNonMatchingPart". A local alignment should be able to figure out that "establishment" aligns well in this case. Is that basically how Smith-Waterman helps us? -- http://elfdata.com/plugin/ From ykalidas at gmail.com Sat Mar 11 11:32:29 2006 From: ykalidas at gmail.com (Kalidas Yeturu) Date: Sat, 11 Mar 2006 22:02:29 +0530 Subject: [BiO BB] Testing a smith-waterman algorithm? In-Reply-To: <5632703b0603110827v10bf787fh5213101d8452bb7a@mail.gmail.com> References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <5632703b0603110827v10bf787fh5213101d8452bb7a@mail.gmail.com> Message-ID: <5632703b0603110832u36d979bwdb17a373b9b4cce0@mail.gmail.com> Hi If the task is to set up a query server for finding all common substrings of length greater than a given threshold, between very very long query sequences (eg., finding long gene in a very long nucleotide sequence) better idea would be suffix trees instead of SW-dynamic programming on each query call. If the sequences are comparitively small and interest is in analyzing conserved patterns, then SW- helps a lot. With Regards Kalidas. Y On 3/11/06, Theodore H. Smith wrote: > > > Hi people, > > I've successfully designed, written and compiled a program that uses > the smith-waterman algorithm. 
>
> Nothing new there, but it's for an interesting project, and before
> the project is complete, perhaps some questions asked to
>
> Let's say "disestabishmentarianism" against
> "reestablishmentSomeNonMatchingPart".
>
> A local alignment should be able to figure out that "establishment"
> aligns well in this case.
>
> Is that basically how Smith-Waterman helps us?
>
> --
> http://elfdata.com/plugin/
>
>

From delete at elfdata.com Sat Mar 11 14:20:21 2006
From: delete at elfdata.com (Theodore H. Smith)
Date: Sat, 11 Mar 2006 19:20:21 +0000
Subject: [BiO BB] Testing a smith-waterman algorithm?
In-Reply-To: <5632703b0603110832u36d979bwdb17a373b9b4cce0@mail.gmail.com>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <5632703b0603110827v10bf787fh5213101d8452bb7a@mail.gmail.com> <5632703b0603110832u36d979bwdb17a373b9b4cce0@mail.gmail.com>
Message-ID: <2863D474-C5A6-492D-A1A6-67391D931250@elfdata.com>

A suffix tree is not suited to the kind of protein matching I will be doing, though.

I'm not doing English, I'm doing proteins. I just used English examples because they are easier to understand.

On 11 Mar 2006, at 16:32, Kalidas Yeturu wrote:

> Hi
> If the task is to set up a query server for finding all common
> substrings of length greater than a given threshold, between very
> very long query sequences (eg., finding long gene in a very long
> nucleotide sequence) better idea would be suffix trees instead of
> SW-dynamic programming on each query call. If the sequences are
> comparitively small and interest is in analyzing conserved
> patterns, then SW- helps a lot.
>
> With Regards
> Kalidas. Y
>
>
> On 3/11/06, Theodore H. Smith < delete at elfdata.com> wrote:
> Hi people,
>
> I've successfully designed, written and compiled a program that uses
> the smith-waterman algorithm.
>
> Nothing new there, but it's for an interesting project, and before
> the project is complete, perhaps some questions asked to
>
> Let's say "disestabishmentarianism" against
> "reestablishmentSomeNonMatchingPart".
>
> A local alignment should be able to figure out that "establishment"
> aligns well in this case.
>
> Is that basically how Smith-Waterman helps us?
>
> --
> http://elfdata.com/plugin/
>
>

--
http://elfdata.com/plugin/

From boris.steipe at utoronto.ca Sat Mar 11 14:26:37 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sat, 11 Mar 2006 14:26:37 -0500
Subject: [BiO BB] Testing a smith-waterman algorithm?
In-Reply-To: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com>
Message-ID: <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca>

Cute example. I've changed it slightly to illustrate the main point using the strings (not words!)

disestabishmentarianism (quoting you, with the deleted "l")
reestablishedfederalagressivism

Then I've run them through "needle" and "water" of EMBOSS. (Google for "EMBOSS GUI").

The local alignment answers the question: "What is the highest region of similarity between two sequences?". We use that in a database search to find evidence for homology. We don't require the sequences to be similar over their whole length, in fact if they only share some related domains, forcing the non-related sequences to be part of the comparison would cause problems.

     4 estab-ish     11
       ||||| |||
     3 establish     11

The global alignment answers the question: what is the best alignment of two sequences. We use it when we assume (or would like to test if ...) the two sequences are related over their whole length.

     1 disestab-ish--mentarian------ism     23
        ..||||| |||  .|:...:|.      |||
     1  reestablishedfederalagressivism     31

Importantly: the local alignment (Smith-Waterman) shows us only part of what's actually there, but that part is highlighted more clearly.
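The local versus global contrast can also be tried on the thread's example strings with a small Python sketch. This is not the EMBOSS water/needle implementation: the function names and the scoring (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration, whereas EMBOSS uses substitution matrices and affine gap penalties, so its scores and alignments can differ.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b).

    Illustrative scoring only (assumed values, linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 is what makes the alignment local:
            # a bad prefix is simply dropped instead of being penalised.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score: every character of both strings must be used."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    pairs = [
        ("extrapolate", "extra"),
        ("disestabishmentarianism", "reestablishmentSomeNonMatchingPart"),
    ]
    for x, y in pairs:
        score, ax, ay = smith_waterman(x, y)
        print("local ", score, ax, ay)
        print("global", needleman_wunsch_score(x, y))
```

With these assumed scores, "extrapolate" against "extra" gives a local score of 10 (the exact match "extra") but a global score of only 4, because the global alignment has to pay for the six unmatched letters of "polate".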
So: database search -> local alignment, detailed analysis -> global alignment (plus taking into account suboptimal alignments as well). HTH, B. On 11 Mar 2006, at 09:35, Theodore H. Smith wrote: > > Hi people, > > I've successfully designed, written and compiled a program that > uses the smith-waterman algorithm. > > Nothing new there, but it's for an interesting project, and before > the project is complete, perhaps some questions asked to > bioinformaticians can help bring me up to your level. > > The next stage after compiling, is testing my algorithm. I now must > write some tests for my code. > > This is where I am seeing that I'm unsure if I even understand > Smith-Waterman properly! I understand Levenshtein OK (similar to > Needleman-Wunsch), but Smith-Waterman I'm a bit unclear on. > > Mostly I'm wondering exactly how does local matching help us, over > global matching. I got a lay person's description of why it helps, > but I'm more interested in getting an exact feel for it. > > Does it make sense to use English words as an example here, instead > of protein sequences? That would help me understand this a bit > better, as I have a better feel for English than proteins (unlike > many of you). > > Would then the main advantage be, for searching for short sequences > within long ones, without being unfairly penalised by the non- > matching ends of the long sequence? > > For example: "extrapolate" could match "extra", far better in Smith- > Waterman than it could using Levenshtein, because we aren't being > penalised so badly by the "polate" part. > > Or perhaps: "specialisation" would match "lisation" far better > using local than global, because we aren't being penalised by the > "specia" part so much. > > Or even: "disestablishmentarianism" would match "establishment" far > better using local than global, because we aren't being penalised > by "dis" or "arianism". > > Is that how local searches like Smith-Waterman benefit us? 
> > > What about when we are searching for two long sequences of which > only a small part will match? > > Let's say "disestabishmentarianism" against > "reestablishmentSomeNonMatchingPart". > > A local alignment should be able to figure out that "establishment" > aligns well in this case. > > Is that basically how Smith-Waterman helps us? > > -- > http://elfdata.com/plugin/ > > > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From marty.gollery at gmail.com Sat Mar 11 11:11:22 2006 From: marty.gollery at gmail.com (Martin Gollery) Date: Sat, 11 Mar 2006 08:11:22 -0800 Subject: [BiO BB] Testing a smith-waterman algorithm? In-Reply-To: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> Message-ID: Yes, I believe you've got it. A local alignment between "disestabishmentarianism" and "reestablishmentSomeNonMatchingPart" would match establishment, but a fully global alignment would force the leading d to match the leading r and the trailing m to match the trailing t. Marty On 3/11/06, Theodore H. Smith wrote: > > > Hi people, > > I've successfully designed, written and compiled a program that uses > the smith-waterman algorithm. > > Nothing new there, but it's for an interesting project, and before > the project is complete, perhaps some questions asked to > bioinformaticians can help bring me up to your level. > > The next stage after compiling, is testing my algorithm. I now must > write some tests for my code. > > This is where I am seeing that I'm unsure if I even understand Smith- > Waterman properly! I understand Levenshtein OK (similar to Needleman- > Wunsch), but Smith-Waterman I'm a bit unclear on. > > Mostly I'm wondering exactly how does local matching help us, over > global matching. 
I got a lay person's description of why it helps, > but I'm more interested in getting an exact feel for it. > > Does it make sense to use English words as an example here, instead > of protein sequences? That would help me understand this a bit > better, as I have a better feel for English than proteins (unlike > many of you). > > Would then the main advantage be, for searching for short sequences > within long ones, without being unfairly penalised by the non- > matching ends of the long sequence? > > For example: "extrapolate" could match "extra", far better in Smith- > Waterman than it could using Levenshtein, because we aren't being > penalised so badly by the "polate" part. > > Or perhaps: "specialisation" would match "lisation" far better using > local than global, because we aren't being penalised by the "specia" > part so much. > > Or even: "disestablishmentarianism" would match "establishment" far > better using local than global, because we aren't being penalised by > "dis" or "arianism". > > Is that how local searches like Smith-Waterman benefit us? > > > What about when we are searching for two long sequences of which only > a small part will match? > > Let's say "disestabishmentarianism" against > "reestablishmentSomeNonMatchingPart". > > A local alignment should be able to figure out that "establishment" > aligns well in this case. > > Is that basically how Smith-Waterman helps us? > > -- > http://elfdata.com/plugin/ > > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 ----------- -------------- next part -------------- An HTML attachment was scrubbed... 
From hongyanww at yahoo.com Mon Mar 13 12:08:12 2006
From: hongyanww at yahoo.com (hongyan wang)
Date: Mon, 13 Mar 2006 09:08:12 -0800 (PST)
Subject: [BiO BB] calculate internal thermodynamic stability profiles
In-Reply-To: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com>
Message-ID: <20060313170812.77403.qmail@web34010.mail.mud.yahoo.com>

Hi all,

Does anybody know of any free internet-based software that can analyze internal thermodynamic stability profiles for siRNA/shRNA? I need to assess the stability of the first four nucleotides on either end of some siRNAs, both perfectly matched and mismatched. S-fold won't give the profile of the end nucleotides or of mismatched siRNAs.

Thanks very much in advance.

Hongyan Wang

From c.klaassen at cwz.nl Thu Mar 16 03:25:09 2006
From: c.klaassen at cwz.nl (Corné HW Klaassen)
Date: Thu, 16 Mar 2006 09:25:09 +0100
Subject: [BiO BB] base counting
In-Reply-To: <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca>
Message-ID: <441920E5.7070109@cwz.nl>

Hi all,

I remember having seen this once, but I do not recollect exactly where, so I'll just pop the question here: does anyone know of a free software package (Windows or online) that analyzes the frequency of, or counts, all possible combinations of bases in a given sequence (single bases, dinucleotides, trinucleotides, tetranucleotides, etc.)?

Thanks in advance,

Corné
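For readers who only need the raw counts, the kind of tally asked about here can be sketched in a few lines of Python. This is an illustration, not any of the packages discussed on the list: the function names and the demo sequence are made up, and only k-mers that actually occur in the input are stored, so memory grows with the input rather than with 4^k.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer that actually occurs in seq."""
    seq = seq.upper()
    # range() is empty when k > len(seq), so short inputs give an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq, k):
    """Relative frequency of each observed k-mer (empty dict if none)."""
    counts = count_kmers(seq, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


if __name__ == "__main__":
    demo = "ACGTACGTTAGC"  # hypothetical sequence, for illustration only
    for k in (1, 2, 3, 4):
        print(k, count_kmers(demo, k).most_common(3))
```

Because windows overlap, a sequence of length n contributes n - k + 1 k-mers for each k; for long k almost every k-mer is seen once, so a minimum-count filter is usually needed before the output is useful.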
From pmr at ebi.ac.uk Thu Mar 16 03:48:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 16 Mar 2006 08:48:49 +0000
Subject: [BiO BB] base counting
In-Reply-To: <441920E5.7070109@cwz.nl>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca> <441920E5.7070109@cwz.nl>
Message-ID: <44192671.4030000@ebi.ac.uk>

Corné HW Klaassen wrote:
> I remember having seen this once but I do not recollect exactly where so
> I'll just pop this question here:
> Does anyone know of a free software package (windows or on-line) that
> analyzes the frequency or counts all possible combinations of bases in a
> given sequence (single bases, dinucl. trinucl. tetranuc. etc.).

compseq from EMBOSS will do this. For example, it will find in E. coli sequences the dramatic underrepresentation of CTAG (or CCTAG and CTAGG) due to mismatch repair mechanisms.

To find such features on a range of scales, the chaos program in EMBOSS (Chaos Game Representation) can also be useful. The above feature shows as sets of white boxes. CpG features in mammalian genomes also appear in the plot. Shorter sequences take up larger areas of the plot. Once you know the scale of the feature you are looking for, a compseq run will report the under- or over-represented sequences.

Hope that helps,

Peter Rice

From c.klaassen at cwz.nl Thu Mar 16 04:36:26 2006
From: c.klaassen at cwz.nl (Corné HW Klaassen)
Date: Thu, 16 Mar 2006 10:36:26 +0100
Subject: [BiO BB] base counting
In-Reply-To: <44192671.4030000@ebi.ac.uk>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca> <441920E5.7070109@cwz.nl> <44192671.4030000@ebi.ac.uk>
Message-ID: <4419319A.60902@cwz.nl>

Hi Peter,

Thanks for the quick reply.
On paper this is exactly what I'm looking for, but I gave compseq a try and it doesn't seem to work on features larger than 20 nt, whereas I'm particularly interested in features of 40-140 nt (I realize that this can be a very computationally intensive job). Any other suggestions? Is there perhaps something similar for protein sequences or for some other arbitrary units?

Corné

>> I remember having seen this once but I do not recollect exactly where
>> so I'll just pop this question here:
>> Does anyone know of a free software package (windows or on-line) that
>> analyzes the frequency or counts all possible combinations of bases
>> in a given sequence (single bases, dinucl. trinucl. tetranuc. etc.).
>
>
> compseq from EMBOSS will do this. For example, it will find in E.coli
> sequences the dramatic underrepresentation of CTAG (or CCTAG and
> CTAGG) due to mismatch repair mechanisms.
>
> To find such features on a range of scales, the chaos program in
> EMBOSS (Chaos Game Representation) can also be useful. The above
> feature shows as sets of white boxes. CpG features in mammalian
> genomes also appear in the plot. Shorter sequences take up larger
> areas of the plot. Once you know the scale of the feature you are
> looking for, a compseq run will report the under or over represented
> sequences.
>
> Hope that helps,
>
> Peter Rice
>

From pmr at ebi.ac.uk Thu Mar 16 05:35:57 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 16 Mar 2006 10:35:57 +0000
Subject: [BiO BB] base counting
In-Reply-To: <4419319A.60902@cwz.nl>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca> <441920E5.7070109@cwz.nl> <44192671.4030000@ebi.ac.uk> <4419319A.60902@cwz.nl>
Message-ID: <44193F8D.7060804@ebi.ac.uk>

Corné
HW Klaassen wrote:
> Hi Peter,
>
> Thanks for the quick reply. On paper this is exactly what I'm looking
> for but ......I gave compseq a try and it doesn't seem to work on
> features larger than 20 nt whereas I'm particularly interested in
> features 40-140 nt (I realize that this can be a very computational
> intensive job). Any other suggestions? Is there perhaps something
> similar for protein sequences or on some other arbitrary units?

It depends on what you are looking for. For very long features you would need a lot of data to identify an unusual frequency. Also, compseq needs a table entry for every possible n-mer, and that table is very large by the time you reach 20 bases.

You could try a shorter word size and look for overlaps. In the E. coli case, CTAG is low, and you can also compare TAGA TAGC TAGG TAGG to see which could be the less common 5mers.

EMBOSS also has:

wordcount, which reports the most frequent words of a given size. The memory used by wordcount depends on the size of the input (it works through all n-mers that actually appear, which would be close to 1 per base of input).

polydot, which plots word matches between 1 or more sequences and can report their locations. Frequent n-mers show up readily off the main diagonal.

Looking at the wordcount output, it would be useful to set a minimum occurrence: at present it will report all words, including those that appear only once. For 40mers that means the output is 40 times the original input length. I will do this for the next EMBOSS release!

Hope that helps,

Peter

From akarger at CGR.Harvard.edu Thu Mar 16 09:56:00 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 16 Mar 2006 09:56:00 -0500
Subject: [BiO BB] Translate Ensembl Transcript ID to NCBI GB or gi IDs?
Message-ID:

I'm blasting a bunch of sequences against the Ensembl cDNA Human Genome, which provides Ensembl Transcript IDs (e.g., ENST00000326632). Is there an easy way to convert those to GenBank identifiers or GIs?
BioMart offers to give all kinds of IDs, but NCBI IDs don't seem to be in the list. And I was unable to find translation files in some digging at the NCBI and Ensembl web sites.

I'm happy to do it with a script, if there's a conversion file available. An online resource is OK, too, of course. Is the information found in the Ensembl MySQL stuff somewhere, such that I could get it with the Perl API?

Thanks,

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626

From jeff at bioinformatics.org Thu Mar 16 10:08:09 2006
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Thu, 16 Mar 2006 10:08:09 -0500
Subject: [BiO BB] base counting
In-Reply-To: <441920E5.7070109@cwz.nl>
References: <95DEE259-12FB-4BA6-A261-450EFF393CA5@elfdata.com> <71E4CF59-E980-46A2-97D6-4335BF2C0E3E@utoronto.ca> <441920E5.7070109@cwz.nl>
Message-ID: <44197F59.6080307@bioinformatics.org>

Hi Corné,

Poly will find mono-, di-, tri-, etc. nucleotide repetitive sequences without a table or dictionary, provided they are exact repeats:

http://bioinformatics.org/poly/

And it will give you the frequencies, representation, and other metrics such as "proportion".

MREPATT will do pretty much the same:

http://alggen.lsi.upc.es/recerca/search/mrepatt/

Cheers,
Jeff

Corné HW Klaassen wrote:
> Hi all,
>
> I remember having seem this once but I do not recollect exactly where so
> I'll just pop this question here:
> Does anyone know of a free software package (windows or on-line) that
> analyzes the frequency or counts all possible combinations of bases in a
> given sequence (single bases, dinucl. trinucl. tetranuc. etc.).
>
> Thanks in advance,
>
> Corné

--
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600
--

From christoph.gille at charite.de Fri Mar 17 07:29:30 2006
From: christoph.gille at charite.de (Dr.
Christoph Gille)
Date: Fri, 17 Mar 2006 13:29:30 +0100 (CET)
Subject: [BiO BB] protein docking
Message-ID: <58976.141.42.56.114.1142598570.squirrel@webmail.charite.de>

I have two protein structures and want to answer the question of whether they can interact directly or not. I have run the docking program zdock, but I do not know how to interpret the output to tell whether an interaction is energetically possible. Can somebody please help me?

From john_abraham_bio at yahoo.com Fri Mar 17 08:52:30 2006
From: john_abraham_bio at yahoo.com (John Abraham)
Date: Fri, 17 Mar 2006 05:52:30 -0800 (PST)
Subject: [BiO BB] protein docking
In-Reply-To: <58976.141.42.56.114.1142598570.squirrel@webmail.charite.de>
Message-ID: <20060317135230.60042.qmail@web53714.mail.yahoo.com>

Dear Dr. Gille,

I have worked with ZDOCK. For protein-protein docking you need prior knowledge of how your proteins should interact. Once you have that, take the most favorable orientation and calculate the interaction energies (for example, between the binding region of one protein and the residues of the other). Accelrys has a script that calculates the interaction energies. From these you can at least tell whether the interactions are favorable or not.

Hope this helps,
John

"Dr. Christoph Gille" wrote:

I have two protein structures and want to answer the question whether they can directly interact or not. I have run the docking program zdock. I do not know how to interpret the output to tell whether interaction is energetically possible or not. Can somebody please help me ?
From er.sukhdeepsingh at gmail.com Fri Mar 17 06:24:51 2006
From: er.sukhdeepsingh at gmail.com (Sukhdeep Singh)
Date: Fri, 17 Mar 2006 16:54:51 +0530
Subject: [BiO BB] Re: BiO_Bulletin_Board Digest, Vol 17, Issue 12
In-Reply-To: <20060316170105.BAAD924073@primary.bioinformatics.org>
References: <20060316170105.BAAD924073@primary.bioinformatics.org>
Message-ID: <40fbb41e0603170324j587a0c95ja579de62591032a6@mail.gmail.com>

Hello guys,

I am Sukhdeep Singh, a 2nd-year student at Ambala College of Engineering & Applied Research. I am very much dedicated to bioinformatics and want to do something great in it. I completed basic and advanced courses in bioinformatics during my 15-day winter vacation, and I have learned the functions of some software such as RASMOL, SWISSPDB, CN3D (V3.1), CLUSTAL-X and HYPERCAM (V7.5 student evaluation version). I have a good knowledge of computers, as I have been using them for about 4 years, but only a moderate knowledge of biology. I am also familiar with databases like KEGG, NCBI, PUBMED, ENTREZ, etc.

Could you help me by pointing me to any tutorial program for BIOJAVA or BIOPERL, or to any institute giving training in bioinformatics or a related subject, for about 45 days in July-August?

Please reply to me at er.sukhdeepsingh at gmail.com

SUKHDEEP SINGH

From me at rrowv.name Sat Mar 18 21:08:20 2006
From: me at rrowv.name (Daniel Terry)
Date: Sat, 18 Mar 2006 21:08:20 -0500
Subject: [BiO BB] Database of known RNA secondary structures
Message-ID: <441CBD14.1010307@rrowv.name>

Hi,

I am working on an RNA structure prediction project and I need an RNA secondary structure database for benchmarking. My problem is that the ones I have found do not indicate whether structures are known or predicted (through DPA or comparative modeling).
Does anyone know of any RNA secondary structure databases (mainly rRNA) that either have only *known* structures or indicates which are predicted? Example: http://www.rna.icmb.utexas.edu/ -- Daniel Terry Senior Undergraduate in Computer Science Purdue University at Indianapolis (IUPUI) http://www.cs.iupui.edu/~dsterry From iscis06_noreply at sabanciuniv.edu Tue Mar 21 04:53:51 2006 From: iscis06_noreply at sabanciuniv.edu (ISCIS06) Date: Tue, 21 Mar 2006 11:53:51 +0200 Subject: [BiO BB] ISCIS 2006 CFP Message-ID: <441FCD2F.5000401@sabanciuniv.edu> Apologies for cross postings... ---------------------------------------------------------------------- Please note that the proceedings of the symposium will be published by Springer-Verlag in Lecture Notes in Computer Science series. The paper submission system is open at http://fens.sabanciuniv.edu/iscis06/?paper_submission/paper_submission.html Due to requests from authors, paper submission deadline is extended to April 21, 2006. Please feel free to forward this CFP to interested parties. ---------------------------------------------------------------------- ISCIS'06 Call for Papers The 21st International Symposium on Computer and Information Sciences November 1-2-3, 2006 Istanbul, Turkey organized by Sabanci University We kindly invite you to submit papers for the twenty-first of the ISCIS series of conferences that bring together computer scientists and engineers from around the world. This year's conference will be held in Istanbul and supported by The Scientific and Technological Research Council of Turkey (TUBITAK, tentatively). 
Topics of interest include, but are not limited to: Algorithms Bioinformatics and Scientific Computing Computational Intelligence Computer Architecture and Embedded Systems Computer Graphics & Virtual Reality Computer Networks Computer Vision Data Mining Databases Information Retrieval Mobile Computing Parallel and Distributed Computing Performance Evaluation Reconfigurable Computing Systems Security & Cryptography Software Engineering Theoretical Computer Science This year, we especially welcome papers in the areas of Virtual Reality and Pervasive Computing related fields (Mobile Computing and Networks, Security/Privacy/Trust, Embedded Systems, Computer Graphics). There will be invited talks given by leading researchers in their fields. PAPER SUBMISSION AND PUBLICATION Authors are invited to submit manuscripts written in English. Submitted papers should address original work not published or under revision elsewhere. All submissions will be refereed by experts in the field based on originality, significance, quality, and clarity. The proceedings of the symposium will be published by Springer-Verlag in the prestigious Lecture Notes in Computer Science series. Since 2003 ISCIS proceedings are published in LNCS by Springer Verlag. Papers should not exceed 10 pages (including all references, tables, and figures). Papers should comply with LNCS style and be submitted in PDF (.pdf) format. Please refer to Instructions for Authors (http://www.springer.com/comp/lncs/authors.html) for formatting the manuscript. The conference organizers reserve the rights to reject submissions that exceed the specified page limit or that do not follow the LNCS proceedings format. Manuscripts should be submitted using the ISCIS'06 on-line submission system available at http://fens.sabanciuniv.edu/iscis06/?paper_submission/paper_submission.html. 
IMPORTANT DATES Submission of full papers: April 21, 2006 Notification of acceptance: June 30, 2006 Camera-Ready copies: July 21, 2006 HONORARY CHAIR Erol Gelenbe -- Imperial College, UK ORGANIZING COMMITTEE Albert Levi -- Sabanci University, Turkey Erkay Savas -- Sabanci University, Turkey Husnu Yenigun -- Sabanci University, Turkey Selim Balcisoy -- Sabanci University, Turkey Yucel Saygin -- Sabanci University, Turkey PROGRAM COMMITTEE Abdullah Uz Tansel -- City University of New York, USA Adnan Yazici -- Middle East Technical University, Turkey Alain Jean-Marie -- LIRMM/INRIA/CNRS, France Alex Orailoglu -- University of California, San Diego, USA Andrea D'Ambrogio -- Universita di Roma "Tor Vergata", Italy Athena Vakali -- University of Thesaoniki, Greece Attila Gursoy -- Koc University, Turkey Bakhadyr Khoussainov -- University of Auckland, New Zealand Berk Sunar -- Worcester Polytechnic Institute, USA Berrin Yanikoglu -- Sabanci University, Turkey Bulent Orencik -- Istanbul Technical University, Turkey Cagatay Tekmen -- Qualcomm, USA Carlos Juiz -- Universitat de les Illes Balears, Spain Cem Say -- Bogazici University, Turkey Cevdet Aykanat -- Bilkent University, Turkey Daniel Thalmann -- EPFL, Switzerland Danny Soroker -- IBM T.J. Watson Research Center, USA Doron Peled -- University of Warwick, UK Ethem Alpaydin -- Bogazici University, Turkey Eylem Ekici -- Ohio State University, USA Fatih Alagoz -- Bogazici University, Turkey Fevzi Belli -- Universitat Paderborn, Germany Francisco Rodriguez -- CINVESTAV, Mexico Gabriel Ciobanu -- Romanian Academy, Romania Giuseppe Iazeolla -- Universita di Roma "Tor Vergata", Italy Gultekin Ozsoyoglu -- Case Western Reserve University, USA Guy Vincent Jourdan -- University of Ottawa, Canada I. Budak Arpinar -- University of Georgia, USA Iain Duff -- CCLRC Rutherford Appleton Laboratory, UK Ibrahim Korpeoglu -- Bilkent University, Turkey Igor S. 
Pandzic -- University of Zagreb, Croatia Javier Barria -- Imperial College, UK Javier Garcia Villalba -- Complutense University of Madrid, Spain Jean-Michel Fourneau -- Universite de Versailles, France Jeremy Pitt -- Imperial College, UK Johann Groszschaedl -- IAIK TU Graz, Austria Kainam Tom Wong -- University of Waterloo, Canada Kanchana Kanchanasut -- Asian Institute of Technology, Thailand Kemal Oflazer -- Sabanci University, Turkey Khaldoun El Agha -- LRI, University of Paris XI, France Lale Akarun -- Bogazici University, Turkey Mariacarla Calzarossa -- Universita di Pavia, Italy Mehmet Orgun -- Macquarie University, Australia Mohamed Elfeky -- Google, USA Mohand Said Hacid -- Univ of Lyon, France Mustafa Unel -- Sabanci University, Turkey Nahid Shahmehri -- Linkpings Universitet, Sweden Nassir Navab -- TU Munich, Germany Onn Shehory -- IBM Research Labs, Israel Ozgur B. Akan -- Middle East Technical University, Turkey Ozgur Ercetin -- Sabanci University, Turkey Peter Harrison -- Imperial College, UK Philippe Jacquet -- INRIA, France Philippe Nain -- INRIA, France Pierre Flener -- Uppsala University, Sweden Pinar Yolum -- Bogazici University, Turkey Robert P. Kurshan -- Cadence, USA Robert Wrembel -- Poznan University of Technology, Poland Sahin Albayrak -- Technical University of Berlin, Germany Sibel Adali -- Rensselaer Polytechnic Institute, USA Tamer Ozsu -- University of Waterloo, Canada Thomas Strang -- DLR, Germany and University of Innsbruck, Austria Tolga Capin -- Nokia, USA Tonguc Unluyurt -- Sabanci University, Turkey Tuna Tugcu -- Bogazici University, Turkey Ufuk Caglayan -- Bogazici University, Turkey Ugur Cetintemel -- Brown University, USA Ugur Gudukbay -- Bilkent University, Turkey Ugur Sezerman -- Universita di Bologna, Italy Umit Uyar -- CUNY, USA Vedat Coskun -- Isik University, Turkey Yankin Tanurhan -- Actel Corporation, USA Yusuf Pisan -- University of Technology Sydney, Australia LOCAL ORGANIZING COMMITTE C. 
Emre Sayin -- Sabanci University, Turkey Alisher Kholmatov -- Sabanci University, Turkey Ilknur Durgar El-Kahlout -- Sabanci University, Turkey Selim Volkan Kaya -- Sabanci University, Turkey Ozlem Cetinoglu -- Sabanci University, Turkey Basak Alper -- Sabanci University, Turkey CONTACT INFO ISCIS'06 Sabanci University Faculty of Engineering and Natural Sciences Orhanli Tuzla, 34956 Istanbul TURKEY Web: http://fens.sabanciuniv.edu/iscis06/ E-mail: iscis06 at sabanciuniv.edu Fax: +90 216 4839550 From karishma_81 at indiatimes.com Wed Mar 22 02:03:36 2006 From: karishma_81 at indiatimes.com (karishma_81) Date: Wed, 22 Mar 2006 12:33:36 +0530 Subject: [BiO BB] regarding training Message-ID: <200603220630.MAA09900@WS0005.indiatimes.com> hiiiii to all, hi i am a new member of this board. i m a student of post graduate diploma in bioinformatics, and we have to do a project in our second semester. i just want u people to help me if u have any knowledge of any company or institute that is undergoing training in bioinformatics. i would be really helpful 4 me. thanks. From marty.gollery at gmail.com Wed Mar 22 09:20:12 2006 From: marty.gollery at gmail.com (Martin Gollery) Date: Wed, 22 Mar 2006 06:20:12 -0800 Subject: [BiO BB] regarding training In-Reply-To: <200603220630.MAA09900@WS0005.indiatimes.com> References: <200603220630.MAA09900@WS0005.indiatimes.com> Message-ID: Hi Karishma, One place that you could go to look for projects is the emboss website. They have a list of projects that need to be done at http://emboss.sourceforge.net/apps/proposed.html. Best regards, Marty On 3/21/06, karishma_81 wrote: > > > hiiiii to all, > > hi i am a new member of this board. i m a student of post > graduate diploma in bioinformatics, and we have to do a project in our > second semester. i just want u people to help me if u have any knowledge of > any company or institute that is undergoing training in bioinformatics. i > would be really helpful 4 me. > > thanks. 
> > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 ----------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmckay at tusd.net Wed Mar 22 10:52:03 2006 From: mmckay at tusd.net (mmckay at tusd.net) Date: Wed, 22 Mar 2006 15:52:03 +0000 GMT Subject: [BiO BB] regarding training In-Reply-To: <200603220630.MAA09900@WS0005.indiatimes.com> References: <200603220630.MAA09900@WS0005.indiatimes.com> Message-ID: <1561440585-1143042724-cardhu_blackberry.rim.net-444256443-@bwe059-cell00.bisx.prod.on.blackberry> Where are you doing your post graduate work? I am looking for a post grad certificate program I can do online Sent from my BlackBerry wireless handheld. -----Original Message----- From: "karishma_81" Date: Wed, 22 Mar 2006 12:33:36 To: Subject: [BiO BB] regarding training hiiiii to all, hi i am a new member of this board. i m a student of post graduate diploma in bioinformatics, and we have to do a project in our second semester. i just want u people to help me if u have any knowledge of any company or institute that is undergoing training in bioinformatics. i would be really helpful 4 me. thanks. _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From jeff at bioinformatics.org Wed Mar 22 16:38:31 2006 From: jeff at bioinformatics.org (J.W. 
Bizzaro) Date: Wed, 22 Mar 2006 16:38:31 -0500 Subject: [BiO BB] regarding training In-Reply-To: <200603220630.MAA09900@WS0005.indiatimes.com> References: <200603220630.MAA09900@WS0005.indiatimes.com> Message-ID: <4421C3D7.1040906@bioinformatics.org> Hi Karishma, Volunteering to help an open-source project can be a good way to gain some experience. We have a couple hundred projects hosted here, most of which would appreciate help: http://bioinformatics.org/search/fullprojectlist.php I suggest identifying a project that (1) is active, (2) is addressing a biological problem that you're interested in, (3) is using a programming/scripting language or other tool that you're familiar with, and (4) will give you the opportunity to produce something meaningful in a short period of time (most likely a small, discrete project). You can then use the Web form to contact the project admin. SourceForge.net also has a list of FOSS bioinformatics projects. Cheers, Jeff karishma_81 wrote: > hiiiii to all, > > hi i am a new member of this board. i m a student of post graduate diploma > in bioinformatics, and we have to do a project in our second semester. i > just want u people to help me if u have any knowledge of any company or > institute that is undergoing training in bioinformatics. i would be really > helpful 4 me. > > thanks. -- J.W. Bizzaro Bioinformatics Organization, Inc. (Bioinformatics.Org) E-mail: jeff at bioinformatics.org Phone: +1 508 890 8600 -- From yesint4 at yahoo.com Thu Mar 23 04:30:58 2006 From: yesint4 at yahoo.com (Semen Esilevsky) Date: Thu, 23 Mar 2006 01:30:58 -0800 (PST) Subject: [BiO BB] How to find the same proteins? Message-ID: <20060323093058.34076.qmail@web36503.mail.mud.yahoo.com> Dear all, I'm a novice in bioinformatics and this question is probably stupid, but... I have a list of ~200 PDB id's. For each of them I have to build a list of all entries in PDB, which represent the same protein (say, >99% sequence similarity and no large gaps). 
Could someone suggest the least painful way of doing this? As far as I understand, all I need is a database where all pairwise BLAST alignments of PDB chains are stored. I've found one as part of the PISCES server, but it is incomplete and contains some internal inconsistencies. Could someone suggest a better one, or is there a simpler way out?

Best,
Semen

From pankaj at nii.res.in Thu Mar 23 05:15:18 2006
From: pankaj at nii.res.in (Pankaj)
Date: Thu, 23 Mar 2006 15:45:18 +0530
Subject: [BiO BB] How to find the same proteins?
In-Reply-To: <20060323093058.34076.qmail@web36503.mail.mud.yahoo.com>
References: <20060323093058.34076.qmail@web36503.mail.mud.yahoo.com>
Message-ID: <20060323101518.M57410@nii.res.in>

For this you can go to the NCBI BLAST page and choose BLASTP. There you can paste your sequence and select PDB as the database to search. Click on BLAST and you get all sequences similar to your sequence. Filter the results to find the PDB IDs that are >99% similar to your protein. Since you have 200 proteins, you can also download the NCBI database and run BLAST locally.

Cheers,
Pankaj Khurana
Research Scholar
National Institute of Immunology
New Delhi, India

---------- Original Message -----------
From: Semen Esilevsky
To: bio_bulletin_board at bioinformatics.org
Sent: Thu, 23 Mar 2006 01:30:58 -0800 (PST)
Subject: [BiO BB] How to find the same proteins?

> Dear all,
> I'm a novice in bioinformatics and this question is
> probably stupid, but...
> I have a list of ~200 PDB id's. For each of them I
> have to build a list of all entries in PDB, which
> represent the same protein (say, >99% sequence
> similarity and no large gaps). Could someone suggest
> me the least painfull way of doing this?
> As far as I understand all what I need is the database > where all pairwice BLAST allignments of PDB chains are > stored. I've found one as a part of a PISCES server, > but it is incomplete and contains some internal > inconsistensies. Could someone suggest me a better one > or there is a simpler way out? > > Best, > Semen > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board ------- End of Original Message ------- From mmarchywka at eyewonder.com Thu Mar 23 09:00:47 2006 From: mmarchywka at eyewonder.com (Mike Marchywka) Date: Thu, 23 Mar 2006 09:00:47 -0500 Subject: [BiO BB] How to find the same proteins? Message-ID: <73CA026E5E77C74398C69F3338C5967C0750E0B0@atlexc01.atlanta.eyewonder.com> Also, they do offer specific utilities for blast: http://www.ncbi.nlm.nih.gov/blast/docs/netblast.html ************************************************************************* Mike Marchywka EyeWonder Instant Streaming, Infinite Results 1447 Peachtree Street 9th Floor Atlanta, GA 30309 w.678-891-2033 c. h.770-565-8101 mmarchywka at eyewonder.com alt: marchywka at hotmail.com Instant Streaming, Intelligent results. ************************************************************************* -----Original Message----- From: bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformati cs.org]On Behalf Of Pankaj Sent: ThursdayMarch-23-2006 05:15 AM To: The general forum at Bioinformatics.Org Subject: Re: [BiO BB] How to find the same proteins? For this u can go to NCBI BLAST page and go to BLASTP. There u can paste ur sequence and select PDB as the database to query. 
Just click on BLAST and u get all seq similar to ur sequence. Filter out the results to find PDB ids >99% similar to ur protein. Sine u have 200 proteins u can download NCBI database and run local BLAST also. Cheers Pankaj Khurana Research Scholar National Institute of Immunology New Delhi India -- Open WebMail Project (http://openwebmail.org) ---------- Original Message ----------- From: Semen Esilevsky To: bio_bulletin_board at bioinformatics.org Sent: Thu, 23 Mar 2006 01:30:58 -0800 (PST) Subject: [BiO BB] How to find the same proteins? > Dear all, > I'm a novice in bioinformatics and this question is > probably stupid, but... > I have a list of ~200 PDB id's. For each of them I > have to build a list of all entries in PDB, which > represent the same protein (say, >99% sequence > similarity and no large gaps). Could someone suggest > me the least painfull way of doing this? > As far as I understand all what I need is the database > where all pairwice BLAST allignments of PDB chains are > stored. I've found one as a part of a PISCES server, > but it is incomplete and contains some internal > inconsistensies. Could someone suggest me a better one > or there is a simpler way out? > > Best, > Semen > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board ------- End of Original Message ------- _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From dmb at mrc-dunn.cam.ac.uk Thu Mar 23 09:18:25 2006 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Thu, 23 Mar 2006 14:18:25 +0000 Subject: [BiO BB] How to find the same proteins? 
In-Reply-To: <20060323093058.34076.qmail@web36503.mail.mud.yahoo.com> References: <20060323093058.34076.qmail@web36503.mail.mud.yahoo.com> Message-ID: <4422AE31.7090308@mrc-dunn.cam.ac.uk> Semen Esilevsky wrote: > Dear all, > I'm a novice in bioinformatics and this question is > probably stupid, but... > I have a list of ~200 PDB id's. For each of them I > have to build a list of all entries in PDB, which > represent the same protein (say, >99% sequence > similarity and no large gaps). Could someone suggest > me the least painfull way of doing this? > As far as I understand all what I need is the database > where all pairwice BLAST allignments of PDB chains are > stored. I've found one as a part of a PISCES server, > but it is incomplete and contains some internal > inconsistensies. Could someone suggest me a better one > or there is a simpler way out? It is not a stupid question, but rather a common problem for the whole field! It would be useful if you could describe the problems you are having with PISCES, as that is a very popular and commonly used database. The simplest approach I can think of is to combine your list of proteins with a full fasta database of the PDB (unless your proteins are already in that fasta file), and then run CD-HIT on the fasta file (with your own choice of sequence identity clustering threshold)... http://bioinformatics.org/cd-hit/ 'The same' proteins (defined here by sequence identity) will be found in the same CD-HIT clusters. Hmm... That reminds me... > Best, > Semen > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From dmb at mrc-dunn.cam.ac.uk Thu Mar 23 09:31:09 2006 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Thu, 23 Mar 2006 14:31:09 +0000 Subject: [BiO BB] CD-HIT beta testers? Message-ID: <4422B12D.6080509@mrc-dunn.cam.ac.uk> Hi, I am writing to this list to ask for help with 'beta testing' the latest version of CD-HIT. Weizhong Li recently released a new version, which you can find here... http://bioinformatics.org/project/filelist.php?group_id=350 (its 'cd-hit-2006-0215.tar.gz') If anyone on this list uses (or has used) CD-HIT, they may like to help out on the 'CD-HIT team' (currently me running the website and Weizhong doing everything else!) by testing this release. It would be very helpful if anyone can spare the time to check over this release, and write some general notes on the experience (what went wrong, what was not clear etc. etc.) and post those comments to the group mailing list, http://bioinformatics.org/mail/?group_id=350 At the moment any feedback on usage is welcome, as unfortunately I don't have the time to check this version in any detail. More generally, any assistance on the project is most welcome! Once we finish testing we can announce the new release on the 'News' section of Bioinformatics.Org, and it would be great to have a big list of people to thank for contributing to such great software. Anyone who wants to be added to the list of developers on the project please just email me your Bioinformatics.Org username! Thanks very much for your time (and support), Dan. P.S. I put a 'test set' of data here; http://ftp.bioinformatics.org/pub/cd-hit/TEST_RESULTS/ and you can find a list of the current (and possibly fixed!) 
bugs here; http://bioinformatics.org/bugs/?group_id=350 From mmarchywka at eyewonder.com Thu Mar 23 09:47:58 2006 From: mmarchywka at eyewonder.com (Mike Marchywka) Date: Thu, 23 Mar 2006 09:47:58 -0500 Subject: [BiO BB] How to find the same proteins? Message-ID: <73CA026E5E77C74398C69F3338C5967C07553DD7@atlexc01.atlanta.eyewonder.com> If my earlier reply ever gets by the moderator you will see that generally nlm supports automated searches via eutils but they appear to support blast only via a special utility. The clustering added from your site is a nice additional feature but it is amazingly easy to download clustering software from many sources and run with scripts for any purpose- I used gene expression array software to organize authors from a biotech message board. ************************************************************************* Mike Marchywka EyeWonder Instant Streaming, Infinite Results 1447 Peachtree Street 9th Floor Atlanta, GA 30309 w.678-891-2033 c. h.770-565-8101 mmarchywka at eyewonder.com alt: marchywka at hotmail.com Instant Streaming, Intelligent results. ************************************************************************* -----Original Message----- From: bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformati cs.org]On Behalf Of Dan Bolser Sent: ThursdayMarch-23-2006 09:18 AM To: The general forum at Bioinformatics.Org Subject: Re: [BiO BB] How to find the same proteins? Semen Esilevsky wrote: > Dear all, > I'm a novice in bioinformatics and this question is > probably stupid, but... > I have a list of ~200 PDB id's. For each of them I > have to build a list of all entries in PDB, which > represent the same protein (say, >99% sequence > similarity and no large gaps). Could someone suggest > me the least painfull way of doing this? 
> As far as I understand all what I need is the database > where all pairwice BLAST allignments of PDB chains are > stored. I've found one as a part of a PISCES server, > but it is incomplete and contains some internal > inconsistensies. Could someone suggest me a better one > or there is a simpler way out? It is not a stupid question, but rather a common problem for the whole field! It would be useful if you could describe the problems you are having with PISCES, as that is a very popular and commonly used database. The simplest approach I can think of is to combine your list of proteins with a full fasta database of the PDB (unless your proteins are already in that fasta file), and then run CD-HIT on the fasta file (with your own choice of sequence identity clustering threshold)... http://bioinformatics.org/cd-hit/ 'The same' proteins (defined here by sequence identity) will be found in the same CD-HIT clusters. Hmm... That reminds me... > Best, > Semen > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From richard.squires at utsouthwestern.edu Thu Mar 23 10:11:10 2006 From: richard.squires at utsouthwestern.edu (Burke Squires) Date: Thu, 23 Mar 2006 09:11:10 -0600 Subject: [BiO BB] Fwd: Clade designation/clustering References: <2064E912-E8BD-4920-9A20-D038E9FAA8E3@gmail.com> Message-ID: <8C7F57B3-5835-4FD0-93B4-89FE20AAA426@utsouthwestern.edu> Hello all, I am characterizing strains of an organism and trying to distinguish clades of individual genes. 
I have performed multiple sequence alignments, created phylogenetic trees and done distance matrix calculations. I can "see" clades in the tree but I want to statistically prove that they are clades. How do I do this? I have tried some hierarchical clusterings but maybe I do not have the right program. Got any ideas? Thanks, Burke From maximilianh at gmail.com Thu Mar 23 10:44:33 2006 From: maximilianh at gmail.com (Maximilian Haeussler) Date: Thu, 23 Mar 2006 16:44:33 +0100 Subject: [BiO BB] Re: [EMBOSS] d_ino In-Reply-To: References: Message-ID: <76f031ae0603230744m26eae154k@mail.gmail.com> If you really want to teach more than just emboss, notably bioperl, cygwin isn't a good choice, in my opinion. You will invest hours just to find fixes for all your favorite software to cygwin. If speed is an issue here: I'd rather choose www.colinux.org, prepare one disk image file for all computers, copy this to the windows-computers and start colinux on every computer. That will give you an ordinary linux running on windows computers at the same speed as a normal linux (except for disk access, of course). If speed isn't an issue: An even easier alternative would be to tell the students to telnet onto any linux computer and do their work on there... if telnet isn't convenient or graphical enough, they could install the package cygwin-x and do an ssh onto the linux machine. Max On 09/03/06, Jeffrey Blanchard wrote: > Hello, > > I am trying to install EMBOSS under cygwin for teaching purposes. > > make crashes on ajfile because d_ino appears to be missing in current > version of cygwin. > > Is there a work around for this? > > Thanks, Jeff > > ------------------------------- > Jeffrey L. 
Blanchard > Assistant Professor > Department of Microbiology > University of Massachusetts > Amherst, MA 01003 > Office and Lab: Morrill I N330 > Tel: 413-577-2130 > Fax: 413-545-1578 > http://www.bio.umass.edu/micro/blanchard/Lab_About.html > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From burkesquires at gmail.com Wed Mar 22 23:23:59 2006 From: burkesquires at gmail.com (Burke Squires) Date: Wed, 22 Mar 2006 22:23:59 -0600 Subject: [BiO BB] Clade designation/clustering Message-ID: <2064E912-E8BD-4920-9A20-D038E9FAA8E3@gmail.com> Hello all, I am characterizing strains of an organism and trying to distinguish clades of individual genes. I have performed multiple sequence alignments, created phylogenetic trees and done distance matrix calculations. I can "see" clades in the tree but I want to statistically prove that they are clades. How do I do this? I have tried some hierarchical clusterings but maybe I do not have the right program. Got any ideas? Thanks, Burke From mmarchywka at eyewonder.com Thu Mar 23 08:36:38 2006 From: mmarchywka at eyewonder.com (Mike Marchywka) Date: Thu, 23 Mar 2006 08:36:38 -0500 Subject: [BiO BB] How to find the same proteins? Message-ID: <73CA026E5E77C74398C69F3338C5967C0750E0AE@atlexc01.atlanta.eyewonder.com> ( sorry if this is a little off target- this is my first post to the list and I'm cleaning out mailbox after returning from vacation) Funny you should ask, the pubmed webmaster blasted me ( pun intended) for suggesting they don't support automated searches. See their eutils support page and you can write your own scripts. "Dear NCBI user, our requirements are described here - basically we ask for 3 sec. delay between subsequent calls. http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html (see User requirements). 
see also: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=coursework.TOC&depth=2 Regards, NCBI Help desk "

The hard part is not going to be fixed by running blast locally: it is computationally rather than IO intensive. I had this problem looking for chance epitope matches between a whole protein vaccine (a specific phosphatase) and other things, and now I'm trying to look up some patented peptide sequences for accidental matches.

I have attempted to clean up a script for illustration that uses their normal user interface. It is rather cumbersome and involved and does not use the eutils facility. It does show that you can take a specific sequence from a pubmed entry, reformat it for other web services (like epitope prediction), and send those results to blast to look for hits. Then you can use eutils and scripts to filter as needed at the time. I test these using my last name as a sequence (to see what I'm related to :)) and did verify they still work:

$ blast -expect 1000 -format Text MARCHYWKA
$ blast_cleanedup_a_litte -expect 1000 -format Text MARCHYWKA

Again, this uses their webform as a kluge. I have another script for using their eutils facility but have never run it on blast, just to do bulk abstract downloads (which, by the way, can be organized with the gene expression array software: it doesn't know genes from keywords or conditions from documents...). Let me know if you do an automated search using eutils.

(I finally decided not to post the whole, messy kluge, just this part if you want to use it right away. I would suggest seeing what eutils has, but this should work if you have cygwin or linux.)
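For anyone adapting the script that follows: the two things it scrapes out of the BLAST web response are the request ID (a line of the form "RID = ...") and the job status ("Status=..."). A minimal sketch of those two steps as standalone shell functions is given here. The function names extract_rid and extract_status are made up for illustration, and the response layout is only assumed from what the grep/awk calls in the script look for, so treat this as a sketch rather than a description of the NCBI interface.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Print the request ID found in a saved BLAST response read from stdin.
# Assumes a line of the form "RID = <id>"; if several match, the last one wins.
extract_rid() {
    awk '$1 == "RID" && $2 == "=" { rid = $3 } END { if (rid != "") print rid }'
}

# Print the job status word (for example WAITING) from a response on stdin.
# Assumes a line containing "Status=<word>"; prints nothing if there is none.
extract_status() {
    sed -n 's/.*Status=\([A-Za-z]*\).*/\1/p' | tail -n 1
}
```

With a response saved to .temp_blast_0, as the script does, this would be used as RID=`extract_rid < .temp_blast_0`. Matching the whole "RID = value" pattern rather than any line containing RID is a little more forgiving if the page mentions RID elsewhere.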
QUERYSTR="QUERY=${SEQ}&QUERY_FROM=&QUERY_TO=&DATABASE=nr&ENTREZ_QUERY=&ENTREZ_QUERY=All+organisms&COMPOSITION_BASED_STATISTICS=0&EXPECT=20000&WORD_SIZE=2&MATRIX_NAME=PAM30&GAPCOSTS=9+1&PSSM=&OTHER_ADVANCED=&PHI_PATTERN=&SHOW_OVERVIEW=on&SHOW_LINKOUT=on&GET_SEQUENCE=on&NCBI_GI=on&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&MASK_CHAR=0&MASK_COLOR=0&DESCRIPTIONS=100&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&I_THRESH=0.005&FORMAT_ENTREZ_QUERY=&FORMAT_ENTREZ_QUERY=All+organisms&EXPECT_LOW=&EXPECT_HIGH=&LAYOUT=TwoWindows&FORMAT_BLOCK_ON_RESPAGE=None&AUTO_FORMAT=Semiauto&PROGRAM=blastp&CLIENT=web&SERVICE=plain&PAGE=Proteins&CMD=Put" QUERYSTR="QUERY=${SEQ}&QUERY_FROM=&QUERY_TO=&DATABASE=nr&ENTREZ_QUERY=&ENTREZ_QUERY=Homo+sapiens+[ORGN]&COMPOSITION_BASED_STATISTICS=0&EXPECT=2000&WORD_SIZE=2&MATRIX_NAME=PAM30&GAPCOSTS=9+1&PSSM=&OTHER_ADVANCED=&PHI_PATTERN=&SHOW_OVERVIEW=on&SHOW_LINKOUT=on&GET_SEQUENCE=on&NCBI_GI=on&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&MASK_CHAR=0&MASK_COLOR=0&DESCRIPTIONS=1000&ALIGNMENTS=1000&ALIGNMENT_VIEW=Pairwise&I_THRESH=0.005&FORMAT_ENTREZ_QUERY=&FORMAT_ENTREZ_QUERY=Homo+sapiens+[ORGN]&EXPECT_LOW=&EXPECT_HIGH=&LAYOUT=TwoWindows&FORMAT_BLOCK_ON_RESPAGE=None&AUTO_FORMAT=Semiauto&PROGRAM=blastp&CLIENT=web&SERVICE=plain&PAGE=Proteins&CMD=Put" QUERYSTR="QUERY=${SEQ}&QUERY_FROM=&QUERY_TO=&DATABASE=nr&ENTREZ_QUERY=&ENTREZ_QUERY=All+organisms&COMPOSITION_BASED_STATISTICS=0&EXPECT=20&WORD_SIZE=2&MATRIX_NAME=PAM30&GAPCOSTS=9+1&PSSM=&OTHER_ADVANCED=&PHI_PATTERN=&SHOW_OVERVIEW=on&SHOW_LINKOUT=on&GET_SEQUENCE=on&NCBI_GI=on&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&MASK_CHAR=0&MASK_COLOR=0&DESCRIPTIONS=1000&ALIGNMENTS=1000&ALIGNMENT_VIEW=Pairwise&I_THRESH=0.005&FORMAT_ENTREZ_QUERY=&FORMAT_ENTREZ_QUERY=All+organisms&EXPECT_LOW=&EXPECT_HIGH=&LAYOUT=TwoWindows&FORMAT_BLOCK_ON_RESPAGE=None&AUTO_FORMAT=Semiauto&PROGRAM=blastp&CLIENT=web&SERVICE=plain&PAGE=Proteins&CMD=Put" QS1="QUERY=${SEQ}" 
QS2="&QUERY_FROM=&QUERY_TO=&DATABASE=nr&ENTREZ_QUERY=&ENTREZ_QUERY=All+organisms&COMPOSITION_BASED_STATISTICS=0"
QS3="&EXPECT=${expect}&WORD_SIZE=2&MATRIX_NAME=PAM30&GAPCOSTS=9+1&PSSM=&OTHER_ADVANCED=&PHI_PATTERN=&SHOW_OVERVIEW=on"
QS4="&SHOW_LINKOUT=on&GET_SEQUENCE=on&NCBI_GI=on&FORMAT_OBJECT=Alignment&FORMAT_TYPE=${format}&MASK_CHAR=0&MASK_COLOR=0"
QS5="&DESCRIPTIONS=1000&ALIGNMENTS=1000&ALIGNMENT_VIEW=Pairwise&I_THRESH=0.005&FORMAT_ENTREZ_QUERY="
QS6="&FORMAT_ENTREZ_QUERY=All+organisms&EXPECT_LOW=&EXPECT_HIGH=&LAYOUT=TwoWindows&FORMAT_BLOCK_ON_RESPAGE=None"
QS7="&AUTO_FORMAT=Semiauto&PROGRAM=blastp&CLIENT=web&SERVICE=plain&PAGE=Proteins&CMD=Put"
QUERYSTR="${QS1}${QS2}${QS3}${QS4}${QS5}${QS6}${QS7}"

# RESULTSTR1..6: a captured results request with a hard-coded RID.
# These are not used in the part of the script shown here.
RESULTSTR1="FORMAT_PAGE_TARGET=Format_page_919098664&RESULTS_PAGE_TARGET=Blast_Results_for_919098664&RID=1130591558-739-35617443386.BLASTQ3"
RESULTSTR2="&SHOW_OVERVIEW=on&SHOW_LINKOUT=on&GET_SEQUENCE=on&NCBI_GI=on&FORMAT_OBJECT=Alignment&FORMAT_TYPE=${format}&MASK_CHAR=0&MASK_COLOR=0"
RESULTSTR3="&DESCRIPTIONS=100&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&I_THRESH=0.005&FORMAT_ENTREZ_QUERY=&FORMAT_ENTREZ_QUERY=All+organisms&EXPECT_LOW="
RESULTSTR4="&EXPECT_HIGH=&RID=1130591558-739-35617443386.BLASTQ3&RTOE=10&CLIENT=web&FORMAT_OBJECT=Alignment&CMD=Get&PAGE=Proteins&_PGR=0"
RESULTSTR5="&PID=739&FORMAT_PAGE_TARGET=&RESULTS_PAGE_TARGET=&LAYOUT=TwoWindows&FORMAT_BLOCK_ON_RESPAGE=None&STEP_NUMBER=1&EXPECT=20000"
RESULTSTR6="&HITLIST_SIZE=100&DESCRIPTIONS=100&ALIGNMENTS=50&AUTO_FORMAT=Semiauto"

POSTURL="http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi"
LYNXCMD="lynx -width=0 -source -accept_all_cookies -dump -post_data "

# Submit the search, then pull the request ID (RID) and the page
# target number out of the response.
echo "$QUERYSTR" | lynx -width=0 -source -accept_all_cookies -dump -post_data "${POSTURL}" >.temp_blast_0
VAR1=`cat .temp_blast_0 | grep RID | tail -n 1 | awk '{print $3}'`
VAR2=`cat .temp_blast_0 | grep "_TARGET" | sed -n 's/.*Format.page.\([0-9]*\).*/\1/p'`
echo $VAR1
echo $VAR2
# Results request built from the RID (VAR1) and page target (VAR2).
R1="FORMAT_PAGE_TARGET=Format_page_${VAR2}&RESULTS_PAGE_TARGET=Blast_Results_for_${VAR2}&RID=${VAR1}"
R2="&SHOW_OVERVIEW=on&SHOW_LINKOUT=on&GET_SEQUENCE=on&NCBI_GI=on&FORMAT_OBJECT=Alignment&FORMAT_TYPE=${format}&MASK_CHAR=0&MASK_COLOR=0"
R3="&DESCRIPTIONS=1000&ALIGNMENTS=1000&ALIGNMENT_VIEW=Pairwise&I_THRESH=0.005&FORMAT_ENTREZ_QUERY=&FORMAT_ENTREZ_QUERY=All+organisms&EXPECT_LOW="
R4="&EXPECT_HIGH=&RID=${VAR1}&RTOE=10&CLIENT=web&FORMAT_OBJECT=Alignment&CMD=Get&PAGE=Proteins&_PGR=0"
R5="&PID=739&FORMAT_PAGE_TARGET=&RESULTS_PAGE_TARGET=&LAYOUT=TwoWindows&FORMAT_BLOCK_ON_RESPAGE=None&STEP_NUMBER=1&EXPECT=20000"
R6="&HITLIST_SIZE=1000&DESCRIPTIONS=1000&ALIGNMENTS=1000&AUTO_FORMAT=Semiauto"

# Poll every 3 seconds until the page no longer reports Status=WAITING.
STATUS="WHERE_IS_POST_TEST"
until [ "${STATUS}" == "" ]
do
  sleep 3
  echo "${R1}${R2}${R3}${R4}${R5}${R6}" | $LYNXCMD "${POSTURL}" > .temp_blast_1
  if [ "$?" -ne "0" ]
  then
    echo "Failed to get "
    echo "${R1}${R2}${R3}${R4}${R5}${R6}"
  fi
  STATUS=`cat .temp_blast_1 | grep "Status=WAITING" `
  STATUSX=`cat .temp_blast_1 | grep "Status=" `
  # statusline is not defined in this excerpt (presumably a helper
  # from the part of the script that was not posted).
  statusline "$SEQ $STATUSX"
done

*************************************************************************
Mike Marchywka
EyeWonder
Instant Streaming, Infinite Results
1447 Peachtree Street 9th Floor
Atlanta, GA 30309
w.678-891-2033 c. h.770-565-8101
mmarchywka at eyewonder.com
alt: marchywka at hotmail.com
Instant Streaming, Intelligent results.
*************************************************************************

-----Original Message-----
From: bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org]On Behalf Of Pankaj
Sent: ThursdayMarch-23-2006 05:15 AM
To: The general forum at Bioinformatics.Org
Subject: Re: [BiO BB] How to find the same proteins?

For this u can go to NCBI BLAST page and go to BLASTP. There u can paste ur sequence and select PDB as the database to query. Just click on BLAST and u get all seq similar to ur sequence.
Filter out the results to find PDB ids >99% similar to ur protein. Since u have 200 proteins u can download NCBI database and run local BLAST also.

Cheers
Pankaj Khurana
Research Scholar
National Institute of Immunology
New Delhi
India

-- Open WebMail Project (http://openwebmail.org)

---------- Original Message -----------
From: Semen Esilevsky
To: bio_bulletin_board at bioinformatics.org
Sent: Thu, 23 Mar 2006 01:30:58 -0800 (PST)
Subject: [BiO BB] How to find the same proteins?

> Dear all,
> I'm a novice in bioinformatics and this question is
> probably stupid, but...
> I have a list of ~200 PDB id's. For each of them I
> have to build a list of all entries in PDB, which
> represent the same protein (say, >99% sequence
> similarity and no large gaps). Could someone suggest
> the least painful way of doing this?
> As far as I understand, all I need is the database
> where all pairwise BLAST alignments of PDB chains are
> stored. I've found one as a part of a PISCES server,
> but it is incomplete and contains some internal
> inconsistencies. Could someone suggest a better one,
> or is there a simpler way out?
>
> Best,
> Semen
------- End of Original Message -------

_______________________________________________
Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From mmarchywka at eyewonder.com Thu Mar 23 17:46:40 2006
From: mmarchywka at eyewonder.com (Mike Marchywka)
Date: Thu, 23 Mar 2006 17:46:40 -0500
Subject: [BiO BB] How to find the same proteins?
Message-ID: <73CA026E5E77C74398C69F3338C5967C0750E0B4@atlexc01.atlanta.eyewonder.com>

I'm sure they are sick of mailing me but they did answer again. FWIW, the downloaded utility seems to be pretty usable but the documentation suggests it has some firewall issues. Works fine for me so far. This should more directly answer your question, and their Perl script is cleaner than mine :)

"Hello,

Please use the URL API as an interface to the BLAST servers. See the following pages for more information:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/node_0.html
http://www.ncbi.nlm.nih.gov/blast/docs/web_blast.pl

Peter"

*************************************************************************
Mike Marchywka
EyeWonder
Instant Streaming, Infinite Results
1447 Peachtree Street 9th Floor
Atlanta, GA 30309
w.678-891-2033 c. h.770-565-8101
mmarchywka at eyewonder.com
alt: marchywka at hotmail.com
Instant Streaming, Intelligent results.
*************************************************************************

From jstroud at mbi.ucla.edu Thu Mar 23 19:59:25 2006
From: jstroud at mbi.ucla.edu (James Stroud)
Date: Thu, 23 Mar 2006 16:59:25 -0800
Subject: [BiO BB] [ANNOUNCE] make-bdna Server
Message-ID: <200603231659.25861.jstroud@mbi.ucla.edu>

Hello Everyone,

Due to the absence of any similar resource that I could find, I have put an easy-to-use server for making B-DNA on the web at:

http://www.doe-mbi.ucla.edu/~jstroud/make-bdna/

The make-bdna server creates a PDB file of B-DNA according to an ASCII representation. As a bonus, the server makes a CNS def file that specifies the WC base pairing, sugar pucker, and planarity of the new DNA.

As I have time, I will increase the capabilities of the server in the future. Please let me know if you have any suggestions.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com/

From chea at mail.nih.gov Fri Mar 24 09:17:49 2006
From: chea at mail.nih.gov (Anney Che)
Date: Fri, 24 Mar 2006 09:17:49 -0500
Subject: [BiO BB] Sequence analysis help
Message-ID:

Hi Everyone,

I have a question about sequence analysis.

I have a set of sequential HIV-1 sequences and I would like to find the primordial sequence of the set.

Any advice will be greatly appreciated.

Thanks,
Anney

Anney Che, M.S.
Biocomputing Specialist
Laboratory of Molecular Microbiology (LMM)
National Institute of Allergy and Infectious Diseases (NIAID)
9000 Rockville Pike, Bldg 4, Room 301
Bethesda, MD 20892
Phone: 301-451-2851
Fax: 301-280-2716

From narcis at fiserlab.org Fri Mar 24 11:43:51 2006
From: narcis at fiserlab.org (Narcis Fernandez-Fuentes)
Date: Fri, 24 Mar 2006 11:43:51 -0500
Subject: [BiO BB] atomic coordinates for small organic molecules
Message-ID: <442421C7.5070308@fiserlab.org>

Hi all,

Does anybody know a program/server to create atomic coordinates (in PDB format) for small organic molecules, like aliphatic chains, rings, etc? Something very similar to a previous post (bDNA server).

Thanks

From boris.steipe at utoronto.ca Fri Mar 24 12:25:10 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Fri, 24 Mar 2006 12:25:10 -0500
Subject: [BiO BB] Sequence analysis help
In-Reply-To:
References:
Message-ID: <793816DC-2703-4B6D-9E89-06B50CB977E0@utoronto.ca>

To reconstruct ancestral sequences, you could use the PAMP method of the PAML package,
http://abacus.gene.ucl.ac.uk/software/paml.html

HTH
Boris

On 24 Mar 2006, at 09:17, Anney Che wrote:

> Hi Everyone,
>
> I have a question about sequence analysis.
>
> I have a set of sequential HIV-1 sequences and I would like to find the
> primordial sequence of the set.
>
> Any advice will be greatly appreciated.
>
> Thanks,
>
> Anney
>
>
>
>
> Anney Che, M.S.
> Biocomputing Specialist > Laboratory of Molecular Microbiology (LMM) > National Institute of Allergy and Infectious Diseases (NIAID) > 9000 Rockville Pike, Bldg 4, Room 301 > Bethesda, MD 20892 > Phone: 301-451-2851 > Fax: 301-280-2716 > > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From rb at hcl.in Sat Mar 25 05:09:20 2006 From: rb at hcl.in (Balamurugan.R) Date: Sat, 25 Mar 2006 15:39:20 +0530 Subject: [BiO BB] atomic coordinates for small organic molecules In-Reply-To: <442421C7.5070308@fiserlab.org> References: <442421C7.5070308@fiserlab.org> Message-ID: <442516D0.6070801@hcl.in> you can try prodrg server, davapc1.bioch.dundee.ac.uk/programs/*prodrg hope it helps, Bala * Narcis Fernandez-Fuentes wrote: > > > Hi all, > > Does anybody knows a program/server to create atomic coordinates (in > PDB format) for small organic molecules, like aliphatic chains, rings, > etc? Something very similar to a previous post (bDNA server). > > Thanks > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > DISCLAIMER: > ----------------------------------------------------------------------------------------------------------------------- > > > The contents of this e-mail and any attachment(s) are confidential and > intended for the named recipient(s) only. It shall not attach any > liability on the originator or HCL or its affiliates. Any views or > opinions presented in this email are solely those of the author and > may not necessarily reflect the opinions of HCL or its affiliates. 
Any > form of reproduction, dissemination, copying, disclosure, > modification, distribution and / or publication of this message > without the prior written consent of the author of this e-mail is > strictly prohibited. If you have received this email in error please > delete it and notify the sender immediately. Before opening any mail > and attachments please check them for viruses and defect. > > ----------------------------------------------------------------------------------------------------------------------- > > > -- Best Regards, Balamurugan.R DISCLAIMER: ******************************************************************* This e-mail contains confidential and/or privileged information. If you are not the intended recipient(or have recieved this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure, use or distribution of the material in this e-mail is strictly forbidden. ******************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... Name: rb.vcf Type: text/x-vcard Size: 156 bytes Desc: not available URL: From yesint4 at yahoo.com Mon Mar 27 05:04:13 2006 From: yesint4 at yahoo.com (Semen Esilevsky) Date: Mon, 27 Mar 2006 02:04:13 -0800 (PST) Subject: [BiO BB] PISCES inconsistencies In-Reply-To: <4422AE31.7090308@mrc-dunn.cam.ac.uk> Message-ID: <20060327100413.49638.qmail@web36509.mail.mud.yahoo.com> > It would be useful if you could describe the > problems you are > having with PISCES, as that is a very popular and > commonly used database. Ok here is what I've found in the databases of the stand-alone PISCES package. Probably this information will be useful for somebody else. 1) The FASTA file pdbaa and the alignment file pdbaa.align are not consistent. There are (very few) entries in pdbaa.align, which are absent in pdbaa. 
2) As far as I understand pdbaa.align should contain two entries for each pair: aaaa vs bbbb and bbbb vs aaaa. For few proteins only one of them is present. Many pairs are simply missed with no explanation (I suspect that I just misunderstand something at this point, however). 3) There are some formatting errors in pdbaa. Sincerely, Semen __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From golharam at umdnj.edu Mon Mar 27 11:50:42 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 27 Mar 2006 11:50:42 -0500 Subject: [BiO BB] Building an alignment from BLAST hsp Message-ID: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> I have a BLAST alignment: query sequence and database sequence. The alignment is only showing the HSP from the blast output as expected, however I want to build an alignment of the entire database sequence against my query sequence. I tried using needle from EMBOSS, however its aligning the sequences completely different than BLAST does. What I'd really like is a way to anchor the alignment based on the BLAST HSP. Does anyone know how to do this, or what tool(s) will allow me to do this? Ryan From gary at primary.bioinformatics.org Mon Mar 27 12:09:12 2006 From: gary at primary.bioinformatics.org (Gary Van Domselaar) Date: Mon, 27 Mar 2006 12:09:12 -0500 (EST) Subject: [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> References: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> Message-ID: Hi Ryan, The BLAST output will have the coordinates of the HSP for your query and database sequence. 
You can pretty easily write a script using bioperl's BLAST parsing modules: http://www.bioperl.org/wiki/HOWTO:SearchIO to grab the coordinates and sequences from the alignments and then use the Bio::SimpleAlign module: http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/SimpleAlign.html to remap the query on to the full database sequence. On Mon, 27 Mar 2006, Ryan Golhar wrote: > I have a BLAST alignment: query sequence and database sequence. > > The alignment is only showing the HSP from the blast output as expected, > however I want to build an alignment of the entire database sequence > against my query sequence. > > I tried using needle from EMBOSS, however its aligning the sequences > completely different than BLAST does. What I'd really like is a way to > anchor the alignment based on the BLAST HSP. Does anyone know how to do > this, or what tool(s) will allow me to do this? > > Ryan > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Gary Van Domselaar, PhD Associate Director, Bioinformatics.Org gary at bioinformatics.org From pmr at ebi.ac.uk Mon Mar 27 12:50:09 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Mon, 27 Mar 2006 18:50:09 +0100 (BST) Subject: [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> References: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> Message-ID: <2253.86.132.217.176.1143481809.squirrel@webmail.ebi.ac.uk> Ryan Golhar wrote: > I have a BLAST alignment: query sequence and database sequence. > > The alignment is only showing the HSP from the blast output as expected, > however I want to build an alignment of the entire database sequence > against my query sequence. > > I tried using needle from EMBOSS, however its aligning the sequences > completely different than BLAST does. 
What I'd really like is a way to > anchor the alignment based on the BLAST HSP. Does anyone know how to do > this, or what tool(s) will allow me to do this? You are quite right that EMBOSS may align the sequences completely differently - unless the HSPs are very significant and cover most of the sequence this will be true of any attempt to simply realign. There has to be some way to pass on the HSPs as fixed positions, as in the BioPerl solution. However, it could make a nice EMBOSS application - the only question would be how you would like to specify the HSPs. Perhaps we could read BLAST output (in some specified format), or perhaps some other way to give the input alignments. We do have at least one EMBOSS application that does something similar (finds all long perfect matches and interpolates) - we just need to reuse the interpolation code which is basically doing a global alignment of the bits in between. That also tackles the problem of choosing which non-compatible initial matches to use. Hope that helps, Peter From golharam at umdnj.edu Mon Mar 27 13:03:39 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 27 Mar 2006 13:03:39 -0500 Subject: [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <2253.86.132.217.176.1143481809.squirrel@webmail.ebi.ac.uk> Message-ID: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> Hi Peter, > You are quite right that EMBOSS may align the sequences completely > differently - unless the HSPs are very significant and cover most > of the sequence this will be true of any attempt to simply realign. > There has to be some way to pass on the HSPs as fixed positions, > as in the BioPerl solution. I looked at a bioperl method, but can't seem to find something that will accomplish this. > However, it could make a nice EMBOSS application - the only question > would be how you would like to specify the HSPs. 
Perhaps we could read > BLAST output (in some specified format), or perhaps some other way to > give the input alignments. Yes, I agree. I suppose the best way would be to specify the two sequences and the blast output. The application could then construct an alignment based on a particular HSP (probably the first one, or whatever the user specifies). Ryan From letondal at pasteur.fr Tue Mar 28 02:25:07 2006 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 28 Mar 2006 09:25:07 +0200 Subject: [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> References: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> Message-ID: <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> On Mar 27, 2006, at 8:03 PM, Ryan Golhar wrote: > Hi Peter, > >> You are quite right that EMBOSS may align the sequences completely >> differently - unless the HSPs are very significant and cover most >> of the sequence this will be true of any attempt to simply realign. >> There has to be some way to pass on the HSPs as fixed positions, >> as in the BioPerl solution. > > I looked at a bioperl method, but can't seem to find something that > will > accomplish this. > >> However, it could make a nice EMBOSS application - the only question >> would be how you would like to specify the HSPs. Perhaps we could read > >> BLAST output (in some specified format), or perhaps some other way to >> give the input alignments. > > Yes, I agree. I suppose the best way would be to specify the two > sequences and the blast output. The application could then construct > an > alignment based on a particular HSP (probably the first one, or > whatever > the user specifies). > Have you tried this: http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html It is based on bioperl. check "Get HSP" option (you can even extend it). 
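
For anyone who would rather script the anchoring discussed in this thread, here is a minimal Python sketch. It is not taken from BioPerl, EMBOSS or the Pasteur interface above; it only illustrates the idea Peter describes: keep one HSP fixed and globally align the bits on either side of it. Everything in it is an assumption made for illustration: the function names, the toy scores (match +1, mismatch -1, gap -2), and the convention that the HSP is passed as 0-based, half-open coordinates together with its two gapped strings. BLAST itself reports 1-based, inclusive coordinates, so those would need converting first.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Plain Needleman-Wunsch with a linear gap penalty (toy scores).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_alignment(query, subject, q_start, q_end, s_start, s_end,
                       hsp_query, hsp_subject):
    """Extend one HSP to a full-length alignment of query and subject.

    q_start/q_end and s_start/s_end are 0-based, half-open coordinates of
    the HSP; hsp_query/hsp_subject are its gapped strings, kept unchanged.
    """
    left_q, left_s = needleman_wunsch(query[:q_start], subject[:s_start])
    right_q, right_s = needleman_wunsch(query[q_end:], subject[s_end:])
    return (left_q + hsp_query + right_q,
            left_s + hsp_subject + right_s)


if __name__ == "__main__":
    # Made-up sequences; the HSP is the shared GATTACA stretch.
    aligned_q, aligned_s = anchored_alignment(
        "TTGATTACAGG", "CGATTACAT", 2, 9, 1, 8, "GATTACA", "GATTACA")
    print(aligned_q)
    print(aligned_s)
```

The flanks are aligned end to end, so a long unmatched overhang on one side simply shows up as a run of gaps. Handling several HSPs at once would need an extra step to pick a mutually compatible set, which this sketch does not attempt.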

Best,
--
Catherine Letondal -- Institut Pasteur -- Computing Center

From aws at sanger.ac.uk Tue Mar 28 08:09:30 2006
From: aws at sanger.ac.uk (Adam Spargo)
Date: Tue, 28 Mar 2006 14:09:30 +0100 (BST)
Subject: [BiO BB] TraceSearch
Message-ID:

Hi,

We would like to announce the launch of a new free service which gives public access to the Wellcome Trust Sanger Institute Trace Archive via sequence similarity. The archive contains records of all publicly available DNA sequencing reads. The search engine, available at:

http://trace.ensembl.org/cgi-bin/tracesearch

allows users to identify any sequences in the archive with significant similarity to their query sequence. Users are able to search the whole archive in a few seconds, or alternatively to limit the search by species, sequencing centre or trace type.

We use a version of the SSAHA algorithm to distribute an index over a cluster of machines so that we can continue to scale the service as the archive grows.

Full Story: http://www.sanger.ac.uk/Info/Press/

We welcome any feedback and suggestions for improvements to this service. Please forward this email to colleagues and collaborators who may be interested.

Thanks,

On behalf of the TraceSearch development team.

--
Dr Adam Spargo
High Performance Assembly Group     email: aws at sanger.ac.uk
Wellcome Trust Sanger Institute     Tel: +44 (0)1223 834244 x7728
Hinxton, Cambridge CB10 1SA         Fax: +44 (0)1223 494919

From maximilianh at gmail.com Tue Mar 28 11:18:43 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Tue, 28 Mar 2006 18:18:43 +0200
Subject: [BiO BB] Translate Ensembl Transcript ID to NCBI GB or gi IDs?
In-Reply-To:
References:
Message-ID: <76f031ae0603280818pea52690ya3ec2a333bdec1cf@mail.gmail.com>

I don't know much about Ensembl and their tables, but UCSC links its known genes to both Ensembl and NCBI RefSeqs.
This is part of the output of the table browser: ---- Database: hg17 Primary Table: knownGene Row Count: 39,368 Description: Protein coding genes based on proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their corresponding mRNAs from GenBank ----- hg17.knownToEnsembl.name (via knownGene.name) hg17.knownToLocusLink.name (via knownGene.name) hg17.knownToPfam.name (via knownGene.name) hg17.knownToRefSeq.name (via knownGene.name) ----- I guess you can connect to their mysql server and generate a table to translate your ids... Hope that helped, Max On 16/03/06, Amir Karger wrote: > I'm blasting a bunch of sequences against the Ensembl CDNA Human Genome, > which provides Ensemble Transcript IDs (e.g., ENST00000326632). Is there an > easy way to convert those to GenBank identifiers or GIs? BioMart offers to > give all kinds of IDs, but NCBI IDs don't seem to be in the list. And I was > unable to find translation files in some digging at the NCBI and Ensembl web > sites. > > I'm happy to do it with a script, if there's a conversion file available. An > online resource is OK, too, of course. Is the information found in the > Ensembl MySQL stuff somewhere, such that I could get it with the Perl API? > > Thanks, > > - Amir Karger > Computational Biology Group > Bauer Center for Genomics Research > Harvard University > 617-496-0626 > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Maximilian Haeussler, CNRS Gif-sur-Yvette, Paris tel: +33 6 12 82 76 16 icq: 3825815 -- msn: maximilian.haeussler at hpi.uni-potsdam.de skype: maximilianhaeussler From christoph.gille at charite.de Wed Mar 29 07:23:21 2006 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Wed, 29 Mar 2006 14:23:21 +0200 (CEST) Subject: [BiO BB] organism with conserved intron/exon structure ? 
Message-ID: <51412.141.42.56.114.1143635001.squirrel@webmail.charite.de>

We want to perform a study on genome structure and need a genome where intron/exon boundaries have not changed much over a long period of evolution. What organism would you suggest?

Many thanks

From pankaj at nii.res.in Thu Mar 30 00:18:25 2006
From: pankaj at nii.res.in (Pankaj)
Date: Thu, 30 Mar 2006 10:48:25 +0530
Subject: [BiO BB] models of sugars
Message-ID: <20060330051825.M19861@nii.res.in>

Hi all,

I want to dock a few carbohydrate monosaccharides to their respective proteins, like L-vancosamine or D-glucose or D-mannose. I have two questions:

1) I have no idea about the differences between the D- and L-isomers. I have read about it on the net. I understand the Fischer projection etc. on the 2-D plane, but how exactly do stereoisomers of sugars in pyranose form differ in 3-D structure? With the sugar in a pyranose ring, how do I model its structure (i.e. a 3-D model of D-mannose or L-mannose etc.)?

2) Is there a database from which I can download 3-D structures of these sugars?

Thanking all in advance
Pankaj Khurana

-- Open WebMail Project (http://openwebmail.org)

From idoerg at burnham.org Thu Mar 30 02:11:52 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 29 Mar 2006 23:11:52 -0800
Subject: [BiO BB] Second call for participation: Automated Function Prediction 2006
Message-ID:

(Please forward as appropriate, thanks).

Call for Participation: Talks, Papers and Posters

The Second Automated Function Prediction Meeting
August 30 -- September 1 2006, University of California San Diego

Abstract submission deadline: April 26, 2006

http://BioFunctionPrediction.org/AFP/afp06

Sequence and structure genomics have generated a wealth of data, but extracting meaningful information from genomic information is becoming an increasingly difficult challenge. Both the number and the diversity of discovered genes are increasing.
This increase means that established annotation methods, such as homology transfer, are annotating less data. In addition, there is a need for annotation which is standardized so that it could be incorporated into function annotation on a large scale. Finally, there is a need to assess the quality of the function prediction software which is out there. We probably know the sequence of the target for next generation antibiotics or cancer treatment. We just do not realize that because the target is currently annotated as a "domain of unknown function". For these reasons and many more, automated protein function prediction is rapidly gaining interest among computational biologists in academia and industry. The first Automated Function Prediction (AFP) meeting was held alongside ISMB 2005, and gathered together some 100 attendees for a full day of talks, poster sessions, and a discussion panel. The second meeting will be a three day event, August 30-September 1st , 2006 at the campus of University of California, San Diego, California, USA. AFP 2006 will feature: * Plenary talks delivered by leading researchers in the field * Submitted talks * Conference proceedings published as research papers in BMC Bioinformatics * A special discussion panel on gene and protein annotation * A poster session Speakers: * Philip E. Bourne, University of California, San Diego, USA * Steven E. Brenner, University of California, Berkeley, USA * Terry Gaasterland, Scripps Institute of Oceanography, La Jolla, USA * Adam Godzik, Burnham Institute for Medical Research and University of California, San Diego USA * Christos Ouzounis European Bioinformatics Institute, Cambridge, UK * Anna Tramontano, University of Rome, "La Sapienza", Rome, Italy * Shoshana Wodak, Hospital for Sick Children, and Departments of Biochemistry and Medical Genetics, University of Toronto, Canada. 
Talks and posters are sought in, but not limited to, the following topics: * Function prediction using sequence based methods. This would include "classic" methods such as detection of functional motifs and inferring function from sequence similarity. * Function from genomic information: prediction by genomic location; locus comparison with other organisms; function gain and loss. * Function prediction in metagenomics * Phylogeny based methods * Function from molecular interactions * Function from structure * Function prediction using combined methods * "Meta-talks" discussing the limitations and horizons of computational function prediction. * Assessing function prediction programs Authors of abstracts selected for talks would also have the opportunity to extend them to full length papers, which will be reviewed for publication in BMC Bioinformatics in a special AFP proceedings section. BMC Bioinformatics is an Open Access, peer-reviewed journal that considers articles on all aspects of computational methods used in the analysis and annotation of sequences and structures, as well as all other areas of computational biology. It has an ISI impact factor of 5.42 for year 2004. Abstract submission deadline: April 26, 2006 For more information please see the meeting site: http://BioFunctionPrediction.org/AFP/afp06 Sincerely, Iddo Friedberg, in the name of the AFP 2006 organizing committee -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9949 http://iddo-friedberg.org http://BioFunctionPrediction.org From jeff at bioinformatics.org Thu Mar 30 10:28:24 2006 From: jeff at bioinformatics.org (J.W. 
Bizzaro) Date: Thu, 30 Mar 2006 10:28:24 -0500 Subject: [BiO BB] Reminder about Franklin Award ceremony Message-ID: <442BF918.3070901@bioinformatics.org> In conjunction with Bioinformatics.Org, Life Sciences (formerly Bio-IT World) Conference + Expo will once again host the annual Benjamin Franklin Award, a humanitarian award presented for the promotion of Open Access in the life sciences. Please join us on Wednesday, April 5th at 9:30AM as we congratulate Michael Ashburner, noted Drosophila geneticist from the University of Cambridge, and this year's winner! The ceremony is open to all attendees. Register today for this world-class program that also includes conference tracks, keynotes from industry leaders and educational workshops taking place next week at the Sheraton Boston Hotel. Use priority code BTR241 to save 25%. Please use the following URL to register: https://register.rcsreg.com/regos-1.0/lifesciences2006/ga/?pri=BTR241 There's also a free, "Exhibits Only" pass that will give you access to the exhibits, keynotes and feature presentations (including the Franklin Award), provided you register for it before April 3. And thanks to the folks at IDG/Bio-IT World for providing this forum to Bioinformatics.Org for the last 5 years. See you there! Jeff -- J.W. Bizzaro Bioinformatics Organization, Inc. (Bioinformatics.Org) E-mail: jeff at bioinformatics.org Phone: +1 508 890 8600 -- From vivekphilip at gmail.com Thu Mar 30 10:57:25 2006 From: vivekphilip at gmail.com (Vivek Philip) Date: Thu, 30 Mar 2006 09:57:25 -0600 Subject: [BiO BB] Ramachandran plot. Message-ID: <5a4d88fc0603300757x4b7940e7ha9635845ac6f5cc5@mail.gmail.com> Hi, While analyzing the plot of phi and psi angles on the Ramachandran plot, we found out that there was a lot of overlap between the alpha-helix and beta-sheet regions. 
By this we mean that just based on the phi and psi angles alpha helices are meant to be both negative, while for beta sheets phi angles take negative values and psi positive. The results we used for generating these plots were obtained from STRIDE and DSSP for top 500 high resolution pdb structures. Does anyone have an idea as to what the reason for such an overlap could be? Moreover on further analysis we did see a lot of misclassification particularly related to the pi-helix region (Only about 38% of the residues were classified correctly).We are talking about approximately 40% overlap between the two regions. Any feedback on this would be greatly appreciated. Thank you, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From boris.steipe at utoronto.ca Thu Mar 30 13:14:55 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Thu, 30 Mar 2006 13:14:55 -0500 Subject: [BiO BB] Ramachandran plot. In-Reply-To: <5a4d88fc0603300757x4b7940e7ha9635845ac6f5cc5@mail.gmail.com> References: <5a4d88fc0603300757x4b7940e7ha9635845ac6f5cc5@mail.gmail.com> Message-ID: <22F3197E-168F-4BD5-B344-7417D85FAC21@utoronto.ca> This is not misclassification: helices and strands have >>repeating<< residues with certain (phi,psi) values. The Ramachandran plot shows you a value for >>single<< residues. To classify a structural segment as belonging to a certain type you need more information than individual (phi,psi) values. HTH, Boris On 30 Mar 2006, at 10:57, Vivek Philip wrote: > Hi, > While analyzing the plot of phi and psi angles on the > Ramachandran plot, we found out that there was a lot of overlap > between the alpha-helix and beta-sheet regions. By this we mean > that just based on the phi and psi angles alpha helices are meant > to be both negative, while for beta sheets phi angles take negative > values and psi positive. 
The results we used for generating these > plots were obtained from STRIDE and DSSP for top 500 high > resolution pdb structures. > Does anyone have an idea as to what the reason for such an > overlap could be? Moreover on further analysis we did see a lot of > misclassification particularly related to the pi-helix region (Only > about 38% of the residues were classified correctly).We are talking > about approximately 40% overlap between the two regions. > Any feedback on this would be greatly appreciated. > Thank you, > Vivek > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From vivekphilip at gmail.com Thu Mar 30 13:30:42 2006 From: vivekphilip at gmail.com (Vivek Philip) Date: Thu, 30 Mar 2006 12:30:42 -0600 Subject: [BiO BB] Ramachandran plot. In-Reply-To: <22F3197E-168F-4BD5-B344-7417D85FAC21@utoronto.ca> References: <5a4d88fc0603300757x4b7940e7ha9635845ac6f5cc5@mail.gmail.com> <22F3197E-168F-4BD5-B344-7417D85FAC21@utoronto.ca> Message-ID: <5a4d88fc0603301030t687b2fes12d00a9e4e01d71c@mail.gmail.com> But isn't it true that the Ramachandran plot is a plot of phi and psi angles per residue, and that these should agree with the properties of alpha-helix and beta-sheet angles? That was our understanding of the problem, and that's why we thought it was interesting. Vivek On 3/30/06, Boris Steipe wrote: > > This is not misclassification: helices and strands have >>repeating<< > residues with certain (phi,psi) values. The Ramachandran plot shows > you a value for >>single<< residues. To classify a structural segment > as belonging to a certain type you need more information than > individual (phi,psi) values.
> > HTH, > Boris > > > > On 30 Mar 2006, at 10:57, Vivek Philip wrote: > > > Hi, > > While analyzing the plot of phi and psi angles on the > > Ramachandran plot, we found out that there was a lot of overlap > > between the alpha-helix and beta-sheet regions. By this we mean > > that just based on the phi and psi angles alpha helices are meant > > to be both negative, while for beta sheets phi angles take negative > > values and psi positive. The results we used for generating these > > plots were obtained from STRIDE and DSSP for top 500 high > > resolution pdb structures. > > Does anyone have an idea as to what the reason for such an > > overlap could be? Moreover on further analysis we did see a lot of > > misclassification particularly related to the pi-helix region (Only > > about 38% of the residues were classified correctly).We are talking > > about approximately 40% overlap between the two regions. > > Any feedback on this would be greatly appreciated. > > Thank you, > > Vivek > > _______________________________________________ > > Bioinformatics.Org general forum - > > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Vivek M. Philip -------------- next part -------------- An HTML attachment was scrubbed... 
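A small illustration of the point Boris makes in this thread, as a sketch only. The sign-based region rule follows Vivek's description; the function names, the example angles, and the minimum run length of 4 are assumptions made for illustration. This is not how DSSP or STRIDE assign secondary structure. It only shows that a residue whose (phi, psi) pair falls in the helical region need not belong to a helix unless its neighbours repeat that conformation.

```python
from itertools import groupby


def region(phi, psi):
    """Crude per-residue region based only on the signs of phi and psi (degrees)."""
    if phi < 0 and psi < 0:
        return "alpha"
    if phi < 0 and psi >= 0:
        return "beta"
    return "other"


def segments(angles, min_run=4):
    """Label a residue only if it lies in a run of at least min_run
    consecutive residues that share the same region (min_run is an
    arbitrary illustrative threshold, not a published criterion)."""
    labels = []
    for reg, run in groupby(region(phi, psi) for phi, psi in angles):
        n = len(list(run))
        keep = reg != "other" and n >= min_run
        labels.extend([reg if keep else "-"] * n)
    return labels


# Hypothetical angles: five helix-like residues, four strand-like residues,
# then one isolated helix-like and one isolated strand-like residue.
angles = [(-60, -45)] * 5 + [(-120, 130)] * 4 + [(-65, -40), (-110, 140)]

per_residue = [region(phi, psi) for phi, psi in angles]
per_segment = segments(angles)
print(per_residue)
print(per_segment)
```

The last two residues sit in the "alpha" and "beta" regions of the plot when taken one at a time, yet neither is part of a repeating run, so the segment-based labelling leaves them unassigned.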
URL: From John.McNaught at manchester.ac.uk Fri Mar 31 13:16:32 2006 From: John.McNaught at manchester.ac.uk (John McNaught) Date: Fri, 31 Mar 2006 19:16:32 +0100 (BST) Subject: [BiO BB] New course: MSc in Text Mining at UManchester, UK Message-ID: <20060331181623.4427D94003@primary.bioinformatics.org> (Although not focussing exclusively on biotext mining, this course may be of interest to Bioinformatics students) Masters in Text Mining School of Informatics University of Manchester, UK Text mining is concerned with finding previously unsuspected knowledge through large-scale processing of unstructured text. It involves identifying relevant information (information retrieval), extracting facts of interest to the user from the identified texts (information extraction) and discovery of associations among the facts extracted from many different texts (data mining). Text mining finds application in many areas: competitive intelligence for business, hypothesis generation for scientists, predictive toxicology, patent searching, provision of metadata for digital libraries to enable conceptual search, sentiment analysis, database curation, fraud detection, disaster planning and defence against terrorism, to mention a few. It is an exciting growth area that supports scientists and knowledge workers in academia, business and government. It is interdisciplinary, as it leverages techniques from different fields, and it serves very practical needs in many domains. There is currently a lack of people with advanced training in text mining. This programme helps you to develop expertise in the methodologies and technologies for developing text mining software. The programme focuses upon natural language processing, data mining and information retrieval approaches. An additional course unit on industrial applications of text mining helps you to bridge the gap between academic knowledge and the deployment of that knowledge in organisations. 
The course unit introduces you to a wide variety of external speakers and real case-studies, and encourages you to develop report-writing and presentational skills to analyse cutting-edge text mining technology issues across the public and private sectors. The course runs from early October to mid-September, with teaching taking place over 2 semesters, followed by research on a dissertation topic for around 3 months over the summer period. Careers include specialist roles in knowledge and information management and in the use of IT in archives and libraries, and knowledge analyst roles supporting researchers in a wide variety of disciplines. This MSc also leads directly into PhD level research in the area. The University of Manchester hosts the National Centre for Text Mining (funded by the JISC, BBSRC and EPSRC), the first such publicly-funded centre in the world. Academic members of the Centre (www.nactem.ac.uk) will be closely involved in the teaching of this course, thus you will benefit from both theoretical and practical experience and from exposure to robust, efficient, scalable text mining tools. Entry requirements: Computing-related first degree. Degree class of 2i (or overseas equivalent). Applicants are required to provide evidence of ability in both spoken and written English, and one of the following minimum qualifications should be held: GCSE English Language (Grade C or higher), TOEFL>570/230 or IELTS>6.5.
Course Units: Data Mining Information Management Information Retrieval Knowledge Representation and Semantic Web Natural Language Processing Research and Professional Development Text Mining Applications and Systems

Contact:
The MSc Admissions Office
School of Informatics
The University of Manchester
PO Box 88, Manchester M60 1QD
Telephone: +44 (0)161 306 1299
Email: pg-informatics at manchester.ac.uk

Apply: http://www.manchester.ac.uk/postgraduate/howtoapply/
Further information: http://www.informatics.manchester.ac.uk/programmes/pg_programme_list.php
Brochure: http://www.informatics.manchester.ac.uk/programmes/InformaticsPGbrochure.pdf

--
John McNaught
Associate Director
National Centre for Text Mining and School of Informatics
University of Manchester
PO Box 88
Sackville Street
Manchester
M60 1QD
UK
mail: John.McNaught at manchester.ac.uk
tel: +44.161.306.3098
fax: +44.161.306.1281
web: www.nactem.ac.uk
     www.informatics.manchester.ac.uk