From bioinfosm at gmail.com Tue Jan 3 17:08:10 2006
From: bioinfosm at gmail.com (Samantha Fox)
Date: Tue, 3 Jan 2006 17:08:10 -0500
Subject: [BiO BB] Parallel MEME (motif finding)
Message-ID: <726450810601031408ra2de2d8kd65020dbf457b844@mail.gmail.com>

Hi,
Can anyone point me to a place where I can submit large MEME jobs to run in parallel? They are too intensive to be run serially.

Thanks,
~S

From bioinfosm at gmail.com Thu Jan 5 13:08:16 2006
From: bioinfosm at gmail.com (Samantha Fox)
Date: Thu, 5 Jan 2006 13:08:16 -0500
Subject: [BiO BB] Parallel MEME (motif finding)
In-Reply-To: <20060103192450.d390e8b22a051e7fc890dba1b5019077.046d867a4e.wbe@email.email.secureserver.net>
References: <20060103192450.d390e8b22a051e7fc890dba1b5019077.046d867a4e.wbe@email.email.secureserver.net>
Message-ID: <726450810601051008o20cbd39ag1a4a11f1795f33b8@mail.gmail.com>

Willy,
Thanks for the response. The dataset is quite big (around 10^6 bp). That is the reason I am not running it serially. I have not tried Eisen's program.

~S

On 1/3/06, Willy Valdivia-Granda wrote:
>
> Samantha
>
> Have you tried the program of Mike Eisen to look for motifs? One problem
> with MEME is that even when you install it and make it run with MPI, you
> get a high number of false positives. I haven't tried the Eisen
> program. How big is the dataset?
>
> Best luck
>
> Willy Valdivia
>
> > -------- Original Message --------
> > Subject: [BiO BB] Parallel MEME (motif finding)
> > From: Samantha Fox <>
> > Date: Tue, January 03, 2006 5:08 pm
> > To: bio_bulletin_board at bioinformatics.org
> >
> > Hi,
> > Can anyone point me to a place where I can submit large MEME jobs in
> parallel. They are too intensive to be run serially.
> > > > Thanks, > > ~S > > > > --------------------------------------------------------------------- > > _______________________________________________ > > Bioinformatics.Org general forum - > > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bioinfosm at gmail.com Thu Jan 5 13:10:34 2006 From: bioinfosm at gmail.com (Samantha Fox) Date: Thu, 5 Jan 2006 13:10:34 -0500 Subject: [BiO BB] GO nodes Message-ID: <726450810601051010l735804a9me41beb3c646ec17d@mail.gmail.com> Hi, I am looking to find a list of GO terms at a particular level in the DAG. Does anyone know a good way to do so .. say i want all GO nodes at level 4 of the gene-ontology dag. ~S -------------- next part -------------- An HTML attachment was scrubbed... URL: From hz5 at njit.edu Fri Jan 6 00:41:31 2006 From: hz5 at njit.edu (hz5 at njit.edu) Date: Fri, 06 Jan 2006 00:41:31 -0500 (EST) Subject: [BiO BB] Automated upstream region sequence retrieval In-Reply-To: <43560ECA.7020500@tuks.co.za> References: <43560ECA.7020500@tuks.co.za> Message-ID: <1136526091.43be030b2bfa3@webmail.njit.edu> EZRetrieve Quoting Charles Hefer : > Hi > > I am looking for a way to automate the retrieval of upstream regions > of genes (from fully sequenced genomes). > > I have tried the R/BioMart route, but the organisms I want are not > available in BioMart (yet). The next option would be to use BLAT to > retrieve the gene positions and then retrieve the upstream regions, > but > I am looking for a simpler solution. Does one of the Bio modules of i.e > > Python/Java/PERL support this functionality? > > The aim is to put up a little internal web-service for promoter > searches, for which the desired gene ID (GenBankId) would be entered and > > the ~2kb upstream region returned. 
> > Thanx, in advance > -- > > Charles > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > ========================================================= Haibo Zhang, PhD Computational Biology http://www.cyberpostdoc.org/ Share postdoc information in cyberspace. Welcome your stories, suggestions and advice! From sona_ghn at yahoo.co.in Fri Jan 6 06:35:28 2006 From: sona_ghn at yahoo.co.in (gehana vaswani) Date: Fri, 6 Jan 2006 11:35:28 +0000 (GMT) Subject: [BiO BB] advice on joining projects Message-ID: <20060106113528.40463.qmail@web8505.mail.in.yahoo.com> hi, i am in the second year b.tech bioinformatics n want to know that how can i participate in doing some projects with experienced people.whom should i approach for this and how? kindly answer.bye Send instant messages to your online friends http://in.messenger.yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From subhash_pgi at yahoo.co.in Fri Jan 6 07:27:41 2006 From: subhash_pgi at yahoo.co.in (subhash choudhary) Date: Fri, 6 Jan 2006 12:27:41 +0000 (GMT) Subject: [BiO BB] advice on joining projects In-Reply-To: <20060106113528.40463.qmail@web8505.mail.in.yahoo.com> Message-ID: <20060106122741.49090.qmail@web8406.mail.in.yahoo.com> Hi gehana, Pls be in contat, probabay I can help you out Pls let me have your CV Subhash gehana vaswani wrote: hi, i am in the second year b.tech bioinformatics n want to know that how can i participate in doing some projects with experienced people.whom should i approach for this and how? 
kindly answer.bye
_______________________________________________
Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

Subhash Choudhary
Bioinformatics Centre
University Of Pune
Pune-411007
Maharashtra
India
Tel: +91-20-25690195 (O)
     +91-9850229436 (M)

From pmr at ebi.ac.uk Fri Jan 6 07:42:36 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 06 Jan 2006 12:42:36 +0000
Subject: [BiO BB] advice on joining projects
In-Reply-To: <20060106113528.40463.qmail@web8505.mail.in.yahoo.com>
References: <20060106113528.40463.qmail@web8505.mail.in.yahoo.com>
Message-ID: <43BE65BC.6020206@ebi.ac.uk>

gehana vaswani wrote:
> hi,
> i am in the second year b.tech bioinformatics n want to know that how can i participate in doing some projects with experienced people.whom should i approach for this and how?
> kindly answer.bye

You can join one of the many open source bioinformatics projects. Choose something close to your own interests, download the software, and try making some suggestions to the developers (preferably via their mailing lists or bulletin boards so you join in a wider discussion).

When you are ready, you can try making your own contributions (depending on the time and resources you have).

There are software projects, databases, standards initiatives and ontologies that would welcome contributions.
Good places to start looking are www.sourceforge.net, www.open-bio.org and bioinformatics.org.

Hope that helps,

Peter Rice

From reddy_vram at yahoo.com Fri Jan 6 08:59:33 2006
From: reddy_vram at yahoo.com (Rami Reddy.V)
Date: Fri, 6 Jan 2006 05:59:33 -0800 (PST)
Subject: [BiO BB] flybase IDs to gene bank accession nos
Message-ID: <20060106135933.6019.qmail@web54302.mail.yahoo.com>

Hello all,
I am looking for a tool to convert FlyBase IDs to GenBank accessions or Entrez IDs. If anybody knows of one, please let me know.
Thanks a lot

From pculpep at hotmail.com Fri Jan 6 16:58:51 2006
From: pculpep at hotmail.com (Pamela Culpepper)
Date: Fri, 06 Jan 2006 21:58:51 +0000
Subject: [BiO BB] Local BLAST Graphical View
Message-ID:

We have just updated our website (http://www.lifeformulae.com) with the files required to modify the latest version of the NCBI Toolbox's blastall to support the local BLAST alignment graphical view.

The bastall_diff.txt file is the patch file that contains the modifications to the Toolbox code. The HOW_TO.txt file provides instructions on how to apply the patch and compile the Toolbox.

blastall will produce the graphical view with either the old or the new BLAST engine. You must use the blastall "-T T" (HTML option) to activate the graphical overview.

This software is provided freely to all Bio-Bulletin-Board users.
Thanks,
Pam Culpepper

From willy_valdivia at orionbiosciences.com Thu Jan 5 13:18:51 2006
From: willy_valdivia at orionbiosciences.com (Willy Valdivia-Granda)
Date: Thu, 05 Jan 2006 11:18:51 -0700
Subject: [BiO BB] GO nodes
Message-ID: <20060105111851.d390e8b22a051e7fc890dba1b5019077.cd9110ead4.wbe@email.email.secureserver.net>

Try Cytoscape, then install the BiNGO plugin and select the organism, the gene ID list, and the ontology for which you would like to see the DAG (biological process, molecular function, or cellular component). You can also apply statistics to filter out terms that are not statistically close to the graph. Just be careful in using the ontologies; several researchers use them indiscriminately to "annotate" stuff...

Another tool could be http://gostat.wehi.edu.au/

And of course there is DAG-Edit from the GO website.

Cheers

> -------- Original Message --------
> Subject: [BiO BB] GO nodes
> From: Samantha Fox
> Date: Thu, January 05, 2006 1:10 pm
> To: bio_bulletin_board at bioinformatics.org
>
> Hi,
> I am looking to find a list of GO terms at a particular level in the DAG.
> Does anyone know a good way to do so .. say i want all GO nodes at level 4 of the gene-ontology dag.
>
> ~S

From n_dilip531 at hotmail.com Fri Jan 6 23:38:45 2006
From: n_dilip531 at hotmail.com (dilip namboodiri)
Date: Sat, 07 Jan 2006 10:08:45 +0530
Subject: [BiO BB] advice on joining projects
Message-ID:

Hi Subhash,

I read your response to gehana. I am Dilip. I have completed an MSc in Bioinformatics at Madras University (Guru Nanak College). I am also interested in the subject gehana mentioned. In your reply you asked for a CV, which I am attaching. Please look it over.
With Regards
Dilip.

>From: subhash choudhary
>Reply-To: "The general forum at Bioinformatics.Org"
>To: "The general forum at Bioinformatics.Org"
>Subject: Re: [BiO BB] advice on joining projects
>Date: Fri, 6 Jan 2006 12:27:41 +0000 (GMT)
>
>Hi gehana,
> Pls be in contact, probably I can help you out
> Pls let me have your CV
>
> Subhash
>
>gehana vaswani wrote: hi,
> i am in the second year b.tech bioinformatics n want to know that how
>can i participate in doing some projects with experienced people.whom
>should i approach for this and how?
> kindly answer.bye
>
>Subhash Choudhary
>Bioinformatics Centre
>University Of Pune
>Pune-411007
>Maharashtra
>India
>Tel: +91-20-25690195 (O)
> +91-9850229436 (M)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: RESUME.doc
Type: application/msword
Size: 57856 bytes
Desc: not available

From liangyou at gmail.com Sun Jan 8 21:35:14 2006
From: liangyou at gmail.com (Liangyou Chen)
Date: Sun, 8 Jan 2006 21:35:14 -0500
Subject: [BiO BB] flybase IDs to gene bank accession nos
In-Reply-To: <20060106135933.6019.qmail@web54302.mail.yahoo.com>
References: <20060106135933.6019.qmail@web54302.mail.yahoo.com>
Message-ID:

I am creating tools for such conversions.
But I don't know how to connect a FlyBase ID to a GenBank accession or Entrez ID. If you can show me an example of how to do the conversion in detail, I will make a tool for you.

Thanks.

On 1/6/06, Rami Reddy.V wrote:
> Hello all,
> I am looking for a tool to convert FlyBase IDs to Gene Bank accession or
> entrez IDs, if anybody knows please let me know.
> Thanks alot

From aloraine at gmail.com Mon Jan 9 00:04:32 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Sun, 8 Jan 2006 23:04:32 -0600
Subject: [BiO BB] flybase IDs to gene bank accession nos
In-Reply-To:
References: <20060106135933.6019.qmail@web54302.mail.yahoo.com>
Message-ID: <83722dde0601082104y43dd4987kf9e70e53fb9607bf@mail.gmail.com>

Hello,

Maybe this will help...

Does FlyBase provide sequence data?

If you can get a mapping of FlyBase ids onto mRNA sequences, then you can use blast to search against a database of RefSeq fruit fly mRNA sequences. Then it should be relatively easy to get Entrez gene ids from the perfect-match RefSeq ids identified in the blast search.

-Ann

On 1/8/06, Liangyou Chen wrote:
> I am creating tools for such conversions. But I don't know how to
> connect the FlyBase ID to GeneBank accession or entrez ID. If you
> can show me an example how to do the conversion in detail, I will make a tool
> for you.
>
> Thanks.
>
> On 1/6/06, Rami Reddy.V wrote:
> > Hello all,
> > I am looking for a tool to convert FlyBase IDs to Gene Bank accession or
> > entrez IDs, if anybody knows please let me know.
> > Thanks alot

--
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From reddy_vram at yahoo.com Mon Jan 9 04:14:11 2006
From: reddy_vram at yahoo.com (Rami Reddy.V)
Date: Mon, 9 Jan 2006 01:14:11 -0800 (PST)
Subject: [BiO BB] flybase IDs to gene bank accession nos
In-Reply-To: <83722dde0601082104y43dd4987kf9e70e53fb9607bf@mail.gmail.com>
Message-ID: <20060109091411.92574.qmail@web54304.mail.yahoo.com>

Thanks to Ann and Chen for the replies. In fact FlyBase itself provides batch reports; I only found this out recently. I was able to get GenBank accessions from FlyBase (http://flybase.net/cgi-bin/fbidbatch.html).

Now I have one more question. I want to search for annotations using these IDs against DAVID (http://david.abcc.ncifcrf.gov/). I got all the accession numbers corresponding to my FlyBase gene IDs. Do I have to check all of these accession numbers against DAVID, or is one accession number per gene enough? I am not sure whether this makes sense. I used GOToolBox (http://139.124.62.227/GOToolBox/index.php?page=dataset) for annotations, as it accepts FlyBase IDs, but I want to check DAVID as well.

Thanks a lot for your time

Ann Loraine wrote:
Hello, Maybe this will help... Does FlyBase provide sequence data? If you can get a mapping of FlyBase ids onto mRNA sequences, then you can use blast to search against a database of RefSeq fruit fly mRNA sequences.
Then it should be relatively easy to get Entrez gene ids from the perfect-match RefSeq ids identified in the blast search.

-Ann

On 1/8/06, Liangyou Chen wrote:
> I am creating tools for such conversions. But I don't know how to
> connect the FlyBase ID to GeneBank accession or entrez ID. If you
> can show me an example how to do the conversion in detail, I will make a tool
> for you.
>
> Thanks.
>
> On 1/6/06, Rami Reddy.V wrote:
> > Hello all,
> > I am looking for a tool to convert FlyBase IDs to Gene Bank accession or
> > entrez IDs, if anybody knows please let me know.
> > Thanks alot

From miraceti at chol.com Tue Jan 10 00:41:22 2006
From: miraceti at chol.com (miraceti)
Date: Tue, 10 Jan 2006 14:41:22 +0900 (KST)
Subject: [BiO BB] need advise on annotation tool
Message-ID: <20060110144122.201C2769@chol.com>

Hello,
I am about to start a genome annotation project on a bacterial species. I am thinking of using an automated annotation tool, either GeneQuiz or GenDB. Which would you recommend, or are there better options?

Thank you.
Mira Han

From bioinfosm at gmail.com Fri Jan 13 12:37:18 2006
From: bioinfosm at gmail.com (Samantha Fox)
Date: Fri, 13 Jan 2006 12:37:18 -0500
Subject: [BiO BB] Research on drosophila species
Message-ID: <726450810601130937x253a039eveaa7b9430dc6ee4c@mail.gmail.com>

Hi,
There has been a big buzz about the 12 fly genomes and all that can be achieved by analyzing them. Can anyone point me to the latest information on what is what, and on who is doing or planning to do which research? I don't think any papers have been published, but people are providing resources to other co-workers.

Thanks,
Samantha

From bioinfosm at gmail.com Fri Jan 13 14:03:02 2006
From: bioinfosm at gmail.com (Samantha Fox)
Date: Fri, 13 Jan 2006 14:03:02 -0500
Subject: [BiO BB] Automated upstream region sequence retrieval
In-Reply-To: <1136526091.43be030b2bfa3@webmail.njit.edu>
References: <43560ECA.7020500@tuks.co.za> <1136526091.43be030b2bfa3@webmail.njit.edu>
Message-ID: <726450810601131103m6e1e62a6hbb9ff4862cb2a2f5@mail.gmail.com>

EZRetrieve is for human only, so I don't think it solves Charles' problem.

Charles, can you tell me some more about the R/BioMart route? I am unaware of the BLAT procedure as well.

Samantha

On 1/6/06, hz5 at njit.edu wrote:
>
> EZRetrieve
>
> Quoting Charles Hefer :
>
> > Hi
> >
> > I am looking for a way to automate the retrieval of upstream regions
> > of genes (from fully sequenced genomes).
> >
> > I have tried the R/BioMart route, but the organisms I want are not
> > available in BioMart (yet). The next option would be to use BLAT to
> > retrieve the gene positions and then retrieve the upstream regions,
> > but I am looking for a simpler solution. Does one of the Bio modules of i.e
> > Python/Java/PERL support this functionality?
> >
> > The aim is to put up a little internal web-service for promoter
> > searches, for which the desired gene ID (GenBankId) would be entered and
> > the ~2kb upstream region returned.
> >
> > Thanx, in advance
> > --
> >
> > Charles
From blanchem at mcb.mcgill.ca Sun Jan 15 22:23:08 2006
From: blanchem at mcb.mcgill.ca (Mathieu Blanchette)
Date: Sun, 15 Jan 2006 22:23:08 -0500
Subject: [BiO BB] Deadline extension for Phylogenomics Conference
Message-ID: <6A79F39F-863F-11DA-968C-000D93C109C4@mcb.mcgill.ca>

DEADLINE EXTENSION for the First International Conference on Phylogenomics

New deadline for early registration: February 1st 2006.
New deadline for abstract submission for presentation: February 1st 2006

********* Conference announcement ******************

First International Conference on Phylogenomics

Dates: March 16-19 2006
Location: Sainte-Adèle (near Montreal), Québec, Canada
Web site: https://phylogenomics.bioinfo.umontreal.ca/meeting/
Organizers: Hervé Philippe and Mathieu Blanchette

Scope:
This conference aims to bring together experts focusing on two distinct aspects of phylogenomics: the use of genome data for inferring species phylogeny, and the use of phylogenetic approaches to gain insights into gene functions. The methods developed for phylogenetic inference (especially the models of sequence evolution) are quite advanced and could benefit function prediction. Similarly, knowledge of the accurate species phylogeny increases the quantity of functional information that can be extracted. Conversely, knowledge of gene function and of other selective constraints is essential for improving tree reconstruction methods. This conference will create synergy between these two phylogenomic communities, bridging the gap between their respective scientific endeavors.

A special issue of BMC Evolutionary Biology will be dedicated to the conference, allowing contributors of the conference to submit their manuscripts.
Invited speakers:
* Ford Doolittle, Dalhousie University, Canada
* Jonathan Eisen, The Institute for Genomic Research, USA
* Brian Golding, McMaster University, Canada
* Nick Goldman, EMBL-EBI Cambridge, UK
* Richard Goldstein, National Institute for Medical Research, USA
* Jotun Hein, University of Oxford, UK
* Mark Pagel, University of Reading, UK
* Eduardo Rocha, Université Paris 6, France
* Andrew Roger, Dalhousie University, Canada
* Michael Sanderson, University of California, USA
* Adam Siepel, Cornell University, USA
* Yves van de Peer, Ghent University, Belgium

Important dates:
Deadline for early registration: February 1st 2006.
Deadline for abstract submission for presentation: February 1st 2006
Deadline for manuscript submission (accepted abstracts only): March 1st 2006

From ngadewal at yahoo.com Tue Jan 17 02:43:26 2006
From: ngadewal at yahoo.com (nikhil gadewal)
Date: Mon, 16 Jan 2006 23:43:26 -0800 (PST)
Subject: [BiO BB] Co-expression
Message-ID: <20060117074326.64481.qmail@web51512.mail.yahoo.com>

Hello Board Members,

I have a set of 11 genes whose co-expression I want to study. Is there any sequence analysis method to predict which of these genes are co-expressed?

Thank you in advance

Regards,
Nikhil

NIKHIL S. GADEWAL
ACTREC, Tata Memorial Centre,
Kharghar, Navi Mumbai, India

Great minds discuss ideas; Average minds discuss events; Small minds discuss people.
From richard.squires at utsouthwestern.edu Wed Jan 18 17:10:25 2006
From: richard.squires at utsouthwestern.edu (Burke Squires)
Date: Wed, 18 Jan 2006 16:10:25 -0600
Subject: [BiO BB] Phylogeny trees on many sequences?
Message-ID: <6BD7B6E3-7080-4C0C-96C2-FA0D7C095985@utsouthwestern.edu>

Hello,

I am working on assembling phylogenetic trees for multiple sets of large numbers of sequences, in some cases more than 3000 sequences. What is the best way to align these sequences and assemble a tree? I am interested in trees which calculate the lengths of the branches.

I have done some multiple sequence alignment using MUSCLE, which works in a reasonable amount of time. However, what options are available for creating trees from MUSCLE output?

Sincerely,

Burke

From forward at hongyu.org Thu Jan 19 20:15:39 2006
From: forward at hongyu.org (forward at hongyu.org)
Date: Thu, 19 Jan 2006 17:15:39 -0800 (PST)
Subject: [BiO BB] Re: Phylogeny trees on many sequences?
In-Reply-To: <20060119170057.A3EE91C0AE@primary.bioinformatics.org>
References: <20060119170057.A3EE91C0AE@primary.bioinformatics.org>
Message-ID: <46534.67.17.255.178.1137719739.squirrel@hongyu.org>

To generate a tree for very many sequences, as a first step I, like you, use MUSCLE to generate the alignment because it is fast. Then I have tried different ways to draw the tree.

One is to use the programs in the Phylip package (or the latest EMBOSS/Embassy package) to read the alignment and draw the phylogenetic tree.

Alternatively, I also use ClustalW to load the alignment and generate the tree, because it is easier to run than Phylip, and Phylip has problems with gene names of more than 10 characters. If you generate the alignment using MUSCLE instead of ClustalW, you need to insert "CLUSTAL" at the beginning of the first line of the MUSCLE result when loading it into ClustalW to avoid a format error message.
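
The header fix described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name add_clustal_header and the sample header text in the comments are hypothetical, not part of MUSCLE or ClustalW, and the sketch does nothing more than make the first line of the alignment text start with "CLUSTAL".

```python
def add_clustal_header(alignment_text: str) -> str:
    """Return the alignment text with a first line that starts with 'CLUSTAL'.

    Hypothetical helper: it only prefixes the first line, as described in
    the message above, and leaves the alignment blocks untouched.
    """
    lines = alignment_text.splitlines(keepends=True)
    if not lines:
        # Nothing to fix in an empty file.
        return alignment_text
    if not lines[0].startswith("CLUSTAL"):
        # e.g. a "MUSCLE (...) multiple sequence alignment" header line
        # (assumed example) becomes "CLUSTAL MUSCLE (...) ...".
        lines[0] = "CLUSTAL " + lines[0]
    return "".join(lines)
```

A typical use would be to read the MUSCLE output file, pass its contents through add_clustal_header, and write the result to a new file before loading it into ClustalW. Calling the function on a file that already starts with "CLUSTAL" leaves it unchanged.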

Another way that I've tried is to use Java applications such as Jalview to read the alignment and display the tree.

From forward at hongyu.org Thu Jan 19 20:19:14 2006
From: forward at hongyu.org (forward at hongyu.org)
Date: Thu, 19 Jan 2006 17:19:14 -0800 (PST)
Subject: [BiO BB] Fasta file of KEGG pathways
Message-ID: <47300.67.17.255.178.1137719954.squirrel@hongyu.org>

I am wondering whether anyone has a downloadable file containing all protein sequences of the KEGG pathways (e.g., in FASTA format)? I remember seeing it somewhere on the KEGG website, but I can't find it anymore.

Basically, I want to BLAST my query sequences against the file to find out the pathway information for my sequences. I can't use the web interface of KEGG directly because I have too many sequences to query. It would be nice if I could download a local copy.

Thanks!

--
Hongyu Zhang, Ph.D.
Computational Biologist
Ceres Inc.
1535 Rancho Conejo Blvd
Thousand Oaks, CA 91320
Phone: (805)376-6504 ext 1204

From aloraine at gmail.com Thu Jan 19 21:07:18 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Thu, 19 Jan 2006 20:07:18 -0600
Subject: [BiO BB] Fasta file of KEGG pathways
In-Reply-To: <47300.67.17.255.178.1137719954.squirrel@hongyu.org>
References: <47300.67.17.255.178.1137719954.squirrel@hongyu.org>
Message-ID: <83722dde0601191807q6ddccce5l780474d96b20e88c@mail.gmail.com>

Hi,

This doesn't really answer your question..sorry!

However, you might want to take a look at the PathoLogic software from Peter Karp's group at SRI. It may have what you need.

Yours,

Ann Loraine

On 1/19/06, forward at hongyu.org wrote:
> I am wondering whether anyone has a downloable file containing all protein
> sequences of KEGG pathway (e.g., in FASTA format)? I remember seeing it
> somewhere on KEGG website, but I couldn't find it anymore.
>
> Basically, I want to BLAST my query sequences against the file to find
> out the pathway information of my sequences.
> I can't use the web
> interface of KEGG directly because I have too many sequences to query. It
> would be nice if I can download a local copy.
>
> Thanks!
>
> --
> Hongyu Zhang, Ph.D.
> Computational Biologist
> Ceres Inc.
> 1535 Rancho Conejo Blvd
> Thousand Oaks, CA 91320
> Phone: (805)376-6504 ext 1204

--
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From mkgovindis at yahoo.com Thu Jan 19 22:25:41 2006
From: mkgovindis at yahoo.com (govind mk)
Date: Thu, 19 Jan 2006 19:25:41 -0800 (PST)
Subject: [BiO BB] Fasta file of KEGG pathways
In-Reply-To: <47300.67.17.255.178.1137719954.squirrel@hongyu.org>
Message-ID: <20060120032541.54931.qmail@web34702.mail.mud.yahoo.com>

Is this what you are looking for?

ftp://ftp.genome.jp/pub/kegg/tarfiles/

-govind

forward at hongyu.org wrote:
I am wondering whether anyone has a downloable file containing all protein sequences of KEGG pathway (e.g., in FASTA format)? I remember seeing it somewhere on KEGG website, but I couldn't find it anymore. Basically, I want to BLAST my query sequences against the file to find out the pathway information of my sequences. I can't use the web interface of KEGG directly because I have too many sequences to query. It would be nice if I can download a local copy. Thanks!

From stefan.rensing at biologie.uni-freiburg.de Fri Jan 20 03:08:48 2006
From: stefan.rensing at biologie.uni-freiburg.de (Stefan Rensing)
Date: Fri, 20 Jan 2006 09:08:48 +0100
Subject: [BiO BB] Phylogeny trees on many sequences?
In-Reply-To: <6BD7B6E3-7080-4C0C-96C2-FA0D7C095985@utsouthwestern.edu>
References: <6BD7B6E3-7080-4C0C-96C2-FA0D7C095985@utsouthwestern.edu>
Message-ID: <43D09A90.4010304@biologie.uni-freiburg.de>

Burke,

you might want to think about using MAFFT (Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30:3059-3066, http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/), which is well suited for large numbers of sequences.

Concerning trees, you could use e.g. a combination of PHYLIP and TREE-PUZZLE, combining ML branch lengths and the speed of NJ trees, i.e. bootstrapping -> ML distance matrix calculation -> NJ trees -> ML consensus tree.

Cheers, Stefan

Burke Squires wrote:
> Hello,
>
> I am working on assembling phylogenic trees on multiple sets of large
> numbers of sequences, in some cases more then 3000 sequences. What is
> the best way to align these sequences and assemble a tree? I am
> interested in trees which calculate the lengths of the branches.
>
> I have dome some multiple sequence alignment using muscle which works
> in a reasonable amount of time. However, what options are available for
> creating trees from muscle output?
>
> Sincerely,
>
> Burke

--
Dr. Stefan Rensing, Group Leader Computational Biology
Plant Biotechnology, Faculty of Biology, University of Freiburg
Schaenzlestr. 1, D-79104 Freiburg, Fon: +49 761 203-6974, Fax: -6945
http://www.plant-biotech.net/ http://www.cosmoss.org/
stefan.rensing at biologie.uni-freiburg.de
"An old man dies. A young girl lives. A fair trade. I love you, Nancy."

From christoph.gille at charite.de Fri Jan 20 05:23:47 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 20 Jan 2006 11:23:47 +0100 (CET)
Subject: [BiO BB] Fasta file of KEGG pathways
Message-ID: <64556.84.190.26.87.1137752627.squirrel@webmail.charite.de>

The KEGG reactions and the Swiss-Prot entries could be joined using the EC number. Unfortunately there is a problem: old Swiss-Prot entries may lack the EC number.

If you like, I can send the Swiss-Prot to EC mapping as plain text.

Further, I have KEGG in Java as a network of Pathway, Reaction, Enzyme and Compound Swiss-Prot objects. With a few lines of code you could iterate over the reactions and print the corresponding Swiss-Prot entries.

Cheers
Christoph

From idoerg at gmail.com Fri Jan 20 16:40:42 2006
From: idoerg at gmail.com (Iddo Friedberg)
Date: Fri, 20 Jan 2006 13:40:42 -0800
Subject: [BiO BB] Call for Participation: The Second Automated Function Prediction Meeting August 30 -- Sep 1 2006
Message-ID: <43D158DA.9070902@burnham.org>

Call for Participation: Talks, Papers and Posters

The Second Automated Function Prediction Meeting
August 30 -- Sep 1 2006, University of California San Diego
http://biofunctionprediction.org/

Sequence and structure genomics have generated a wealth of data, but extracting meaningful information from genomic information is becoming an increasingly difficult challenge. Both the number and the diversity of discovered genes are increasing. This increase means that established annotation methods, such as homology transfer, are annotating less data. In addition, there is a need for annotation which is standardized so that it could be incorporated into function annotation on a large scale.
Finally, there is a need to assess the quality of the function prediction software which is out there. We probably know the sequence of the target for next generation antibiotics or cancer treatment. We just don't realize that because the target is currently annotated as a "domain of unknown function". For these reasons and many more, automated protein function prediction is rapidly gaining interest among computational biologists in academia and industry. The first Automated Function Prediction (AFP) meeting was held alongside ISMB 2005, and gathered together some 100 attendees for a full day of talks, poster sessions, and a discussion panel. The second meeting will be a three day event, August 30-September 1st , 2006 at the campus of UCSD in San Diego, California. AFP 2006 will feature: * Plenary talks delivered by leading researchers in the field * Conference proceedings published as research papers in BMC Bioinformatics * A Special discussion panel on gene and protein annotation * A poster session Talks and posters are sought in, but not limited to, the following topics: *Function prediction using sequence based methods. This would include "classic" methods such as detection of functional motifs and inferring function from sequence similarity * Function from genomic information: prediction by genomic location; locus comparison with other organisms; function gain and loss. * Phylogeny based methods * Function from molecular interactions * Function from structure * Function prediction using combined methods * "Meta-talks" discussing the limitations and horizons of computational function prediction. * Assessing function prediction programs Authors of abstracts selected for talks would also have the opportunity to extend them to full length papers, which will be reviewed for publication in BMC Bioinformatics in a special AFP proceedings section. 
BMC Bioinformatics is an Open Access, peer-reviewed journal that considers articles on all aspects of computational methods used in the analysis and annotation of sequences and structures, as well as all other areas of computational biology. It has an ISI impact factor of 5.42 for year 2004. http://www.biomedcentral.com/bmcbioinformatics/ Dates: Deadline for submission of extended abstracts: April 24, 2006 Notification of acceptance: May 15, 2006 Deadline for poster abstract submission: May 30, 2006. Registration & fees: to be announced Please see the web site of the AFP 2006 meeting for further developments and announcements. http://biofunctionprediction.org The AFP2006 meeting is sponsored by: The California Institute for Telecommunications and Information Technology http://www.calit2.net/ The Burnham Institute for Medical Research http://www.burnham.org The Systematic Protein Annotation and Modeling Project http://spam.sdsc.edu If you are interested in sponsoring the AFP 2006 meeting, please contact Iddo Friedberg (idoerg at burnham.org) to learn about sponsorship opportunities. Iddo Friedberg, in the name of the AFP 2006 organizing committee -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 Tel: (858) 646 3100 x3516 Fax: (858) 713 9949 http://iddo-friedberg.org
From gopu_36 at yahoo.com Fri Jan 20 19:44:59 2006 From: gopu_36 at yahoo.com (lavi Birdie) Date: Fri, 20 Jan 2006 16:44:59 -0800 (PST) Subject: [BiO BB] Looking for a tool to do protein sequence clustering/ordering Message-ID: <20060121004459.41534.qmail@web31714.mail.mud.yahoo.com> Hi, I am looking for a tool to do protein clustering.
My aim is this: given a set of protein sequences, I want an output in which the sequences are ordered according to their distance from each other; that is, one ordering over the whole set rather than a split into families.

The basic idea is to compute pairwise alignments between all the sequences, place the pair with the minimum distance first, and then progressively add the remaining sequences on either side depending on their distances in the pairwise alignment scores. This is basically Taylor's method or some other hierarchical clustering strategy.

So for a given set of input sequences, a matrix of pairwise alignment scores is constructed first, and the ordering is then built up from the minimum-distance pair by adding the rest of the sequences progressively.

If someone can suggest a tool to perform this sequence ordering, it would be a very timely help for me.

Thanks.

Regards
Lavi

From eludens at mac.com Mon Jan 23 13:51:34 2006
From: eludens at mac.com (Jaime Fernández Vera)
Date: Mon, 23 Jan 2006 19:51:34 +0100
Subject: [BiO BB] ProteinGlimpse 1.2
Message-ID: <1B9BF9A5-EEB1-4612-B371-CB9576B682E9@mac.com>

Hi everybody,

I am glad to introduce ProteinGlimpse 1.2, a free widget (Mac OS X 10.4 or above) for visualizing macromolecules retrieved from the RCSB Protein Data Bank or from any supported molecule file dropped onto it. This application is powered by Jmol.
More information at the following link:
< http://homepage.mac.com/eludens/html/ProteinGlimpse.html >

Best wishes,
Jaime Fernández Vera

From Reactome-Knowledgebase at reactome.org Mon Jan 23 13:40:08 2006
From: Reactome-Knowledgebase at reactome.org (Reactome-Knowledgebase at reactome.org)
Date: Mon, 23 Jan 2006 13:40:08 -0500
Subject: [BiO BB] Reactome version 16 released
Message-ID:

The Reactome Knowledgebase group (Cold Spring Harbor Lab and the European Bioinformatics Institute) is proud to announce the release of Reactome Version 16 today, accessible at http://www.reactome.org!

Reactome is a curated knowledgebase of biological processes in humans. It covers processes ranging from basic pathways of metabolism to complex events such as hormonal signaling and apoptosis. The information in Reactome is provided by expert bench biologists, and edited and managed as a relational database by the Reactome staff. New material is peer-reviewed and revised as necessary before publication to the web. Reactome entries are linked to corresponding ones in NCBI EntrezGene, OMIM, Ensembl genome annotations, UCSC Genome Browser, KEGG, ChEBI and Gene Ontology (GO).

This release features the annotation of host-pathogen pathways in the form of Influenza A ( http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&ID=168254&) and HIV-1 (http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&ID=162906&) infections. The annotations are available as frameworks that show the entire viral life-cycles and molecular interactions between viral proteins and the host cellular machinery.

The release also includes new modules covering the molecular details of the human electron transport chain and its integration with mitochondrial ATP synthesis and thermogenesis, a Drosophila melanogaster signaling pathway mediated by insulin-like proteins, and a substantially revised module covering the G2/M transition of the human mitotic cell cycle.
The total number of curated human reactions in the knowledgebase is now 1451 and the total number of annotated human proteins is 1179.

Reactome version 16 includes several new software features to facilitate data analysis and downloading:

- The SkyPainter tool ( http://www.reactome.org/cgi-bin/skypainter2?DB=gk_current), which enables users to visualize all Reactome processes "touched" by a user-specified list of proteins, has now been upgraded to return a list of all proteins and the names and identifiers of the specific processes in which they are involved;
- Processes can be visualized in "Cytoscape" format;
- Reactome processes can now be downloaded as PDF documents, enabling users to generate up-to-date, custom textbooks corresponding to part or all of the Knowledgebase; and
- Reactome data can be exported in SBML, Protégé, and BioPAX level 2 formats.

Like everything in Reactome, these downloaded and exported materials can be reused and redistributed freely.

Questions or comments? Please reply to this message.

-from Reactome Teams at CSHL & EBI

From bioinfosm at gmail.com Mon Jan 23 23:12:34 2006
From: bioinfosm at gmail.com (Samantha Fox)
Date: Mon, 23 Jan 2006 23:12:34 -0500
Subject: [BiO BB] Fasta file of KEGG pathways
In-Reply-To: <20060120032541.54931.qmail@web34702.mail.mud.yahoo.com>
References: <47300.67.17.255.178.1137719954.squirrel@hongyu.org> <20060120032541.54931.qmail@web34702.mail.mud.yahoo.com>
Message-ID: <726450810601232012g748eae3ag74a9abddfef5b05b@mail.gmail.com>

Is there a good quick and dirty way to find the genes involved in a particular KEGG pathway for a particular organism, say Drosophila?

Cheers...

On 1/19/06, govind mk wrote:
>
> Is this what you are looking for
>
> ftp://ftp.genome.jp/pub/kegg/tarfiles/
>
> -govind
>
>
> *forward at hongyu.org* wrote:
>
> I am wondering whether anyone has a downloadable file containing all protein
> sequences of KEGG pathway (e.g., in FASTA format)?
I remember seeing it > somewhere on KEGG website, but I couldn't find it anymore. > > Basically, I want to BLAST my query sequences against the file to find > out the pathway information of my sequences. I can't use the web > interface of KEGG directly because I have too many sequences to query. It > would be nice if I can download a local copy. > > Thanks! > > -- > Hongyu Zhang, Ph.D. > Computational Biologist > Ceres Inc. > 1535 Rancho Conejo Blvd > Thousand Oaks, CA 91320 > Phone: (805)376-6504 ext 1204 > > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > ------------------------------ > Yahoo! Photos ? Showcase holiday pictures in hardcover > Photo Books. > You design it and we'll bind it! > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthik at rishi.serc.iisc.ernet.in Tue Jan 24 22:53:16 2006 From: karthik at rishi.serc.iisc.ernet.in (Karthik Raman) Date: Wed, 25 Jan 2006 09:23:16 +0530 Subject: [BiO BB] Re: Genes Involved in KEGG Pathways Message-ID: Hi One way I can think of is to have a LinkDB db2db query from drosophila to reaction (or vice-versa). 
Check this out: http://www.genome.jp/dbget-bin/www_linkdbsub?dbkey=linkdb&mode=db2db&sort=to&keywords=dme&targetdb=rn&max_hit=nolimit This can be reached from the page: http://www.genome.jp/dbget-bin/www_linkdb Regards Karthik > Date: Mon, 23 Jan 2006 23:12:34 -0500 > From: Samantha Fox > Subject: Re: [BiO BB] Fasta file of KEGG pathways > To: "The general forum at Bioinformatics.Org" > > Message-ID: > <726450810601232012g748eae3ag74a9abddfef5b05b at mail.gmail.com> > Content-Type: text/plain; charset="windows-1252" > > Is there a good quick and dirty way to find the genes involved in a > particular KEGG pathway .. for a particular organism, say Drosophila ? > > Cheers... -- ************************************************************************** KARTHIK RAMAN Graduate Research Student Supercomputer Education and Research Centre/Bioinformatics Centre Indian Institute of Science Bangalore - 560 012 E-mail: karthik AT rishi.serc.iisc.ernet.in Webpage: http://openwetware.org/wiki/Karthik_Raman Blogspot: karthikraman.blogspot.com ************************************************************************** From mkgovindis at yahoo.com Tue Jan 24 22:36:07 2006 From: mkgovindis at yahoo.com (govind mk) Date: Tue, 24 Jan 2006 19:36:07 -0800 (PST) Subject: [BiO BB] Fasta file of KEGG pathways In-Reply-To: <726450810601232012g748eae3ag74a9abddfef5b05b@mail.gmail.com> Message-ID: <20060125033607.41532.qmail@web34707.mail.mud.yahoo.com> hi Follow this site ...you could use Kegg's wsdl and write small scripts that could generate the required output http://www.genome.jp/kegg/soap/ -govind Samantha Fox wrote: Is there a good quick and dirty way to find the genes involved in a particular KEGG pathway .. for a particular organism, say Drosophila ? Cheers... 
On 1/19/06, govind mk wrote: Is this what you are looking for ftp://ftp.genome.jp/pub/kegg/tarfiles/ -govind forward at hongyu.org wrote: I am wondering whether anyone has a downloable file containing all protein sequences of KEGG pathway ( e.g., in FASTA format)? I remember seeing it somewhere on KEGG website, but I couldn't find it anymore. Basically, I want to BLAST my query sequences against the file to find out the pathway information of my sequences. I can't use the web interface of KEGG directly because I have too many sequences to query. It would be nice if I can download a local copy. Thanks! -- Hongyu Zhang, Ph.D. Computational Biologist Ceres Inc. 1535 Rancho Conejo Blvd Thousand Oaks, CA 91320 Phone: (805)376-6504 ext 1204
From akarger at CGR.Harvard.edu Fri Jan 27 11:51:19 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 27 Jan 2006 11:51:19 -0500 Subject: [BiO BB] formatdb now requires .formatdbrc?
Message-ID: <339D68B133EAD311971E009027DC479704239946@montecarlo.cgr.harvard.edu> I'm seeing new behavior when I run formatdb: testportal:play/recip_blast/t2>formatdb -o T -i Hinf.faa -l stdout -p T -t blah ========================[ Jan 27, 2006 11:40 AM ]======================== Version 2.2.13 [Nov-27-2005] Started database file "Hinf.faa" NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed NOTE: CoreLib [002.003] FileOpen("/n/compbio/users/akarger/.formatdbrc","r") failed NOTE: [000.000] No number of link bits used found in config file. Ignoring NOTE: [000.000] No number of membership bits used found in config file. Ignoring Formatted 2 sequences in volume 0 I believe this started occurring when we upgraded from blast 2.2.10 to 2.2.13. Normally, I wouldn't complain, but I've got a script parsing the output from formatdb, which is now breaking. The docs to formatdb still say in the formatdbrc section that "These features are still under development and useful within NCBI only." Have they automatically put this feature into the binaries now, such that everyone needs a .formatdbrc? Thanks, - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 From landman at scalableinformatics.com Fri Jan 27 12:05:19 2006 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 27 Jan 2006 12:05:19 -0500 Subject: [BiO BB] formatdb now requires .formatdbrc? 
In-Reply-To: <339D68B133EAD311971E009027DC479704239946@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC479704239946@montecarlo.cgr.harvard.edu>
Message-ID: <43DA52CF.3070908@scalableinformatics.com>

Hi Amir

Amir Karger wrote:
> I'm seeing new behavior when I run formatdb:
>
> testportal:play/recip_blast/t2>formatdb -o T -i Hinf.faa -l stdout -p T -t blah
>
> ========================[ Jan 27, 2006 11:40 AM ]========================
> Version 2.2.13 [Nov-27-2005]
> Started database file "Hinf.faa"
> NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed
> NOTE: CoreLib [002.003] FileOpen("/n/compbio/users/akarger/.formatdbrc","r")

What happens if you do a

touch ~/.formatdbrc

for each user (as that user, so the permissions are not munged) and also run touch .formatdbrc in your job script? Does this help eliminate those messages?

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

From pculpep at hotmail.com Fri Jan 27 12:28:35 2006
From: pculpep at hotmail.com (Pamela Culpepper)
Date: Fri, 27 Jan 2006 17:28:35 +0000
Subject: [BiO BB] formatdb now requires .formatdbrc?
In-Reply-To: <339D68B133EAD311971E009027DC479704239946@montecarlo.cgr.harvard.edu>
Message-ID:

Amir,

I have used formatdb and have not encountered any problems. However, I compiled formatdb from scratch using the NCBI Toolkit. Go to http://www.lifeformulae.com/local_blast_graphics/index.html and get the HOW_TO.txt and blastall_diff.txt files. They will enable you to compile the NCBI toolkit (with blastall, formatdb, etc.) for the Linux environment. This is the formatdb I used.

As an aside, blastall will now produce a local BLAST alignment gif for both the old and new BLAST engines.
Pam >From: Amir Karger >Reply-To: "The general forum at Bioinformatics.Org" > >To: "'bio_bulletin_board at bioinformatics.org'" > >Subject: [BiO BB] formatdb now requires .formatdbrc? >Date: Fri, 27 Jan 2006 11:51:19 -0500 > >I'm seeing new behavior when I run formatdb: > >testportal:play/recip_blast/t2>formatdb -o T -i Hinf.faa -l stdout -p T -t >blah > >========================[ Jan 27, 2006 11:40 AM ]======================== >Version 2.2.13 [Nov-27-2005] >Started database file "Hinf.faa" >NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed >NOTE: CoreLib [002.003] >FileOpen("/n/compbio/users/akarger/.formatdbrc","r") >failed >NOTE: [000.000] No number of link bits used found in config file. Ignoring >NOTE: [000.000] No number of membership bits used found in config file. >Ignoring >Formatted 2 sequences in volume 0 > > >I believe this started occurring when we upgraded from blast 2.2.10 to >2.2.13. > >Normally, I wouldn't complain, but I've got a script parsing the output >from >formatdb, which is now breaking. The docs to formatdb still say in the >formatdbrc section that "These features are still under development and >useful within NCBI only." Have they automatically put this feature into the >binaries now, such that everyone needs a .formatdbrc? > >Thanks, > >- Amir Karger >Computational Biology Group >Bauer Center for Genomics Research >Harvard University >617-496-0626 >_______________________________________________ >Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From bioinfosm at gmail.com Fri Jan 27 14:53:42 2006 From: bioinfosm at gmail.com (Samantha Fox) Date: Fri, 27 Jan 2006 14:53:42 -0500 Subject: [BiO BB] Script to extract connected subgraphs Message-ID: <726450810601271153l4a9d8616o55b607774d00460e@mail.gmail.com> HI again !! I was looking for suggestions on how to find connected sub-graphs ? 
I have an undirected graph, and want to find connected sub-graph(s) and then ascertain the largest of them.

Can someone suggest a script or module that I can make use of?

Thanks!
~S

From ahmed.essaghir at gmail.com Thu Jan 26 04:29:09 2006
From: ahmed.essaghir at gmail.com (ahmed essaghir)
Date: Thu, 26 Jan 2006 10:29:09 +0100
Subject: [BiO BB] gene involved in Kegg pathway
Message-ID: <6ec4c2000601260129u67ef654au69f49b91beb86d80@mail.gmail.com>

Hi!

This is a simple Perl script using SOAP for getting and coloring genes involved in a KEGG pathway: you specify the gene you want to see by giving its GeneID (GenBank id, see the example "$H"). Here I give an example for the "cbl" gene for human, and there is also a GI for the "jun" gene in mouse if you want to try it.

The KEGG identifier for genes is given by the species code (hsa for Homo sapiens) + ":" + GI (867), e.g. "hsa:867" for cbl.

You can use this script in your cgi-bin by passing it the needed parameter $H or $M from a form, and of course you can modify it to adapt it to the species you want.

As a result you will get links to two kinds of images (clickable and not).

To run this code you should install all needed Perl modules (CPAN: http://www.cpan.org/) and of course the SOAP module.

#!/usr/bin/env perl
use SOAP::Lite;
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);

print "Content-type: text/html\r\n\r\n";
print '', "\n";
print "Kegg test\n";
print "\n";

# GI number: H for human and M for mouse
# cbl gene
my $H = 867;
# jun gene (try $M = 16476 and $H = "")
my $M = "";
#print "$H : Human<br>";

# set the SOAP service
$wsdl = 'http://soap.genome.jp/KEGG.wsdl';
$serv = SOAP::Lite->service($wsdl);

# get the definition of the gene from KEGG
if ($H) {
    print "$H : Human<br>";
    $keggID = $serv->btit('"' . "hsa:" . $H . '"');
} elsif ($M) {
    print "$M : Mouse<br>";
    $keggID = $serv->btit('"' . "mmu:" . $M . '"');
}
@def = split(/ /, $keggID);
# KEGG id is now in $def[0]
$genes = SOAP::Data->type(array => [$def[0]]);

# set colors for the KEGG graph
$fg_list = SOAP::Data->type(array => ['gray', '#00ff00', 'blue']);
$bg_list = SOAP::Data->type(array => ['#ff0000', 'yellow', 'orange']);

# get pathways by genes
$result = $serv->get_pathways_by_genes($genes);
foreach $hit (@{$result}) {
    print "$hit : <br>";
    # get web link to marked static pathways
    $colored = $serv->mark_pathway_by_objects($hit, $genes);
    # get web link to clickable KEGG pathways
    $link = $serv->get_html_of_colored_pathway_by_objects($hit, $genes, $fg_list, $bg_list);
    # print links
    if ($colored or $link) {
        print "" . "" . $colored . "" . "<br>";
        print "" . "" . $link . "" . "<br>";
    }
}
print "Done!<br>";
print "\n";

From sourangshu at csa.iisc.ernet.in Sat Jan 28 11:20:00 2006
From: sourangshu at csa.iisc.ernet.in (Sourangshu Bhattacharya)
Date: Sat, 28 Jan 2006 21:50:00 +0530 (IST)
Subject: [BiO BB] Script to extract connected subgraphs
In-Reply-To: <726450810601271153l4a9d8616o55b607774d00460e@mail.gmail.com>
References: <726450810601271153l4a9d8616o55b607774d00460e@mail.gmail.com>
Message-ID:

Hi,

Just run Breadth First Search (BFS) or Depth First Search (DFS) starting from an arbitrary vertex, marking the vertices you visit. When you have reached all the vertices reachable from that start vertex (the BFS or DFS ends), start again from a vertex not visited yet. At the end of each BFS or DFS you get one connected component. When all the vertices have been visited, you have all the components; the largest is the one with the most vertices.

Sourangshu

Sourangshu Bhattacharya
PhD Student, Dept. of Computer Science & Automation,
IISc, Bangalore.
http://people.csa.iisc.ernet.in/sourangshu

On Fri, 27 Jan 2006, Samantha Fox wrote:
> HI again !!
>
> I was looking for suggestions on how to find connected sub-graphs ?
> I have an undirected graph, and want to find connected sub-graph(s) and then
> ascertain the largest of them.
>
> Can someone suggest a script or module that I can make use of ?
>
> Thanks !
> ~S
>

From Nadia.Bolshakova at cs.tcd.ie Mon Jan 30 06:01:02 2006
From: Nadia.Bolshakova at cs.tcd.ie (Nadia Bolshakova)
Date: Mon, 30 Jan 2006 11:01:02 -0000
Subject: [BiO BB] Submission deadline extension for Special Track "BIOINFORMATICS and its MEDICAL APPLICATIONS" papers, CBMS 2006
Message-ID: <00b001c6258c$759c3b20$6e26e286@DBNJK90J>

SUBMISSION DEADLINE EXTENSION for the Special Track "BIOINFORMATICS and its MEDICAL APPLICATIONS" papers, CBMS 2006

New deadline for submission of papers (6 pages maximum): February 9, 2006

*************************************************************************************

Special Track: BIOINFORMATICS and its MEDICAL APPLICATIONS
19th IEEE International Symposium on Computer-Based Medical Systems, CBMS 2006
Salt Lake City, Utah, USA
June 22-23, 2006

The CALL FOR PAPERS may be found here:
https://www.cs.tcd.ie/Nadia.Bolshakova/CBMS_Bioinformatics06.html

From tjaart at tuks.co.za Mon Jan 30 07:54:36 2006
From: tjaart at tuks.co.za (Tjaart de Beer)
Date: Mon, 30 Jan 2006 14:54:36 +0200
Subject: [BiO BB] Java and InterProSan
Message-ID: <43DE0C8C.8070306@tuks.co.za>

Hi

Does anyone know of a Java implementation of InterProScan? The EBI unfortunately only provides a Perl package...

Alternatively, does anyone know of Java packages that can perform the separate scans done in InterProScan? I have had a look at BioJava but there seems to be limited functionality available.

Thanks!

--
Tjaart de Beer

From akarger at CGR.Harvard.edu Mon Jan 30 16:13:48 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 30 Jan 2006 16:13:48 -0500
Subject: [BiO BB] FW: [blast-help] formatdbrc is now required?
Message-ID: <339D68B133EAD311971E009027DC479704239A62@montecarlo.cgr.harvard.edu>

Here's the email I got from NCBI. I didn't hear quite as much "Sorry for breaking your script without changing the formatdb docs." as I would like, but it is free software after all.
(It falls under the "paid for by my tax dollars" definition of free.) - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University -----Original Message----- From: Matten, Wayne (NIH/NLM) [C] [mailto:matten at ncbi.nlm.nih.gov] Sent: Monday, January 30, 2006 3:54 PM To: Amir Karger; NLM/NCBI List blast-help Subject: RE: [blast-help] formatdbrc is now required? Hello, You can ignore those messages. No .formatdbrc file is needed. Best regards, Wayne >>><<><>>>>>>>><>>> Wayne Matten, PhD NCBI Service Desk > -----Original Message----- > From: Amir Karger [mailto:akarger at CGR.Harvard.edu] > Sent: Monday, January 30, 2006 9:53 AM > To: NLM/NCBI List blast-help > Subject: [blast-help] formatdbrc is now required? > > I'm getting warning messages from the new formatdb. > > I used a file Hinf.faa with two protein sequences in it, and did: > > testportal>formatdb -o T -p T -i Hinf.faa more formatdb.log > > ========================[ Jan 30, 2006 9:47 AM > ]======================== Version 2.2.13 [Nov-27-2005] > Started database file "Hinf.faa" > NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed > NOTE: CoreLib [002.003] > FileOpen("/n/compbio/users/akarger/.formatdbrc","r") > fai > led > NOTE: [000.000] No number of link bits used found in config > file. Ignoring > NOTE: [000.000] No number of membership bits used found in > config file. > Ignoring > Formatted 2 sequences in volume 0 > > ------------------- > As you can see, I'm getting warning messages for not having a > .formatdbrc. > Unfortunately, this is breaking a script I have which runs > formatdb every night and looks for errors in the log. Of > course, I can change it to skip these lines, but I'm still curious. > > The formatdb docs say "These features are still under > development and useful within NCBI only." Do I (and everyone > in my building who uses formatdb) need to create a > formatdbrc? What should I put in there? 
> > Thanks, > - Amir Karger > Computational Biology Group > Bauer Center for Genomics Research > Harvard University > 617-496-0626 > > ps I'm using the blast-2.2.13-ia32-linux.tar: > > The executable files were built on Tue Dec 6 10:23:48 EST > 2005 The version number of each individual application may be > found in the appropriate documentation files in ./ncbi/doc/ > uname -a ouput is: Linux icoremake0 2.6.11.10-III-64G #2 SMP > Wed May 25 > 10:52:09 > EDT 2005 i686 unknown >
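
Since the NCBI reply above says the .formatdbrc notes can be ignored, a nightly log check like the one described in this thread could drop those lines before looking for real errors. The following Python sketch shows one way to do that. It is an assumption-laden illustration: the patterns are copied from the log excerpt quoted in the thread rather than from any formatdb documentation, the function name `strip_harmless_notes` is hypothetical, and the script that actually parses the log was never posted.

```python
import re

# NOTE lines that NCBI said can be ignored. The patterns come from the log
# excerpt quoted in this thread, not from formatdb documentation.
HARMLESS_NOTES = [
    re.compile(r'FileOpen\("[^"]*\.formatdbrc","r"\)\s+failed'),
    re.compile(r"No number of (link|membership) bits used found in config file"),
]

# Sample log, assuming each message sits on one line in the real file
# (the line wrapping seen in the emails is taken to be a mail artifact).
SAMPLE_LOG = """Version 2.2.13 [Nov-27-2005]
Started database file "Hinf.faa"
NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed
NOTE: CoreLib [002.003] FileOpen("/n/compbio/users/akarger/.formatdbrc","r") failed
NOTE: [000.000] No number of link bits used found in config file. Ignoring
NOTE: [000.000] No number of membership bits used found in config file. Ignoring
Formatted 2 sequences in volume 0"""


def strip_harmless_notes(log_text):
    """Return the log text with the known-harmless NOTE lines removed."""
    kept = [
        line
        for line in log_text.splitlines()
        if not any(pattern.search(line) for pattern in HARMLESS_NOTES)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    print(strip_harmless_notes(SAMPLE_LOG))
```

Filtering first and then running the existing error check on what remains keeps the change small: any NOTE line that does not match one of the two patterns still reaches the checker.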