From akarger at CGR.Harvard.edu Mon Oct 3 16:17:19 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 3 Oct 2005 16:17:19 -0400
Subject: [BiO BB] Linear Bioinformatics workflow?
Message-ID: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>

Several people mentioned 2-D graphical workflow tools in a
"Bioinformatics workflow?" thread on bioclusters. (I'm redirecting my
non-cluster-y question here.) While still a newbie, I'm getting the
impression that many bioinformatics workflows are mostly linear, with
obvious important exceptions like conditions and loops. For example, I had
a client last week who wanted to script:

1 blast [sequence=..., program=...] > blast.out
2 get hits from blast.out > blast.hits
3 find hits with 50-70% sequence identity from blast.hits > blast.good_hits
4 download/fastacmd sequences for IDs in blast.good_hits > hits.fasta
5 clustalw hits.fasta > publishable_result (OK, not really)

Our current model is to help people write shell scripts, but that doesn't
work for all users. It seems like a two-dimensional workflow tool would be
overkill for the above. All I need is a tool that combines
Pise/iNquiry-style "select a bioinformatics tool, input parameters" with
the ability to save a set of commands.

Of course, it would be much less powerful and flexible than the 2-D
workflow tools. But "protocols" (http://biopipe.org/protocols/) might be
an easier sell to computer-phobes than directed acyclic graphs.

Is there anything out there that does this? I'd much rather steal than
build.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University

From william.hsiao at gmail.com Mon Oct 3 16:28:22 2005
From: william.hsiao at gmail.com (William Hsiao)
Date: Mon, 3 Oct 2005 13:28:22 -0700
Subject: [BiO BB] Linear Bioinformatics workflow?
In-Reply-To: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>
Message-ID: <679a35b20510031328k6612584bndc6b448b2bad2f0b@mail.gmail.com>

Hi Amir,

What about Pegasys
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15096276&query_hl=12)
or Taverna (http://taverna.sourceforge.net/) as possible systems? These
are graphical systems as well, but they would be able to do what you
described.

Cheers,
Will

On 10/3/05, Amir Karger wrote:
> Is there anything out there that does this? I'd much rather steal than
> build.
>
> - Amir Karger
> _______________________________________________
> Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

--
William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr.
Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206
Fax: 604-291-5583

From marty.gollery at gmail.com Mon Oct 3 17:37:48 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 3 Oct 2005 14:37:48 -0700
Subject: [BiO BB] Linear Bioinformatics workflow?
In-Reply-To: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>
Message-ID:

Along with the aforementioned Taverna and Pegasys, take a look at VIBE
from Incogen. Since you are at a university you can get the API for free.

Marty

On 10/3/05, Amir Karger wrote:
> Our current model is to help people to write shell scripts, but that
> doesn't work for all users. It seems like a two-dimensional workflow
> tool would be overkill for the above.
> All I need is a tool that combines Pise/iNquiry-style "select a
> bioinformatics tool, input parameters" with the ability to save a set
> of commands.

--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042

From cdwan at bioteam.net Tue Oct 4 10:40:48 2005
From: cdwan at bioteam.net (Chris Dwan)
Date: Tue, 4 Oct 2005 10:40:48 -0400
Subject: [BiO BB] Linear Bioinformatics workflow?
In-Reply-To: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC47970354C8A7@montecarlo.cgr.harvard.edu>
Message-ID: <1A053154-51AC-4DB4-862D-6A790B76A42F@bioteam.net>

Amir,

There is another major category of workflow that I've seen:

"I have a [protein sequence | genomic region | compound | mass spec
output] and I want to find out everything in the entire world about it.
Ideally, all the result values would be converted to a common vocabulary,
format, and normalization. I would rather not go to every website in the
world, or even know about all those websites. Can you help me?"

This is a "wide" rather than a "deep" process.
As to your original question - my personal opinion is that interface
design is really, really hard, and that if someone were going to come up
with a good, generic way to put that sort of power in the hands of
non-programmer types, it would have happened by now.

That said, if you narrow the problem enough that it doesn't have to do
everything in the world, things get a lot simpler.

Each of the tools that people have mentioned has its strengths and
weaknesses. None will solve every problem. I'm not aware of a really
killer solution for your specific use case:

- let users explore a limited set of tools, and dynamically build up a
  protocol
- save that protocol in a personal workspace for future (personal) re-use
  and possible sharing
- but keep it totally limited and simple so as not to intimidate
  non-programmers
- plus flexible enough to handle large-ish batches of data

Most of the commercial and free workflow engines will do this, but it
sounds like the overhead of learning to use them is a bit much for your
users?

-Chris Dwan

From smagarwal at yahoo.com Wed Oct 5 00:37:56 2005
From: smagarwal at yahoo.com (Subhash Agarwal)
Date: Wed, 5 Oct 2005 05:37:56 +0100 (BST)
Subject: [BiO BB] Torsion angles
Message-ID: <20051005043756.36326.qmail@web31510.mail.mud.yahoo.com>

Hi everybody

I would like to know what the defined torsion angles are for the main
chain C (i.e. Ca) and the side chain C (CB). The side chain can be any of
the 20 amino acids. Does something like this exist?

I mean something similar to the Ramachandran plot, which is for the
backbone.

Thanks

Subhash Agarwal

From boris.steipe at utoronto.ca Wed Oct 5 08:03:24 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Wed, 5 Oct 2005 08:03:24 -0400
Subject: [BiO BB] Torsion angles
In-Reply-To: <20051005043756.36326.qmail@web31510.mail.mud.yahoo.com>
References: <20051005043756.36326.qmail@web31510.mail.mud.yahoo.com>
Message-ID:

Look at http://dunbrack.fccc.edu/bbdep/ for everything on chi-angle
rotamers. Excellent site.

B.

From dwivedbz at notes.udayton.edu Wed Oct 5 10:11:00 2005
From: dwivedbz at notes.udayton.edu (dwivedbz at notes.udayton.edu)
Date: Wed, 5 Oct 2005 10:11:00 -0400
Subject: [BiO BB] In search of complete conserved genes....
Message-ID:

An HTML attachment was scrubbed...
URL:

From idoerg at burnham.org Wed Oct 5 12:30:43 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 05 Oct 2005 09:30:43 -0700
Subject: [BiO BB] In search of complete conserved genes....
In-Reply-To:
References:
Message-ID: <4343FFB3.8030506@burnham.org>

Look for a recent paper by Gill Bejerano, Jim Kent and David Haussler in
Nature Methods. I think you'll find what you need in there.

dwivedbz at notes.udayton.edu wrote:
> Hello everyone!
>
> I am looking for complete conserved protein-coding genes that are
> widely distributed among bacterial species (should be present in
> at least 6-7 bacterial species). Also, I need such genes to show a high
> degree of sequence similarity in the species where they exist. I would
> appreciate it if you could help me out.
>
> Thanks!
>
> Bhakti

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From sariego9 at yahoo.com Wed Oct 5 12:50:50 2005
From: sariego9 at yahoo.com (Diego Martinez)
Date: Wed, 5 Oct 2005 09:50:50 -0700 (PDT)
Subject: [BiO BB] In search of complete conserved genes....
In-Reply-To: <4343FFB3.8030506@burnham.org>
Message-ID: <20051005165050.10060.qmail@web32514.mail.mud.yahoo.com>

Or there is the Swiss-Prot HAMAP tool that classifies families in a kinda
"phylogenetic profile" a la Eisenberg...

http://us.expasy.org/sprot/hamap/

Diego

From boris.steipe at utoronto.ca Wed Oct 5 12:57:06 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Wed, 5 Oct 2005 12:57:06 -0400
Subject: [BiO BB] In search of complete conserved genes....
In-Reply-To:
References:
Message-ID:

This is what COGS was built for:

http://www.ncbi.nlm.nih.gov/COG/

"Interesting" interface though. Probably the list you might want to work
with is
http://www.ncbi.nlm.nih.gov/COG/old/palox.cgi

Boris

From P.Curley at westminster.ac.uk Thu Oct 6 13:46:43 2005
From: P.Curley at westminster.ac.uk (paul)
Date: Thu, 6 Oct 2005 10:46:43 -0700
Subject: [BiO BB] Most common protein fold?
In-Reply-To:
Message-ID:

Hi Folks,

Quick question. Does anyone by any chance know how I can find the number
of individual proteins within each superfamily and family of the SCOP
database, to get an idea of which folds are the most common and which are
very rare?

Any help much appreciated.
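
One way such counts could be obtained is sketched here, under stated
assumptions: a SCOP parseable classification file (dir.cla.scop.txt) has
been downloaded, lines starting with "#" are comments, and each data line
is whitespace-separated with the PDB code in the second column and the
sccs code (class.fold.superfamily.family, e.g. a.2.11.1) in the fourth,
as in the example line "d1iscb1 1isc B:1-82 a.2.11.1" quoted elsewhere in
this thread. The function name and the sample lines other than that one
are hypothetical illustrations, not real SCOP data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# How many dot-separated parts of the sccs code identify each level.
LEVELS = {"class": 1, "fold": 2, "superfamily": 3, "family": 4}


def count_by_level(lines: Iterable[str], level: str = "fold") -> Dict[str, Dict[str, int]]:
    """Count domains and distinct PDB entries per sccs prefix."""
    depth = LEVELS[level]
    domains: Dict[str, int] = defaultdict(int)
    pdb_ids: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split()
        if len(fields) < 4:
            continue  # not in the assumed layout
        pdb_id, sccs = fields[1], fields[3]
        key = ".".join(sccs.split(".")[:depth])
        domains[key] += 1
        pdb_ids[key].add(pdb_id)
    return {k: {"domains": domains[k], "pdb_entries": len(pdb_ids[k])}
            for k in domains}


# First data line is the example from this thread; the rest are made up.
SAMPLE = """\
# hypothetical sample in the assumed layout, not real SCOP data
d1iscb1 1isc B:1-82 a.2.11.1
dxxxxa1 xxxx A:1-90 a.2.11.1
dxxxxb1 xxxx B:1-90 a.2.11.2
dyyyya_ yyyy A: c.1.2.3
"""

if __name__ == "__main__":
    for key, counts in sorted(count_by_level(SAMPLE.splitlines(), "superfamily").items()):
        print(key, counts["domains"], counts["pdb_entries"])
```

With a real file, the call would be something like
count_by_level(open("dir.cla.scop.txt"), "family"). Note that this counts
classified domains and PDB entries, which is not the same thing as the
number of distinct proteins.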

Best Regards,

Paul

From boris.steipe at utoronto.ca Thu Oct 6 08:48:10 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 6 Oct 2005 08:48:10 -0400
Subject: [BiO BB] Most common protein fold?
In-Reply-To:
References:
Message-ID: <8FFE2B72-ED62-4C66-BC26-7D527FCFBED4@utoronto.ca>

The SCOP help-file at http://scop.mrc-lmb.cam.ac.uk/scop/help.html has
the following to say:

"The number in parenthesis after an entry shows how many children will be
found there."

So for example the TIM b/a barrel Fold
----- TIM beta/alpha-barrel [51350] (31)
has 31 superfamilies and its Ribulose-phosphate binding barrel
---------- Ribulose-phoshate binding barrel [51366] (4)
has 4 families.

Hope this is what you were looking for

Boris

From nikbakht at ibb.ut.ac.ir Thu Oct 6 09:09:31 2005
From: nikbakht at ibb.ut.ac.ir (Hamid Nikbakht)
Date: Thu, 06 Oct 2005 16:39:31 +0330
Subject: [BiO BB] Most common protein fold?
In-Reply-To:
References:
Message-ID:

Dear Paul,

You are so lucky! Go here:

http://scop.mrc-lmb.cam.ac.uk/scop/parse/index.html

There you can find some parseable files that will give you your results:

dir.cla.scop.txt 1.69 1.67 1.65 1.63 1.61 1.59 1.57 1.55

If you choose the latest version (1.69) of the SCOP flat file, you get a
page that lists proteins with their SCOP (sccs) codes. Each code
represents the fold, superfamily, family... of each protein. For instance,
if you want information about the protein with PDB code 1isc, you will see
this line:

d1iscb1 1isc B:1-82 a.2.11.1 ......

You can simply write 2-3 lines of code to parse this file. Then you can
find not only what you want but also everything else about protein folds.
I'm sure you can, because I could do it before ;)

Yours truly,
Behnam

/*
Hamid (Behnam) Nikbakht,
M.Sc of Cell and Molecular Sciences
Bioinformatics Center
Laboratory of Biophysics and Molecular Biology
Institute of Biochemistry and Biophysics
University of Tehran
P.O.Box 13145-1384
Tehran, Iran.
Tel: (+98 21) 6498672
Fax: (+98 21) 6956985
Alt.
E-Mail : hamid at ibb.ut.ac.ir */ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmb at mrc-dunn.cam.ac.uk Thu Oct 6 10:35:06 2005 From: dmb at mrc-dunn.cam.ac.uk (dmb at mrc-dunn.cam.ac.uk) Date: Thu, 6 Oct 2005 15:35:06 +0100 (BST) Subject: [BiO BB] Most common protein fold? In-Reply-To: References: Message-ID: <33427.213.107.105.179.1128609306.squirrel@www.mrc-dunn.cam.ac.uk> > Hi Folks, > > Quick question. Does anyone know by any chance know how I can find the > number of individual proteins within > each superfamily and family of the SCOP database to get an idea of which > folds are the most > common and which are very rare? > > Any help much appreciated. Use the SUPERFAMILY database, http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/ That gives you exactly what you need broken down by organism :) > Best Regards, > > Paul > > -----Original Message----- > From: bio_bulletin_board-bounces+p.curley=wmin.ac.uk at bioinformatics.org > [mailto:bio_bulletin_board-bounces+p.curley=wmin.ac.uk at bioinformatics.or > g]On Behalf Of Boris Steipe > Sent: Wednesday, October 05, 2005 9:57 AM > To: The general forum at Bioinformatics.Org > Subject: Re: [BiO BB] In search of complete conserved genes.... > > > This is what COGS was built for: > > http://www.ncbi.nlm.nih.gov/COG/ > > "Interesting" interface though. Probably the list you might want to > work with is > http://www.ncbi.nlm.nih.gov/COG/old/palox.cgi > > > Boris > > > dwivedbz at notes.udayton.edu wrote: > > >> Hello everyone! >> I am looking for complete conserved protein-coding genes that are >> widely distributed among bacterial species (should be present in >> atleast 6-7 bacterial species). Also, I need such genes to show >> high degree of sequence similarities in the species they exist. I >> would appreciate if you could help me out. Thanks! 
>> Bhakti >> >> ---------------------------------------------------------------------- >> -- >> >> _______________________________________________ >> Bioinformatics.Org general forum - >> BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> >> > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From P.Curley at westminster.ac.uk Thu Oct 6 19:03:23 2005 From: P.Curley at westminster.ac.uk (paul) Date: Thu, 6 Oct 2005 16:03:23 -0700 Subject: [BiO BB] Most common protein fold? In-Reply-To: <8FFE2B72-ED62-4C66-BC26-7D527FCFBED4@utoronto.ca> Message-ID: Hi Boris, Thanks you for this, but I was really wondering how many individual known proteins fall within each class and subclass (i.e. superfamilily, family, etc.). For example, how many individual proteins adopt the Ribulose-phoshate binding barrel fold and how are the proteins distributed between the four families? In other words, I am not trying to find out how many different clases and subclasses of protein folds we have, but rather how are known proteins (e.g. those in Swiss-Prot or PDB for example) distributed amoungst the various folds? Hope this makes sense?! Best Regards, Paul -----Original Message----- From: bio_bulletin_board-bounces+p.curley=wmin.ac.uk at bioinformatics.org [mailto:bio_bulletin_board-bounces+p.curley=wmin.ac.uk at bioinformatics.or g]On Behalf Of Boris Steipe Sent: Thursday, October 06, 2005 5:48 AM To: The general forum at Bioinformatics.Org Subject: Re: [BiO BB] Most common protein fold? 
The SCOP help-file at http://scop.mrc-lmb.cam.ac.uk/scop/help.html Has the following to say: "The number in parenthesis after an entry shows how many children will be found there." So for example the TIM b/a barrel Fold ----- TIM beta/alpha-barrel [51350] (31) has 31 superfamilies and its Ribulose-phosphate binding barrel ---------- Ribulose-phoshate binding barrel [51366] (4) has 4 families. Hope this is what you were looking for Boris ========================================== On 6 Oct 2005, at 13:46, paul wrote: > Hi Folks, > > Quick question. Does anyone know by any chance know how I can find the > number of individual proteins within > each superfamily and family of the SCOP database to get an idea of > which > folds are the most > common and which are very rare? > > Any help much appreciated. > > Best Regards, > > Paul > > -----Original Message----- > From: bio_bulletin_board-bounces > +p.curley=wmin.ac.uk at bioinformatics.org > [mailto:bio_bulletin_board-bounces > +p.curley=wmin.ac.uk at bioinformatics.or > g]On Behalf Of Boris Steipe > Sent: Wednesday, October 05, 2005 9:57 AM > To: The general forum at Bioinformatics.Org > Subject: Re: [BiO BB] In search of complete conserved genes.... > > > This is what COGS was built for: > > http://www.ncbi.nlm.nih.gov/COG/ > > "Interesting" interface though. Probably the list you might want to > work with is > http://www.ncbi.nlm.nih.gov/COG/old/palox.cgi > > > Boris > > > dwivedbz at notes.udayton.edu wrote: > > > >> Hello everyone! >> I am looking for complete conserved protein-coding genes that are >> widely distributed among bacterial species (should be present in >> atleast 6-7 bacterial species). Also, I need such genes to show >> high degree of sequence similarities in the species they exist. I >> would appreciate if you could help me out. Thanks! 
>> Bhakti >> >> --------------------------------------------------------------------- >> - >> -- >> >> _______________________________________________ >> Bioinformatics.Org general forum - >> BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> >> >> > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From boris.steipe at utoronto.ca Thu Oct 6 12:24:21 2005 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Thu, 6 Oct 2005 12:24:21 -0400 Subject: [BiO BB] Most common protein fold? In-Reply-To: References: Message-ID: <93B97926-F819-43C9-ADF2-6517E7320CCB@utoronto.ca> Hi Paul, In my mind it would only make sense per organism, because the number of "known" proteins is an arbitrary subset. Wouldn't you agree? Then again, I am not even sure the number of known genes in an organism is all that meaningful either, because they are expressed at hugely different levels ... is the mere presence of a gene enough to support the type of inference you are looking for? For folds-per-organism it seems the SUPERFAMILY database referred to by "dmb at mrc-dunn.cam.ac.uk" is indeed your best bet. Does this help? Boris On 6 Oct 2005, at 19:03, paul wrote: > Hi Boris, > > Thanks you for this, but I was really wondering how many individual > known > proteins fall within each class and subclass (i.e. superfamilily, > family, > etc.). 
For example, how many individual proteins adopt the Ribulose- > phoshate > binding barrel fold and how are the proteins distributed between > the four > families? In other words, I am not trying to find out how many > different > clases and subclasses of protein folds we have, but rather how are > known > proteins (e.g. those in Swiss-Prot or PDB for example) distributed > amoungst > the various folds? Hope this makes sense?! > > Best Regards, > > Paul > > -----Original Message----- > From: bio_bulletin_board-bounces > +p.curley=wmin.ac.uk at bioinformatics.org > [mailto:bio_bulletin_board-bounces > +p.curley=wmin.ac.uk at bioinformatics.or > g]On Behalf Of Boris Steipe > Sent: Thursday, October 06, 2005 5:48 AM > To: The general forum at Bioinformatics.Org > Subject: Re: [BiO BB] Most common protein fold? > > > The SCOP help-file at http://scop.mrc-lmb.cam.ac.uk/scop/help.html > Has the following to say: > > "The number in parenthesis after an entry shows how many children > will be found there." > > So for example the TIM b/a barrel Fold > ----- TIM beta/alpha-barrel [51350] (31) > has 31 superfamilies and its Ribulose-phosphate binding barrel > ---------- Ribulose-phoshate binding barrel [51366] (4) > has 4 families. > > Hope this is what you were looking for > > Boris > ========================================== > > On 6 Oct 2005, at 13:46, paul wrote: > > >> Hi Folks, >> >> Quick question. Does anyone know by any chance know how I can find >> the >> number of individual proteins within >> each superfamily and family of the SCOP database to get an idea of >> which >> folds are the most >> common and which are very rare? >> >> Any help much appreciated. 
>> >> Best Regards, >> >> Paul >> >> -----Original Message----- >> From: bio_bulletin_board-bounces >> +p.curley=wmin.ac.uk at bioinformatics.org >> [mailto:bio_bulletin_board-bounces >> +p.curley=wmin.ac.uk at bioinformatics.or >> g]On Behalf Of Boris Steipe >> Sent: Wednesday, October 05, 2005 9:57 AM >> To: The general forum at Bioinformatics.Org >> Subject: Re: [BiO BB] In search of complete conserved genes.... >> >> >> This is what COGS was built for: >> >> http://www.ncbi.nlm.nih.gov/COG/ >> >> "Interesting" interface though. Probably the list you might want to >> work with is >> http://www.ncbi.nlm.nih.gov/COG/old/palox.cgi >> >> >> Boris >> >> >> dwivedbz at notes.udayton.edu wrote: >> >> >> >> >>> Hello everyone! >>> I am looking for complete conserved protein-coding genes that are >>> widely distributed among bacterial species (should be present in >>> atleast 6-7 bacterial species). Also, I need such genes to show >>> high degree of sequence similarities in the species they exist. I >>> would appreciate if you could help me out. Thanks! 
>>> Bhakti
>>>
>>> _______________________________________________
>>> Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From akarger at CGR.Harvard.edu Thu Oct 6 15:46:19 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 6 Oct 2005 15:46:19 -0400
Subject: [BiO BB] Re: Linear Bioinformatics workflow?
Message-ID: <339D68B133EAD311971E009027DC47970391A3DD@montecarlo.cgr.harvard.edu>

> Chris Dwan said: [liberally snipped and edited]

Hi, Chris.

> That said, if you narrow the problem enough that it doesn't have to
> do everything in the world, things get a lot simpler.

Exactly! We *have* a narrow problem: scripting command-line calls to bioinformatics or formatting tools. I'm narrowing it further by allowing only 1-D problems.
Best of all, biologists already know how to write protocols.

To edit/shorten your requirements list for a linear workflow app (which we can call 1DScript until our marketing team plays with it for a while):

- choose from a limited set of tools
- build a protocol
- save a protocol

That doesn't sound too hard! In fact, the solution already exists. Use iNquiry to run a set of jobs in a row. For each job, find the command-line command that iNquiry (Pise) runs, and paste it into Notepad. Save as a shell script. Voila! If only we knew some iNquiry developers, we could ask them to integrate a command-line history, and we'd be set. Is it that hard to pop that into a GUI?

> Most of the commercial and free workflow engines will do this, but it
> sounds like the overhead of learning to use them is a bit much for
> your users?

Yes. Some of our clients will be folks who use computers once a month. Do they want to devote the time to learn how to use these workflow applications, which - since they can do so much more than 1DScript - are necessarily going to be more complex? In addition, I'm getting the impression - correct me if I'm wrong - that the usual model for Inforsense et al. is that in-house programmers create workflows which users then use. And we have very few in-house programmers. (Approximately, hm, let's see, carry the 4... um, one.)

> "I have a [protein sequence | genomic region | compound | mass spec
> output] and I want to find out everything in the entire world about it.

Finding websites is work, but a different kind of work than learning and *remembering* (during that month in the lab) a new language/interface. In my possibly totally wrong opinion, a biologist who uses computers only occasionally is more willing to do the searching websites kind of work. So this biologist can seek out the websites, and paste a bunch of (parameterized) wgets into a protocol.

> Ideally, all the result values would be converted to a common
> vocabulary, format, and normalization.
Just add a few Scriptome tools ( http://cgr.harvard.edu/cbg/scriptome ) to your wgets.

> I would rather not go to
> every website in the world, or even know about all those websites."

Sorry, out of scope. Buy a programmer. Or hope that someday 1DScript protocols are online and someone wrote one that's close to what you want so you can download and tweak it.

-Amir

From charleshefer at gmail.com Fri Oct 7 03:59:40 2005
From: charleshefer at gmail.com (Charles Hefer)
Date: Fri, 7 Oct 2005 09:59:40 +0200
Subject: [BiO BB] Automated upstream region sequence retrieval
Message-ID: <492024630510070059p251f24ebo7c3ae4b3ce9b3088@mail.gmail.com>

Hi

I am looking for a way to automate the retrieval of upstream regions of genes (from fully sequenced genomes). I have tried the R/BioMart route, but the organisms I want are not available in BioMart (yet). Does one of the Bio modules of, e.g., Python/Java/Perl support this functionality? I want to put up a little internal web service for promoter searches, for which the desired gene ID (GenBank ID) would be entered and the ~2kb upstream region returned.
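The coordinate arithmetic behind such a lookup can be sketched in plain Python. This is only a minimal sketch, assuming the genome sequence is already in memory as a string and that the gene's start, end and strand are already known (e.g. parsed from the GenBank record). The names `upstream_region` and `reverse_complement` are hypothetical helpers made up for illustration, not functions from any Bio* module.

```python
# Hypothetical helpers for illustration only; not part of any Bio* module.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=2000):
    """Return up to `length` bases upstream of a gene.

    genome : forward-strand sequence as a string
    start, end : 0-based, half-open gene coordinates on the forward strand
    strand : +1 for a forward-strand gene, -1 for a reverse-strand gene
    """
    if strand == 1:
        # Upstream lies at lower coordinates; clip at the start of the sequence.
        begin = max(0, start - length)
        return genome[begin:start]
    if strand == -1:
        # Upstream lies at higher forward-strand coordinates; clip at the end,
        # then report the region in the gene's own orientation.
        stop = min(len(genome), end + length)
        return reverse_complement(genome[end:stop])
    raise ValueError("strand must be +1 or -1")


print(upstream_region("AACCGTTGCA", 4, 6, -1, length=3))  # GCA
```

For a minus-strand gene the upstream region sits at higher coordinates on the forward strand, which is why it is reverse-complemented before being returned. Near the ends of a sequence the result is simply shorter than the requested length.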
Thanx, in advance

--
Charles Hefer
REPLY to: chefer at tuks.co.za

From dmb at mrc-dunn.cam.ac.uk Thu Oct 6 13:52:40 2005
From: dmb at mrc-dunn.cam.ac.uk (dmb at mrc-dunn.cam.ac.uk)
Date: Thu, 6 Oct 2005 18:52:40 +0100 (BST)
Subject: [BiO BB] Most common protein fold?
In-Reply-To: <93B97926-F819-43C9-ADF2-6517E7320CCB@utoronto.ca>
References: <93B97926-F819-43C9-ADF2-6517E7320CCB@utoronto.ca>
Message-ID: <34311.213.107.105.179.1128621160.squirrel@www.mrc-dunn.cam.ac.uk>

> Hi Paul,
>
> In my mind it would only make sense per organism, because the number
> of "known" proteins is an arbitrary subset. Wouldn't you agree? Then
> again, I am not even sure the number of known genes in an organism is
> all that meaningful either, because they are expressed at hugely
> different levels ... is the mere presence of a gene enough to support
> the type of inference you are looking for?
>
> For folds-per-organism it seems the SUPERFAMILY database referred to
> by "dmb at mrc-dunn.cam.ac.uk" is indeed your best bet.
>
> Does this help?
>
> Boris
>
> On 6 Oct 2005, at 19:03, paul wrote:
>
>> Hi Boris,
>>
>> Thank you for this, but I was really wondering how many individual known
>> proteins fall within each class and subclass (i.e. superfamily, family,
>> etc.). For example, how many individual proteins adopt the
>> Ribulose-phoshate binding barrel fold and how are the proteins
>> distributed between the four families? In other words, I am not trying
>> to find out how many different classes and subclasses of protein folds
>> we have, but rather how are known proteins (e.g. those in Swiss-Prot or
>> PDB for example) distributed amongst the various folds? Hope this makes
>> sense?!

Yes, it does make sense. You will find a highly scale-free distribution: many proteins are described by only a few folds, and there is a large number of 'unique' folds to boot. For example, see...
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=7990952&dopt=Citation

Title: Protein superfamilies and domain superfolds.

Abstract: As the protein sequence and structure databases expand rapidly a better understanding of the relationships between proteins is required. A classification is considered that extends the sequence-based superfamilies to include proteins with similar function and three-dimensional structures but no sequence similarity. So far there are only nine protein folds known to recur in proteins having neither sequence nor functional similarity. These folds dominate the structure database, representing more than 30 per cent of all determined structures. This observation has implications for protein-fold recognition.

>> Best Regards,
>>
>> Paul

From idoerg at burnham.org Sat Oct 8 21:39:25 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat, 08 Oct 2005 18:39:25 -0700
Subject: [BiO BB] Sequence database errors
Message-ID: <434874CD.40406@burnham.org>

Hi,

Is there any recent study regarding the scope of annotation errors in sequence databases? Especially functional annotations? Something in the spirit of:

Peer Bork: Powers and pitfalls in sequence analysis: the 70% hurdle.
Genome Res. 2000 Apr;10(4):398-400

Thanks,

Iddo

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From bioinfosm at gmail.com Sun Oct 9 16:44:12 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Sun, 9 Oct 2005 15:44:12 -0500
Subject: [BiO BB] Orthologous upstream sequence analysis
Message-ID: <726450810510091344w5a5b3683kb91842f9da5e9367@mail.gmail.com>

Hi,

I am looking for options to analyze a set of 300 bp orthologous sequences from a group of phylogenetically related species. It's about 5-8 promoter DNA sequences per gene for around 2000 genes.

What are the best ways to do this analysis? I looked at some phylogenetic tools, but most have limitations, such as being web-based only or handling just 2 sequences.

All suggestions are welcome, especially from anyone experienced with similar work.

Thanks.
Samantha

From prathibha_562 at yahoo.co.in Mon Oct 10 00:46:33 2005
From: prathibha_562 at yahoo.co.in (prathibha bharathi)
Date: Mon, 10 Oct 2005 05:46:33 +0100 (BST)
Subject: [BiO BB] Plz Help!
Message-ID: <20051010044634.51970.qmail@web8403.mail.in.yahoo.com>

Hi All,

Will any of you please suggest how to unsubscribe? I don't want to continue being in this group. These mails are constantly bothering me!

Please help!

Regards,
Prathibha.

From stefanielager at fastmail.ca Mon Oct 10 08:16:17 2005
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Mon, 10 Oct 2005 12:16:17 +0000 (UTC)
Subject: [BiO BB] Sequence database errors
In-Reply-To: <434874CD.40406@burnham.org>
Message-ID: <20051010121617.0993A861AA7@mail.interchange.ca>

Maybe you can find something among the 28 articles that have cited the article by Bork:

http://scholar.google.com/scholar?hl=en&lr=&safe=off&q=link:9_-RxxRgGTkJ:scholar.google.com/

Stefanie

From jeff at bioinformatics.org Mon Oct 10 08:49:39 2005
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Mon, 10 Oct 2005 08:49:39 -0400
Subject: [BiO BB] Plz Help!
In-Reply-To: <20051010044634.51970.qmail@web8403.mail.in.yahoo.com>
References: <20051010044634.51970.qmail@web8403.mail.in.yahoo.com>
Message-ID: <434A6363.3050009@bioinformatics.org>

Anyone wishing to unsubscribe would be advised to follow the same path they took to subscribe:

https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

This link is at the bottom of every message coming from the list server.
Also, you can refer back to the subscription confirmation email that you got after subscribing, or you can write to the list administrators:

bio_bulletin_board-admin at bioinformatics.org

You can even write to the Organization's administrators:

sysadmins at bioinformatics.org

In any event, *please* don't write to everyone on the mailing list about this! :-) There are more than a thousand people on this list who, like yourself, would rather not be bothered with certain messages.

Cheers,
Jeff

--
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600
--

From P.Curley at westminster.ac.uk Mon Oct 10 12:13:57 2005
From: P.Curley at westminster.ac.uk (paul)
Date: Mon, 10 Oct 2005 09:13:57 -0700
Subject: [BiO BB] Most common protein fold?
In-Reply-To: <93B97926-F819-43C9-ADF2-6517E7320CCB@utoronto.ca>
Message-ID:

Hi Boris,

Thanks for this. I agree that the "known" proteins are an arbitrary subset, but I just wanted to get a feel of how the protein universe is currently populated - whether this reflects proteins as a whole, you can tell?! I will check out the SUPERFAMILY database as you suggest and have a poke around.

Thanks to everyone who suggested advice and sorry for not thanking you all sooner, but I was away at the end of last week.

Best Regards,

Paul

From marty.gollery at gmail.com Mon Oct 10 12:09:55 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 10 Oct 2005 09:09:55 -0700
Subject: [BiO BB] Sequence database errors
In-Reply-To: <434874CD.40406@burnham.org>
References: <434874CD.40406@burnham.org>
Message-ID:

Hi Iddo,

I recall that Steven Brenner at Berkeley did an analysis about 4-5 years ago. He was simply comparing the differences in annotation of comparable databases (which were considerable), thus avoiding the problem of figuring out what the 'true' answer was.
Marty

--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042

From christoph.gille at charite.de Tue Oct 11 06:06:59 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 11 Oct 2005 12:06:59 +0200 (CEST)
Subject: [BiO BB] oligodb is updated
Message-ID: <36679.192.168.220.203.1129025219.squirrel@webmail.charite.de>

We were asked to update oligodb. Oligodb is a web-based system for interactive design of specific oligo DNA for transcription profiling (hybridization) of genes.

URL: http://bioinf.charite.de/oligodb/
Current database version: Ensembl 09/2005
Supported organisms: human, mouse, Drosophila

From christoph.gille at charite.de Tue Oct 11 10:43:16 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 11 Oct 2005 16:43:16 +0200 (CEST)
Subject: [BiO BB] contents of co factores in E coli
Message-ID: <43018.192.168.220.203.1129041796.squirrel@webmail.charite.de>

Does anybody know how much FAD, NADPH, NADH, etc. there is in an E. coli cell?
Many thanks

From dmb at mrc-dunn.cam.ac.uk Tue Oct 11 13:37:30 2005
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue, 11 Oct 2005 18:37:30 +0100
Subject: [BiO BB] Reaction with missing EC number?
Message-ID: <434BF85A.7080506@mrc-dunn.cam.ac.uk>

Hi,

Looking at KEGG (Ligand) I find the following reaction...

R00399
succinyl-CoA:acetyl-CoA C-succinyltransferase
L-Alanine + Acetyl-CoA <=> 2-Amino-4-oxopentanoic acid + CoA

However, the EC number for this reaction appears to be "EC 2.3.1.-", i.e. not a proper EC number.

The weird thing is that this EC number *is* used in the KEGG Valine, leucine and isoleucine degradation pathway.

I thought one way that this could come about is if the reaction itself hasn't yet been assigned an EC number but is otherwise OK, and is used with the 'temporary/approximate' EC number in place of the yet to be assigned (correct) one.

Anybody know if that is correct? It seems strange to see an approximate EC number in a step of a pathway with known inputs and outputs, although looking at the data it doesn't appear to be uncommon.

Anyone have any information on the use of these approximate numbers at KEGG?

Cheers,
Dan.

From michal at bio-world.com Wed Oct 12 10:29:56 2005
From: michal at bio-world.com (michal)
Date: Wed, 12 Oct 2005 10:29:56 -0400
Subject: [BiO BB] Lab software
Message-ID: <001801c5cf39$6acb6fd0$c801a8c0@bioworld.com>

Hello everyone.

I have a quick question: Is there an open source package similar to the one at bioroot.org for controlling the lab?

Mike

From adeyanjufunke at yahoo.com Mon Oct 10 13:50:50 2005
From: adeyanjufunke at yahoo.com (adeyanju funke)
Date: Mon, 10 Oct 2005 10:50:50 -0700 (PDT)
Subject: [BiO BB] bioinformatics
Message-ID: <20051010175051.34749.qmail@web54610.mail.yahoo.com>

Hello there,

It's really great work you are doing. Thank you for advising young folks like me.
I found out about your advisory board and decided to write to you; maybe you will be able to help. The problem is this: I would really love to study bioinformatics. I have a second class lower degree in biochemistry with a GP of 2.91. Please, I need a school where I can do a master's in bioinformatics. Please help me out.

Thanks.

Kind regards,
Adefunke

From maria.mirto at unile.it Wed Oct 12 05:21:48 2005
From: maria.mirto at unile.it (Maria Mirto)
Date: Wed, 12 Oct 2005 11:21:48 +0200 (CEST)
Subject: [BiO BB] CfP: FGCS - Special Issue on Life Science Grids for Biomedicine and Bioinformatics
Message-ID: <3152.193.204.86.212.1129108908.squirrel@webmail2.unile.it>

****************************************************************************
*                 Call for papers for the Special Issue                    *
*                                                                          *
*          Life Science Grids for Biomedicine and Bioinformatics           *
*                                                                          *
*                   Future Generation Computer Systems                     *
*                                Elsevier                                  *
*   http://www.elsevier.com/inca/publications/misc/lifesciencegrid05.doc   *
*                                                                          *
****************************************************************************

##########################################################################
IMPORTANT DATES

Submission for manuscripts: December 15, 2005
Acceptance notification: January 28, 2006
Due date of revised manuscripts: February 28, 2006
Approximate date of publication: Spring, 2006
##########################################################################

Purpose of the Special Issue
----------------------------

Omics technologies (genomics - DNA, transcriptomics - RNA, proteomics - protein, metabolomics - metabolite and phenomics - phenotype, etc.) and medical informatics have changed the arena of life sciences research forever.
They allow generation of data at a large scale, which started with the whole genome, followed by micro-array gene-expression analysis, mass spectrometry of proteins and metabolites, biomedical image processing and health care.

Omics technologies require substantial paradigm shifts in the way life sciences research is carried out. Biological experiments are relatively expensive, which forces scientists to focus on advanced design for experimentation as part of a whole-chain research approach. Furthermore, the data generally contains information outside the scope of the original experiment. Hence, to maximize the scope of experiments, biological data needs to be reusable, shareable and suitable for in-silico experiments. All of this places high demands on annotation of data and standardization of data formats. Furthermore, the conversion of data into information and knowledge to support scientists answering biological questions requires advanced analysis methods and tools that enable mining and integrating these complex datasets.

The bottlenecks for life sciences have shifted from data generation to data storage, pre-processing, analysis, and interpretation. The current challenge is to remove these bottlenecks by a combination of life sciences and information technology (IT). The effective and efficient management and use of stored data, and in particular the transformation of these data into information and knowledge, is thus a key requirement for success in Life Sciences, as has already been recognized in many other sectors such as industry, science, and government.

Life Science Grids are based on the integration of emerging technologies such as Grids, Bioinformatics, Web/Grid Services, Workflow, and Semantic Web, to support applications and research in different fields of Life Sciences, such as Health Care, Biomedicine, and Computational Chemistry.
They promise to provide reliable and secure computing infrastructures facilitating the seamless use of distributed datasets, bioinformatics tools and systems, data mining applications, and knowledge, building a so-called Grid Problem Solving Environment (G-PSE), for solving complex problems in Biomedicine and Health Care.

The scope of this special issue is to focus on challenges, applications and services in modern Life Science Grid computing environments.

Topics of interest include, but are not limited to, the following:

- Grid solutions for Life Science applications
- Grid infrastructures for bio data analysis
- Parallel bio data-intensive applications
- Grid infrastructures, middleware and tools for Life Science Grids
- Web Services for Life Science Grids
- Workflow for Life Science Grids
- Semantic Grid for Life Science applications
- Bio data analysis and management
- Databases and the grid in biomedical field
- Data grids for biomedicine and bioinformatics
- Data mining of truly large and high-dimensional bio data sets
- Security in bio data grids
- Biology, Biochemistry and Biomedicine for Grid Environments
  * Drug Design
  * Protein Folding
  * Systems Biology
  * Genome informatics and phylogeny

Guest Editors
-------------

Giovanni Aloisio
University of Lecce, Italy
giovanni.aloisio at unile.it

Vincent Breton
CNRS/IN2P3, LPC Clermont-Ferrand, France
breton at clermont.in2p3.fr

Maria Mirto
University of Lecce, Italy
maria.mirto at unile.it

Almerico Murli
University of Naples, Italy
almerico.murli at dma.unina.it

Tony Solomonides
University of West of England, UK
Tony.Solomonides at uwe.ac.uk

Important Dates
---------------

Paper submission deadline: December 15, 2005
Notification of acceptance: January 28, 2006
Camera-ready papers: February 28, 2006
Desired publication: End of 2006

--
============================================================
Maria Mirto
Center for Advanced Computational Technologies
via per Monteroni, 73100 Lecce (Le), ITALY
SPACI s.r.l.
ph: +39 0832 297304
fax: +39 0832 297279
============================================================

From cjoy at houston.rr.com Sun Oct 16 04:21:30 2005
From: cjoy at houston.rr.com (Corwin Joy)
Date: Sun, 16 Oct 2005 03:21:30 -0500
Subject: [BiO BB] ANN: BIOLAP
Message-ID: <000b01c5d22a$9ce57530$3201bf0a@cjoyxp>

We have just released a new project to sourceforge focused on OLAP tools for biology data.

http://biolap.sourceforge.net/

What is BIOLAP?

OLAP is a powerful tool that is widely used by the business community to analyze large financial data sets. Despite the big data sets found in modern biology, OLAP has not been widely adopted by the biology community. We want to change that. BIOLAP is open source OLAP for biology data. We take the open source tools provided by the JPivot project and extend them to handle biology data types. In our first release, we focus on extensions to analyze large genomic sequence databases. As an application, we apply these tools to browse and analyze the iProClass database of over 2 million protein sequences.

From atariml at gmail.com Sun Oct 16 12:00:31 2005
From: atariml at gmail.com (Andrea Franceschini)
Date: Sun, 16 Oct 2005 18:00:31 +0200
Subject: [BiO BB] protein family databases (Interpro)
Message-ID: <031f01c5d26a$c16cda70$0801a8c0@atarippc>

We are developing software to automatically retrieve family/domain annotations for a set of proteins. In particular, we are interested in identifying the possible relations between the different families.

To accomplish this task we are thinking of using Interpro as the source for the annotations. What do you think about the parent/child and contains/found in relations present in Interpro?

Do you have any suggestions about other databases that we should use?
Do you have any suggestions about other information that we could provide on a particular set of proteins (given to us as input by the user)?

Thank you very much
Andrea Franceschini
University Politecnico of Milan (Italy)

From aloraine at gmail.com Sun Oct 16 20:32:04 2005
From: aloraine at gmail.com (Ann Loraine)
Date: Sun, 16 Oct 2005 19:32:04 -0500
Subject: [BiO BB] protein family databases (Interpro)
In-Reply-To: <031f01c5d26a$c16cda70$0801a8c0@atarippc>
References: <031f01c5d26a$c16cda70$0801a8c0@atarippc>
Message-ID: <83722dde0510161732u2b19bc41hf9b4b36d784524db@mail.gmail.com>

Dear Andrea,

This service would be of great use to me if it allowed me to transfer amino acid sequences directly to your server and get back the results directly (instead of via a Web page) in an easy-to-parse format, such as XML.

Very best wishes,

Ann Loraine

On 10/16/05, Andrea Franceschini wrote:
> We are developing a software to automatically retrive some family/domain's
> annotations of a set of proteins.
> In particular we are interested to identify the possible relations between
> the different families.
>
> To accomplish this task we are thinking to use Interpro as source for the
> annotations.
> What do you think about the parent/child and contains/found in relations
> present in Interpro ?
>
> Do you have any suggestion about other databases that we should use ?
> Do you have any suggestion about other possible information that we could
> provide on a particular set of proteins (given us as input by the user) ?
> > > Thankyou very much > Andrea Franceschini > University Politecnico of Milan (Italy) > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From ccottap at lcc.uma.es Mon Oct 17 05:30:41 2005 From: ccottap at lcc.uma.es (Carlos Cotta) Date: Mon, 17 Oct 2005 11:30:41 +0200 Subject: [BiO BB] Final CFP: EvoBIO 2006 Message-ID: <5.1.0.14.2.20051017112942.03d85350@sol10.lcc.uma.es> ---------------------------------------------------------------- Final Call for Papers EvoBIO2006 Fourth European Workshop on Evolutionary Computation and Machine Learning in Bioinformatics Budapest, Hungary http://evonet.lri.fr/eurogp2006/?page=evobio http://www.cs.vu.nl/~evobio/evobio06.html ---------------------------------------------------------------- EvoBIO covers research in all aspects of computational intelligence in bioinformatics and computational biology. The emphasis is on algorithms based on evolutionary computation, on neural networks and on other novel optimisation and machine learning methods, that address important problems in molecular biology, genomics and genetics, that are computationally efficient, and that have been implemented and tested in simulations and on real datasets. The goal of the workshop is to present recent research results, including significant work-in-progress, and to identify and explore directions of future research, besides stimulating closer interaction between members of this scientific community working on bioinformatics. Each accepted paper will be presented orally at the workshop and printed in the proceedings published by Springer in the LNCS series. The accepted papers of the first, second, and third edition of EvoBIO were published in the Springer Verlag LNCS 2611, 3005, 3449, respectively. Submission instructions are available at the EvoBIO web pages indicated above. 

****** Important Dates ******

Submission deadline: 4 November 2005
Notification of acceptance: 12 December 2005
Camera ready papers due: 9 January 2006
EvoBIO and Evo-events: 10-12 April 2006

****** General Chairs *******
Dave Corne, UK
Elena Marchiori, NL

****** Program Chairs ******
Carlos Cotta, Spain
Jason Moore, USA

****** Publicity Chair ******
Jagath C. Rajapakse, Singapore

****** Program Committee ******
Jesus Aguilar, Spain
Wolfgang Banzhaf, Canada
Jacek Blazewicz, Poland
David Corne, UK
Vincenzo Cutello, Italy
Gary Fogel, USA
James Foster, USA
Alex Freitas, UK
Raul Giraldez, Spain
Rosalba Giugno, Italy
Jin-Kao Hao, France
Natalio Krasnogor, UK
Bill Langdon, UK
Bob MacCallum, Sweden
Elena Marchiori, The Netherlands
Andrew Martin, UK
Pablo Moscato, Australia
Ajit Narayanan, UK
Vic J. Rayward-Smith, UK
John Rowe, UK
Jem Rowland, UK
El-Ghazali Talbi, France
Antoine van Kampen, The Netherlands
Gwen Volkert, UK
Ray Walshe, Ireland
Eckart Zitzler, Switzerland
Igor Zwir, Spain

From gully at usc.edu Mon Oct 17 17:52:04 2005
From: gully at usc.edu (Gully Burns)
Date: Mon, 17 Oct 2005 14:52:04 -0700
Subject: [BiO BB] NeuroScholar - an open-source informatics system for managing the scientific literature
Message-ID: <0IOI0079OY30CX90@msg-mx2.usc.edu>

ANNOUNCEMENT:

The NeuroScholar system is a knowledge management system for the neuroscientific literature, allowing users to build an organized library of PDF files and then make and manage free-form notes based on the articles. This simple functionality is the first phase of the creation of a system that enables bench neuroscientists to construct knowledge bases of what they know. This is an attempt to introduce an informatics framework into the lab to bring an increased level of formalism to the subject.

NeuroScholar is open source and free to download from http://www.neuroscholar.org/ (click on Software > NeuroScholar).
We also have several demonstration movies available from the movies section to show the functionality of the system.

Thank you for your attention.

Gully Burns
Research Assistant Professor
University of Southern California

From marty.gollery at gmail.com Mon Oct 17 18:10:08 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 17 Oct 2005 15:10:08 -0700
Subject: [BiO BB] protein family databases (Interpro)
In-Reply-To: <031f01c5d26a$c16cda70$0801a8c0@atarippc>
References: <031f01c5d26a$c16cda70$0801a8c0@atarippc>
Message-ID:

Interpro would be a good choice because it is becoming so popular for genome annotation projects.

Marty

On 10/16/05, Andrea Franceschini wrote:
>
> We are developing a software to automatically retrive some family/domain's
> annotations of a set of proteins.
> In particular we are interested to identify the possible relations between
> the different families.
>
> To accomplish this task we are thinking to use Interpro as source for the
> annotations.
> What do you think about the parent/child and contains/found in relations
> present in Interpro ?
>
> Do you have any suggestion about other databases that we should use ?
> Do you have any suggestion about other possible information that we could
> provide on a particular set of proteins (given us as input by the user) ?
>
>
> Thankyou very much
> Andrea Franceschini
> University Politecnico of Milan (Italy)
>

--
--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042
-----------

From jforment at ibmcp.upv.es Tue Oct 18 05:41:02 2005
From: jforment at ibmcp.upv.es (Javier Forment)
Date: Tue, 18 Oct 2005 11:41:02 +0200
Subject: [BiO BB] oligo probe design
Message-ID: <4354C32E.6070500@ibmcp.upv.es>

Hello,

I am entering the field of oligonucleotide probe design for the construction of oligo microarrays, and have been overwhelmed by the sheer amount and diversity of specific software for this task. Could any of you tell me which is the most used/convenient/configurable one? We are also planning to install it as a web server, so that users will be able to enter the sequences of the genes to be probed, and the sequences of all the (known) genes of the organism of interest, in order to minimize cross-hybridization.

Thanks in advance,

Javier.

--
Javier Forment Millet
Unidad de Bioinformatica del Laboratorio de Genomica
Instituto de Biologia Molecular y Celular de Plantas
Universidad Politecnica de Valencia
Avenida de los Naranjos, s/n
46022 Valencia (Spain)
Tlf.(1): +34-963877885
Tlf.(2): 685142553
FAX: +34-963877859
e-mail: jforment at ibmcp.upv.es

From nmulder at science.uct.ac.za Tue Oct 18 06:26:06 2005
From: nmulder at science.uct.ac.za (Dr Nicky Mulder)
Date: Tue, 18 Oct 2005 12:26:06 +0200
Subject: [BiO BB] protein family databases (Interpro)
In-Reply-To: <20051017160020.49AC53685FA@primary.bioinformatics.org>
References: <20051017160020.49AC53685FA@primary.bioinformatics.org>
Message-ID: <1129631166.4354cdbeefb80@webmail.uct.ac.za>

Dear Andrea

I run the InterPro project and hope I can help with some more information. As you probably know, all the data is freely available for download, and is available in xml format. If you are interested in UniProt proteins, all InterPro matches for these are precalculated and available. InterPro parent/child relationships are used when some protein signatures are more sensitive than others and detect subfamilies, or subsets, of others. In this way you can classify a protein on different levels, e.g.
superfamily, family and subfamily. PRINTS fingerprints are particularly good at getting different levels of granularity. The contains/found in relationship is where you have a signature covering a larger region of sequence and smaller signatures or domains covering smaller regions within it. This is to show domain composition. If you want to run your own sequences through InterProScan, the results are available in several formats, including html, text and xml. Let me know if you have any more questions on the data or database.

Nicky

From yazhang at eecs.ku.edu Tue Oct 18 11:31:48 2005
From: yazhang at eecs.ku.edu (Ya Zhang)
Date: Tue, 18 Oct 2005 10:31:48 -0500 (CDT)
Subject: [BiO BB] Post-Doc in BIOINFORMATICS/MACHINE LEARNING @ University of Kansas
Message-ID:

The University of Kansas Information and Telecommunications Laboratory has one opening for a postdoctoral researcher for a one year appointment.

Duties are to perform research in machine learning and bioinformatics as assigned by the supervisor and work in an interdisciplinary research team; write reports, papers, and presentations describing the work; other duties as assigned.

Required qualifications are an earned doctorate in computer science, bioinformatics, statistics, or a closely related scientific discipline; research experience in machine learning/statistical learning; and the ability to program in at least one of the languages MATLAB, C, and C++.

Preferred qualifications are research experience in bioinformatics; published papers in machine learning/bioinformatics areas; and familiarity with graph theory and network modeling.

For further information contact:
Prof. Anne Zhang
Phone: (785)864-7386
Email: yazhang at eecs.ku.edu

To apply, complete the on-line application at https://jobs.ku.edu, and attach a cover letter, resume, and names and contact information for three references. Review of applications will begin on November 1, 2005. EO/AA employer. Paid for by KU.

From afsanehmotamed at yahoo.com Tue Oct 18 11:39:09 2005
From: afsanehmotamed at yahoo.com (Afsaneh Motamed-Khorasani)
Date: Tue, 18 Oct 2005 11:39:09 -0400 (EDT)
Subject: [BiO BB] Post-Doc in BIOINFORMATICS/MACHINE LEARNING @ University of Kansas
In-Reply-To:
Message-ID: <20051018153909.94746.qmail@web88102.mail.re2.yahoo.com>

Dear Dr. Zhang,

Thank you for the reply, but I prefer to stay in Toronto for the postdoctoral period for family reasons. So kindly keep me in mind if you have, or hear of, any related position in the Toronto area.

Thanks,
Afsaneh

Afsaneh Motamed-Khorasani
Ph.D. Candidate (Dr. Brown's Lab)
E-mail: afsanehmotamed at yahoo.com
Web: http://www.geocities.com/afsanehmotamed
lab: 416-586-4800 (ext: 2451)
cell: 416-454-5589
Office: room 876, Lunenfeld Research Inst., Mount Sinai Hospital
600 University Ave., Toronto, ON M5G 1X5
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mayagao1999 at yahoo.com Tue Oct 18 15:03:57 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Tue, 18 Oct 2005 12:03:57 -0700 (PDT)
Subject: [BiO BB] About array CGH data based on BAC clones
Message-ID: <20051018190358.56705.qmail@web53505.mail.yahoo.com>

Hello everyone,

Does anybody have experience analyzing array CGH data based on BAC clones to identify the BACs that are amplified or deleted (gain or loss)? Are there any software tools or packages you would recommend?

Thank you very much ahead of time!

Sincerely,
Alex

From chefer at tuks.co.za Wed Oct 19 05:15:54 2005
From: chefer at tuks.co.za (Charles Hefer)
Date: Wed, 19 Oct 2005 11:15:54 +0200
Subject: [BiO BB] Automated upstream region sequence retrieval
Message-ID: <43560ECA.7020500@tuks.co.za>

Hi

I am looking for a way to automate the retrieval of upstream regions of genes (from fully sequenced genomes).

I have tried the R/BioMart route, but the organisms I want are not available in BioMart (yet). The next option would be to use BLAT to retrieve the gene positions and then retrieve the upstream regions, but I am looking for a simpler solution. Does one of the Bio modules of e.g. Python/Java/Perl support this functionality?

The aim is to put up a little internal web service for promoter searches, in which the desired gene ID (GenBank ID) would be entered and the ~2kb upstream region returned.
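
Once gene positions are known (for example from BLAT hits or an annotation table), the coordinate arithmetic for the upstream region could be sketched roughly as follows. This is only a minimal plain-Python illustration, not the API of any Bio module: the function names `upstream_region` and `reverse_complement` are hypothetical, gene coordinates are assumed to be 1-based and inclusive, and the chromosome sequence is assumed to be already loaded as a string.

```python
# Hypothetical helpers for illustration; not part of Biopython, BioPerl or BioJava.
# Assumption: gene coordinates are 1-based and inclusive, as in GenBank feature tables.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, gene_start, gene_end, strand, length=2000):
    """Return up to `length` bases immediately upstream of a gene.

    genome     : chromosome or contig sequence as a string
    gene_start : leftmost gene coordinate (1-based)
    gene_end   : rightmost gene coordinate (1-based, inclusive)
    strand     : "+" or "-"
    The region is truncated at the sequence ends if fewer bases are available.
    """
    if strand == "+":
        # On the forward strand, upstream lies to the left of gene_start.
        lo = max(0, gene_start - 1 - length)
        hi = gene_start - 1
        return genome[lo:hi]
    if strand == "-":
        # On the reverse strand, upstream lies to the right of gene_end;
        # reverse-complement so the result reads 5'->3' toward the gene.
        lo = gene_end
        hi = min(len(genome), gene_end + length)
        return reverse_complement(genome[lo:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy usage with a 14-base "genome" and a 3-base window.
toy = "ACGTTGCAAGGCTA"
print(upstream_region(toy, 6, 8, "+", length=3))  # -> GTT
print(upstream_region(toy, 6, 8, "-", length=3))  # -> CCT
```

Mapping a GenBank ID to its coordinates and strand would still have to come from an annotation source; the sketch covers only the slicing step.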

Thanx, in advance
--

Charles

From Philippe.Hupe at curie.fr Wed Oct 19 06:08:35 2005
From: Philippe.Hupe at curie.fr (Philippe Hupé)
Date: Wed, 19 Oct 2005 12:08:35 +0200
Subject: [BiO BB] About array CGH data based on BAC clones
In-Reply-To: <20051018190358.56705.qmail@web53505.mail.yahoo.com>
References: <20051018190358.56705.qmail@web53505.mail.yahoo.com>
Message-ID: <43561B23.3000302@curie.fr>

Alex Zhang wrote:

>Hello everyone,
>
>Is there anybody who has the experience of
>analyzing array CGH data based on BAC clones
>to identify the BACs which are amplified or
>deleted(gain or loss)? Any soft tools or
>packages recommended?
>
>Thank you very much ahead of time!
>
>Sincerely,
> Alex
>

Dear colleague,

The bioinformatics team of Institut Curie has developed several tools related to the analysis of array CGH data:
- GLAD for breakpoint detection
- MAIA for automatic microarray image analysis
- MANOR for normalisation of microarray data
- VAMP, a java graphical interface for visualisation and analysis of CGH profiles.
- CAPweb, a suite of tools for the management, visualization and analysis of CGH-arrays

VAMP can be requested at vamp at curie.fr , MAIA at maia at curie.fr , CAPweb at capweb at curie.fr , GLAD at glad at curie.fr and MANOR at manor at curie.fr

A VAMP demo is available at http://bioinfo.curie.fr/vamp (Then click on Direct Launch and File->Import)
Two movies give you an overview of VAMP software capabilities.
- http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo1.html
- http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo2.html
- http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo3.html

You can visit our Web site at http://bioinfo.curie.fr

You can try CAPweb, which is a complete web platform, at the following url: http://bioinfo.curie.fr/CAPweb. It lets you analyze your data directly from the gpr file and the clone info file. It includes normalization, breakpoint detection, data storage and visualization. This environment can be installed directly in our lab. Do not hesitate to send questions to capweb at curie.fr

Best regards,

Philippe Hupé

--
Philippe Hupé
UMR 144 - Service Bioinformatique
Institut Curie
Laboratoire de Transfert (4ème étage)
26 rue d'Ulm
75005 Paris - France

Email : Philippe.Hupe at curie.fr
Tél : +33 (0)1 44 32 42 75
Fax : +33 (0)1 42 34 65 28

website : http://bioinfo.curie.fr

From aloraine at gmail.com Wed Oct 19 11:28:08 2005
From: aloraine at gmail.com (Ann Loraine)
Date: Wed, 19 Oct 2005 10:28:08 -0500
Subject: [BiO BB] Automated upstream region sequence retrieval
In-Reply-To: <43560ECA.7020500@tuks.co.za>
References: <43560ECA.7020500@tuks.co.za>
Message-ID: <83722dde0510190828i7be36dbfoa394dcf5d256d95b@mail.gmail.com>

Hi,

DAS might work for you - see www.biodas.org for functioning DAS sites.

To get mRNA-to-genome/gene coordinates, I would download tables from the Santa Cruz Genome Informatics site.

-Ann

On 10/19/05, Charles Hefer wrote:
> Hi
>
> I am looking for a way to automate the retrieval of upstream regions
> of genes (from fully sequenced genomes).
>
> I have tried the R/BioMart route, but the organisms I want are not
> available in BioMart (yet). The next option would be to use BLAT to
> retrieve the gene positions and then retrieve the upstream regions, but
> I am looking for a simpler solution. Does one of the Bio modules of i.e
> Python/Java/PERL support this functionality?
>
> The aim is to put up a little internal web-service for promoter
> searches, for which the desired gene ID (GenBankId) would be entered and
> the ~2kb upstream region returned.
>
> Thanx, in advance
> --
>
> Charles
>

From hlapp at gmx.net Wed Oct 19 13:03:04 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 19 Oct 2005 10:03:04 -0700
Subject: [Bioperl-l] Re: [BiO BB] About array CGH data based on BAC clones
In-Reply-To: <43561B23.3000302@curie.fr>
References: <20051018190358.56705.qmail@web53505.mail.yahoo.com> <43561B23.3000302@curie.fr>
Message-ID: <39dc54bf83b7cdf55327fa72cf1d50f3@gmx.net>

Philippe,

what is the license on these software packages? Except for GLAD (which presumably is licensed as OSS compatible with Bioconductor), the website states the notorious 'available upon request,' leaving it to everybody's guess what license applies upon whose request.

Is there a reason not to openly and explicitly state the license(s)?

-hilmar

On Oct 19, 2005, at 3:08 AM, Philippe Hupé wrote:

> Alex Zhang wrote:
>
>> Hello everyone,
>>
>> Is there anybody who has the experience of
>> analyzing array CGH data based on BAC clones
>> to identify the BACs which are amplified or
>> deleted(gain or loss)? Any soft tools or packages recommended?
>>
>> Thank you very much ahead of time!
>>
>> Sincerely,
>> Alex
>>
>>
>>
>>
>> __________________________________ Yahoo!
Mail - PC Magazine Editors'
>> Choice 2005 http://mail.yahoo.com
>> _______________________________________________
>> Bioinformatics.Org general forum -
>> BiO_Bulletin_Board at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>>
>>
>>
> Dear colleague,
>
>
> The bioinformatics team of Institut Curie has developed several tools
> related to the analysis of array CGH data:
> - GLAD for breakpoint detection
> - MAIA for automatic microarray image analysis
> - MANOR for normalisation of microarray data
> - VAMP, a java graphical interface for visualisation and analysis of
>   CGH profiles.
> - CAPweb, a suite of tools for the management, visualization and
>   analysis of CGH-arrays
> VAMP can be requested at vamp at curie.fr , MAIA at maia at curie.fr ,
> CAPweb at capweb at curie.fr , GLAD at glad at curie.fr and MANOR at
> manor at curie.fr
>
> A VAMP demo is available at http://bioinfo.curie.fr/vamp (Then click
> on Direct Launch and File->Import)
> Two movies give you an overview of VAMP software capabilities.
> - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo1.html
> - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo2.html
> - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo3.html
>
>
> You can visit our Web site at http://bioinfo.curie.fr
>
>
> You can try CAPweb which is a complete web platform at the following
> url: http://bioinfo.curie.fr/CAPweb. It allows to analyze your data
> directly from the gpr file and the clone info file. It includes the
> normalization, the breakpoints detection, the data storage and the
> visualization. This environment can be installed directly in our lab.
> Do not hesitate to ask for questions at capweb at curie.fr
>
>
> Best regards,
>
>
> Philippe Hupé
>
> --
> Philippe Hupé
> UMR 144 - Service Bioinformatique
> Institut Curie
> Laboratoire de Transfert (4ème étage)
> 26 rue d'Ulm
> 75005 Paris - France
>
> Email : Philippe.Hupe at curie.fr
> Tél : +33 (0)1 44 32 42 75
> Fax : +33 (0)1 42 34 65 28
>
> website : http://bioinfo.curie.fr
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From joel.dudley at asu.edu Wed Oct 19 22:53:54 2005
From: joel.dudley at asu.edu (Joel Dudley)
Date: Wed, 19 Oct 2005 19:53:54 -0700
Subject: [BiO BB] Orthologous upstream sequence analysis
In-Reply-To: <726450810510091344w5a5b3683kb91842f9da5e9367@mail.gmail.com>
References: <726450810510091344w5a5b3683kb91842f9da5e9367@mail.gmail.com>
Message-ID:

What type of analysis would you like to do? Do you want to calculate rates of evolution? Construct a phylogeny?

- Joel
__________________________________________________________
MacResearcher - Articles, News, and Reviews for the Mac-loving Scientist
http://www.macresearcher.com

On Oct 9, 2005, at 1:44 PM, Samantha Fox wrote:

>
> Hii...
> I am looking for various options to analyze a set of 300bp long
> orthologous sequences from a group of phylogenetically related
> species. Its like 5-8 promoter dna sequences per gene for around
> 2000 genes.
>
> What can be the best ways to do this analysis, I looked at some
> phylogenetic tools, but most have limitations like web-based, just
> 2 sequences, etc.
>
> All your suggestions... and anyone experienced with similar
> stuff. ... I welcome it all...
>
> Thanks.
> Samantha > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > From bioinfosm at gmail.com Thu Oct 20 14:34:47 2005 From: bioinfosm at gmail.com (Samantha Fox) Date: Thu, 20 Oct 2005 13:34:47 -0500 Subject: [BiO BB] Orthologous upstream sequence analysis In-Reply-To: References: <726450810510091344w5a5b3683kb91842f9da5e9367@mail.gmail.com> Message-ID: <726450810510201134q1987629fl916067cfaec7a952@mail.gmail.com> Well ... I am interested in the other possible analyses --- looking at the sequences from all these .. and finding conserved motifs .. etc ... the species are quite well known phylogenetically ... but any other useful things to do with them .. would be helpful .... Thanks .. Sumit On 10/19/05, Joel Dudley wrote: > > What type of analysis would you like to do? Do you want to calculate > rates of evolution? Construct a phylogeny? > > - Joel > __________________________________________________________ > MacResearcher - Articles, News, and Reviews for the Mac-loving Scientist > http://www.macresearcher.com > > > On Oct 9, 2005, at 1:44 PM, Samantha Fox wrote: > > > > > > Hii... > > I am looking for various options to analyze a set of 300bp long > > orthologous sequences from a group of phylogenetically related > > species. Its like 5-8 promoter dna sequences per gene for around > > 2000 genes. > > > > What can be the best ways to do this analysis, I looked at some > > phylogenetic tools, but most have limitations like web-based, just > > 2 sequences, etc. > > > > All your suggestions... and anyone experienced with similar > > stuff. ... I welcome it all... > > > > Thanks. > > Samantha -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bioinfosm at gmail.com Fri Oct 21 11:34:39 2005 From: bioinfosm at gmail.com (Samantha Fox) Date: Fri, 21 Oct 2005 10:34:39 -0500 Subject: [BiO BB] promoter sequence analysis for TFBS .. using phylogeny Message-ID: <726450810510210834l13b8088cs545010916599f248@mail.gmail.com> Hii... I am looking for various options to analyze a set of orthologous sequences, each a few hundred bp long, from a group of phylogenetically related species. It's like 5-8 homologous promoter DNA sequences per gene for around a thousand genes. The motivation is to get the conserved motifs which have remained constant under selection. What would be the best ways to do this analysis? I looked at some phylogenetic tools, but most have limitations like being web-based, handling just 2 sequences, etc. All your suggestions... and anyone experienced with similar stuff. ... I welcome it all... Thanks. Samantha -------------- next part -------------- An HTML attachment was scrubbed... URL: From gelbukh at cicling.org Wed Oct 19 05:01:56 2005 From: gelbukh at cicling.org (Alexander Gelbukh) Date: Wed, 19 Oct 2005 04:01:56 -0500 Subject: [BiO BB] CFP: CICLing-2006 -- Computational Linguistics, Springer LNCS, February, Mexico -- one week reminder Message-ID: CICLing-2006 7th International Conference on Intelligent Text Processing and Computational Linguistics February 19-25, 2006 Mexico City, Mexico Endorsed by the ACL www.CICLing.org/2006 PUBLICATION: LNCS: Springer Lecture Notes in Computer Science. SUBMISSION DEADLINE: Abstract: October 17, late submissions can be considered; Main text: October 24, 2005 (for registered abstracts). MODALITIES: Full paper: 12 pages, short paper: 4 pages. KEYNOTE SPEAKERS: Nancy Ide, Rada Mihalcea, 2 more to be announced, see website. EXCURSIONS: Ancient pyramids, Monarch butterflies, great cave and colonial city, and more. All tentative. See photos on www.CICLing.org. AWARDS: Best paper, best presentation, best poster, best demo. 
+------------------------------------------------------- | Topics +------------------------------------------------------- Computational linguistics research: Comp. Linguistics theories and formalisms, Knowledge representation, Comp. morphology, syntax, semantics, Discourse models, Machine translation, text generation, Statistical methods, corpus linguistics, Lexical resources; Intelligent text processing and applications: Information retrieval, question answering, Information extraction, Text mining, Document categorization and clustering, Automatic summarization, Natural language interfaces, Spell-checking; and all related topics. +------------------------------------------------------- | Schedule (tentative) +------------------------------------------------------- Sunday, Wednesday, Saturday: full-day excursions; Monday, Tuesday, Thursday, Friday: talks; Monday: Welcome party & poster session. See website. ==================================================== See complete CFP and contact on www.CICLing.org/2006 ==================================================== We send you this CFP in good faith of its usefulness for you. If you do not want to receive any new messages, please let us know replying to this message. We deeply apologize for any inconvenience. From hlapp at gmx.net Mon Oct 24 02:55:37 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Oct 2005 23:55:37 -0700 Subject: [Bioperl-l] Re: [BiO BB] About array CGH data based on BAC clones In-Reply-To: <435C12BE.1080308@curie.fr> References: <20051018190358.56705.qmail@web53505.mail.yahoo.com> <43561B23.3000302@curie.fr> <39dc54bf83b7cdf55327fa72cf1d50f3@gmx.net> <435C12BE.1080308@curie.fr> Message-ID: <80653bd36bc1a06cda34548879240d7b@gmx.net> Dear Emmanuel, I'm interested in why you make the license of these software packages a secret instead of clearly and openly stating it (or them). Or is the exact license negotiable? 
(If I were to consider my options for a certain kind of software I shouldn't have to email 20 people to inquire about the licenses, don't you think?) -hilmar On Oct 23, 2005, at 3:46 PM, Emmanuel Barillot wrote: > Hilmar Lapp wrote: >> Philippe, >> what is the license on these software packages? Except for GLAD >> (which presumably is licensed as OSS compatible with Bioconductor), >> the website states the notorious 'available upon request,' leaving it >> to everybody's guess what license applies upon whose request. Is >> there a reason not to openly and explicitly state the license(s)? > > Dear Sir, > > MANOR and GLAD are indeed OSS. > The others have various modes of licencing. For example VAMP is free > for academics under some conditions. > > What are you interested in? > > Best regards > Emmanuel Barillot > Director of Bioinformatics > >> -hilmar >> On Oct 19, 2005, at 3:08 AM, Philippe Hupé wrote: >>> Alex Zhang wrote: >>> >>>> Hello everyone, >>>> >>>> Is there anybody who has the experience of >>>> analyzing array CGH data based on BAC clones >>>> to identify the BACs which are amplified or >>>> deleted(gain or loss)? Any soft tools or packages recommended? >>>> >>>> Thank you very much ahead of time! >>>> >>>> Sincerely, >>>> Alex >>>> >>>> >>>> __________________________________ Yahoo! Mail - PC >>>> Magazine Editors' Choice 2005 http://mail.yahoo.com >>>> _______________________________________________ >>>> Bioinformatics.Org general forum - >>>> BiO_Bulletin_Board at bioinformatics.org >>>> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >>>> >>>> >>>> >>> Dear colleague, >>> >>> >>> The bioinformatics team of Institut Curie has developed several >>> tools related to the analysis of array CGH data: >>> - GLAD for breakpoint detection >>> - MAIA for automatic microarray image analysis >>> - MANOR for normalisation of microarray data - VAMP, a java >>> graphical interface for visualisation and analysis of CGH profiles. 
>>> - CAPweb, a suite of tools for the management, visualization and >>> analysis of CGH-arrays >>> VAMP can be requested at vamp at curie.fr , MAIA at maia at curie.fr , >>> CAPweb at capweb at curie.fr , GLAD at glad at curie.fr and MANOR at >>> manor at curie.fr >>> >>> A VAMP demo is available at http://bioinfo.curie.fr/vamp (Then click >>> on Direct Launch and File->Import) >>> Two movies give you an overview of VAMP software capabilities. >>> - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo1.html >>> - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo2.html >>> - http://bioinfo-out.curie.fr/tutorial/vamp/vamp-demo3.html >>> >>> >>> You can visit our Web site at http://bioinfo.curie.fr >>> >>> >>> You can try CAPweb which is a complete web platform at the >>> following url: http://bioinfo.curie.fr/CAPweb. It allows to analyze >>> your data directly from the gpr file and the clone info file. It >>> includes the normalization, the breakpoints detection, the data >>> storage and the visualization. This environment can be installed >>> directly in our lab. Do not hesitate to ask for questions at >>> capweb at curie.fr >>> >>> >>> Best regards, >>> >>> >>> Philippe Hupé >>> >>> -- >>> Philippe Hupé >>> UMR 144 - Service Bioinformatique >>> Institut Curie >>> Laboratoire de Transfert (4ème étage) >>> 26 rue d'Ulm >>> 75005 Paris - France >>> Email : Philippe.Hupe at curie.fr >>> Tél : +33 (0)1 44 32 42 75 >>> Fax : +33 (0)1 42 34 65 28 >>> >>> website : http://bioinfo.curie.fr >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hiekeen at gmail.com Mon Oct 24 11:32:45 2005 From: hiekeen at gmail.com (ekeen ekeen) Date: Mon, 24 Oct 2005 23:32:45 +0800 Subject: [BiO BB] A simple question about the kinase groups. In-Reply-To: <01e201c5d876$8506e7d0$8e00a8c0@msl142> References: <01e201c5d876$8506e7d0$8e00a8c0@msl142> Message-ID: Hello everyone, I was lucky enough to download the Phospho.ELM database. The data set provides the kinase. Now I want to get the sequences that are phosphorylated by kinases of the CAMK group, as classified by Manning (Manning et al., 2002). The CAMK group contains the following kinases: CAMK1,CAMK2,CAMKL,CAMK-Unique,CASK,DAPK,DCAMKL,MAPKAPK,MLCK,PHK,PIM,PKD,PSK,RAD53,RSKb,Trbl,Trio,TSSK. But I can find only very few sequences in the Phospho.ELM database. I think this is just because the kinase names in the Phospho.ELM database differ from the names used in Manning's classification. Can you give me some suggestions about this? I hope I have expressed my question clearly. Thanks very much. Ekeen -------------- next part -------------- An HTML attachment was scrubbed... URL: From lauran at walla2.com Mon Oct 24 23:38:43 2005 From: lauran at walla2.com (Laura Nielson) Date: Mon, 24 Oct 2005 20:38:43 -0700 Subject: [BiO BB] (no subject) Message-ID: <000a01c5d915$9ca66ba0$d702f304@yourxhtr8hvc4p> Why is it not possible to spool out precipitated proteins in comparison to spooling out precipitated DNA? DNA is very, very long. How long are proteins (polypeptides)? I've done the experiment where you precipitate DNA out of onion cells. Could you precipitate RNA out of onion cells? How? or Why not? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dmb at mrc-dunn.cam.ac.uk Tue Oct 25 03:53:58 2005 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Tue, 25 Oct 2005 08:53:58 +0100 Subject: [BiO BB] (no subject) In-Reply-To: <000a01c5d915$9ca66ba0$d702f304@yourxhtr8hvc4p> References: <000a01c5d915$9ca66ba0$d702f304@yourxhtr8hvc4p> Message-ID: <435DE496.3090904@mrc-dunn.cam.ac.uk> Laura Nielson wrote: > Why is it not possible to spool out precipitated proteins in comparison > to spooling out precipitated DNA? DNA is very, very long. How long are > proteins (polypeptides)? > > I've done the experiment where you precipitate DNA out of onion cells. > Could you precipitate RNA out of onion cells? How? or Why not? One reason is the ubiquity of RNAses. Anyone working with RNA will tell you what a pain it is, as any RNA 'out there' will be sliced up faster than you can do anything else. I have no idea of average RNA lengths, but they are typically *much* shorter than DNA. They will be (approximately) three times the length of proteins, and the longest protein is ~12,000 amino acids AFAIR. Proteins do precipitate all the time. The knack is to precipitate exactly the protein you want, and none of the proteins you don't want. > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From boris.steipe at utoronto.ca Tue Oct 25 08:57:09 2005 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue, 25 Oct 2005 08:57:09 -0400 Subject: [BiO BB] A simple question about the kinase groups. In-Reply-To: References: <01e201c5d876$8506e7d0$8e00a8c0@msl142> Message-ID: <1268CF7F-0B13-4E6D-9B5E-D84B6BBC53C6@utoronto.ca> If the names are different, maybe a dictionary of synonyms will help you. For example the Austrian BioMinT database stores such information for model organisms. 
A query for CAMK1 returns:

CAMK1 Homo sapiens
  calcium/calmodulin-dependent protein kinase I       Preferred gene  HUGO:1459, LocusLink:8536
  CAMK1                                               Preferred gene  HUGO:1459, LocusLink:8536, SwissProt:Q14012
  CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE I       Gene            OMIM:604998, GDB:642249
  CAMK1                                               Gene            OMIM:604998, GDB:642249
  CAMK1-PEN                                           Gene            GDB:642249
  CaMKI                                               Gene            HUGO:1459, OMIM:604998, GDB:642249, LocusLink:8536
  calcium/calmodulin-dependent protein kinase I       Protein         LocusLink:8536
  Calcium/calmodulin-dependent protein kinase type I  Protein         SwissProt:Q14012
  CaM kinase I                                        Protein         SwissProt:Q14012
  EC 2.7.1.123                                        Protein         SwissProt:Q14012

Hope this helps (though I'm not familiar with the Phospho.ELM database), Boris On 24 Oct 2005, at 11:32, ekeen ekeen wrote: > Hello everyone, > > I am luckily to download the The Phospho.ELM database. The data set > provide the kinase. Now I want get the sequences which > phosphorylated by CAMK Groups kinase. The CAMK Group was classified > by Manning. (Manning et al., 2002). In the CAMK Group, there are > fellow kinases: CAMK1,CAMK2,CAMKL,CAMK- > Unique,CASK,DAPK,DCAMKL,MAPKAPK,MLCK,PHK,PIM,PKD,PSK,RAD53,RSKb,Trbl,T > rio,TSSK. But I can only find very few sequences from the > Phospho.ELM database. I think this just because the name of kinase > in Phospho.ELM database is different from the name classified by > Manning. Can you give me some suggestion about this to me? I hope I > have expressed my question clearly. > > Thanks very much. > > Ekeen > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From christoph.gille at charite.de Tue Oct 25 14:26:22 2005 From: christoph.gille at charite.de (Dr. 
Christoph Gille) Date: Tue, 25 Oct 2005 20:26:22 +0200 (CEST) Subject: [BiO BB] X-compilation of C/C++ programs for Macintosh Message-ID: <61623.84.190.44.109.1130264782.squirrel@webmail.charite.de> Unfortunately, macintosh computers often do not have a C/C++ compiler. The Java program for protein sequence and structure analysis that I have developed uses many modules written in C or C++. Usually, program parts programmed in C/C++ are downloaded as source and the Java program invokes the compiler to produce an executable suitable for the OS. To cope with computer systems that do not yet support C++ I just developed a simple mechanism to obtain compiled versions that I can integrate in the program package: On those Macintosh computers where a compiler exists the program operates normally. After compilation the user can upload the executable to my server. Then I can add the binary installation to the package. After this the program part can be installed as a binary for all users and does not need compilation any more. Does this make sense ? It is not tested yet. Do you have a Mac OSX with a C-compiler and would like to test it? Christoph From davidow at molbio.mgh.harvard.edu Tue Oct 25 14:45:35 2005 From: davidow at molbio.mgh.harvard.edu (Lance Davidow) Date: Tue, 25 Oct 2005 14:45:35 -0400 Subject: [BiO BB] X-compilation of C/C++ programs for Macintosh In-Reply-To: <61623.84.190.44.109.1130264782.squirrel@webmail.charite.de> References: <61623.84.190.44.109.1130264782.squirrel@webmail.charite.de> Message-ID: christoph The Mac OS X compilers are optional installs on the xcode tools CD that ships with the Mac OS X 10.3 CDs or is on the Mac OS X 10.4 DVD. Lance >Unfortunately, macintosh computers often do not have a C/C++ compiler. >The Java program for protein sequence and structure analysis that I >have developed uses many modules written in C or C++. 
> >Usually, program parts programmed in C/C++ are downloaded as source >and the Java program invokes the compiler to produce an executable >suitable for the OS. > >To cope with computer systems that do not yet support C++ I just >developed a simple mechanism to obtain compiled versions that I can >integrate in the program package: > >On those Macintosh computers where a compiler exists the program >operates normally. > >After compilation the user can upload the executable to my server. >Then I can add the binary installation to the package. > >After this the program part can be installed as a binary for all users >and does not need compilation any more. > >Does this make sense ? >It is not tested yet. >Do you have a Mac OSX with a C-compiler and would like to test it? > >Christoph > >_______________________________________________ >Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board -- Lance Davidow, PhD Director of Bioinformatics Dept of Molecular Biology Mass General Hospital Boston MA 02114 davidow at molbio.mgh.harvard.edu 617.726-5955 Fax: 617.726-6893 From dbeach at email.unc.edu Tue Oct 25 22:56:45 2005 From: dbeach at email.unc.edu (Dale Beach) Date: Tue, 25 Oct 2005 22:56:45 -0400 Subject: [BiO BB] Spooling DNA In-Reply-To: <000a01c5d915$9ca66ba0$d702f304@yourxhtr8hvc4p> References: <000a01c5d915$9ca66ba0$d702f304@yourxhtr8hvc4p> Message-ID: <435EF06D.4040404@email.unc.edu> You can spool the DNA because of its length. RNA and proteins are both easily precipitated but need a centrifuge to rapidly collect the precipitate. As for the DNA precipitation from an onion, assuming that you are using an alcohol (isopropanol or ethanol are common) you should also be precipitating the RNA since the chemical properties of DNA and RNA are similar (they are both nucleotide chains!). The RNA just doesn't spool like the DNA. 
And even though there are plenty of RNAses (along with proteases and DNAses), the RNA will still precipitate with the DNA though it might be a little fragmented! The real question is how many folks reading this have actually done any wet-lab molecular biology? I come from a MolBio background, and wonder how many bioinformaticians have gotten their hands dirty in a MolBio lab? More importantly how many would LIKE TO? If there were a course available, say 1 week where you cloned and sequenced a gene then expressed a protein, would you take it? What else would you want to do? The goal would be to provide some practical experience with the molecules that so many folks are busy modeling. dale Dale Beach, PhD SPIRE Postdoctoral Fellow, UNC-CH Duke University Medical Center Jones CB3020 Durham, NC 27710 Laura Nielson wrote: > Why is it not possible to spool out precipitated proteins in > comparison to spooling out precipitated DNA? DNA is very, very long. > How long are proteins (polypeptides)? > > I've done the experiment where you precipitate DNA out of onion > cells. Could you precipitate RNA out of onion cells? How? or Why not? > > > >------------------------------------------------------------------------ > >_______________________________________________ >Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdwan at ccgb.umn.edu Tue Oct 25 14:34:22 2005 From: cdwan at ccgb.umn.edu (Christopher Dwan) Date: Tue, 25 Oct 2005 14:34:22 -0400 Subject: [BiO BB] X-compilation of C/C++ programs for Macintosh In-Reply-To: <61623.84.190.44.109.1130264782.squirrel@webmail.charite.de> References: <61623.84.190.44.109.1130264782.squirrel@webmail.charite.de> Message-ID: <89113D62-0AD9-4306-84CB-2DDF88075F61@ccgb.umn.edu> Christoph, Your information on Apple machines is a bit dated. 
All variants of OS X (the operating system shipped with all Apple computers for the last few years) support the apple developer tools (http://developer.apple.com/tools/xcode/index.html), which provide an integrated development environment, plus support for standard Unix compilation (gcc, make, and the like). It's available for free download at the URL above. You can also directly install the GNU compiler tools using FINK (http://fink.sourceforge.net/). Apples support C and C++ just fine. -Chris Dwan On Oct 25, 2005, at 2:26 PM, Dr. Christoph Gille wrote: > Unfortunately, macintosh computers often do not have a C/C++ compiler. > The Java program for protein sequence and structure analysis that I > have developed uses many modules written in C or C++. > > Usually, program parts programmed in C/C++ are downloaded as source > and the Java program invokes the compiler to produce an executable > suitable for the OS. > > To cope with computer systems that do not yet support C++ I just > developed a simple mechanism to obtain compiled versions that I can > integrate in the program package: > > On those Macintosh computers where a compiler exists the program > operates normally. > > After compilation the user can upload the executable to my server. > Then I can add the binary installation to the package. > > After this the program part can be installed as a binary for all users > and does not need compilation any more. > > Does this make sense ? > It is not tested yet. > Do you have a Mac OSX with a C-compiler and would like to test it? 
> > Christoph > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From marty.gollery at gmail.com Tue Oct 25 23:36:36 2005 From: marty.gollery at gmail.com (Martin Gollery) Date: Tue, 25 Oct 2005 20:36:36 -0700 Subject: [BiO BB] Spooling DNA In-Reply-To: <435EF06D.4040404@email.unc.edu> References: <000a01c5d915$9ca66ba0$d702f304@yourxhtr8hvc4p> <435EF06D.4040404@email.unc.edu> Message-ID: Would I want to take a MolBio 1 week course? Definitely. Would I ever likely have the time to do so? Almost certainly not. The second problem with putting something like that on is that bioinformaticians are widely varied in background and daily tasks, so coming up with something that is relevant to each would not be easy. Cheers, Marty On 10/25/05, Dale Beach wrote: > > You can spool the DNA because of its length. RNA and proteins are both > easily precipitated but need a centrifuge to rapidly collect the > precipitate. As for the DNA precipitation from an onion, assuming that you > are using an alcohol (isopropanol or ethanol are common) you should also be > precipitating the RNA since the chemical properties of DNA and RNA are > similar (they are both nucleotide chains!). The RNA just doesn't spool like > the DNA. And even though there are plenty of RNAses (along with proteases > and DNAses), the RNA will still precipitate with the DNA though it might be > a little fragmented! > > The real question is how many folks reading this have actually done any > wet-lab molecular biology? I come from a MolBio background, and wonder how > many bioinformaticians have gotten their hands dirty in a MolBio lab? More > importantly how many would LIKE TO? If there were a course available, say 1 > week where you cloned and sequenced a gene then expressed a protein, would > you take it? What else would you want to do? 
The goal would be to provide > some practical experience with the molecules that so many folks are busy > modeling. > > dale > > Dale Beach, PhD > SPIRE Postdoctoral Fellow, UNC-CH > Duke University Medical Center > Jones CB3020 > Durham, NC 27710 > > > > Laura Nielson wrote: > > Why is it not possible to spool out precipitated proteins in comparison to > spooling out precipitated DNA? DNA is very, very long. How long are proteins > (polypeptides)? > I've done the experiment where you precipitate DNA out of onion cells. > Could you precipitate RNA out of onion cells? How? or Why not? > > ------------------------------ > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 ----------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From info at www.lmcs-online.org Tue Oct 25 08:49:24 2005 From: info at www.lmcs-online.org (Logical Methods in CS) Date: Tue, 25 Oct 2005 14:49:24 +0200 Subject: [BiO BB] Journal "Logical Methods in CS" Message-ID: <200510251249.j9PCnOE07861@www.lmcs-online.org> ---------------------------------------------------------------------- year 1 of a new journal year 1 of a new journal year one of a new journal -------------------------EXCUSE MULTIPLE COPIES----------------------- Dear Colleague: We are writing to inform you about the progress of the open-access, online journal "Logical Methods in Computer Science," which has recently benefited from a freshly designed web site, see: http://www.lmcs-online.org In the first year of its existence, the journal received 75 submissions: 21 were accepted and 22 declined (the rest are still in the editorial process). The first issue is complete, and we anticipate that there will be three in all by the end of the calendar year. Our eventual aim is to publish four issues per year. We also publish Special Issues: to date, three are in progress, devoted to selected papers from LICS 2004, CAV 2005 and LICS 2005. The average turn-around from submission to publication has been 7 months. This comprises a thorough refereeing and revision process: every submission is refereed in the normal way by two or more referees, who apply high standards of quality. We would encourage you to submit your best papers to Logical Methods in Computer Science, and to encourage your colleagues to do so too. There is a flier and a leaflet containing basic information about the new journal on the homepage; we would appreciate your posting and distributing them, or otherwise publicising the journal. We would also appreciate any suggestions you may have on how we may improve the journal. Yours Sincerely, Dana S. Scott (editor-in-chief) Gordon D. Plotkin and Moshe Y. 
Vardi (managing editors) Jiri Adamek (executive editor) From devprozad at gmail.com Wed Oct 26 03:43:56 2005 From: devprozad at gmail.com (Debaprasad Mukherjee) Date: Wed, 26 Oct 2005 13:13:56 +0530 Subject: [BiO BB] Digital Signal Processing on Genome sequences Message-ID: <2c66cc560510260043u6f665ebflc648352c1a2344c7@mail.gmail.com> Dear Friends and Colleagues, I am a graduate in electrical engineering and am working in bioinformatics software development and evolutionary genomics. I have a strong interest in the application of Digital Signal Processing to genome sequences. I would like to request you all to kindly give me some pointers on the types of work that are being done and some recent references. It would be very helpful if I could have some introductory discussions with somebody who is working in this field. Is there any thesis available on the net? I have been searching on this for quite some time now and have read up on a few basic papers. I am looking to increase my collection of reading material on this topic. Thank you very much. Eagerly awaiting your reply. Regards, Debaprasad Mukherjee From dmb at mrc-dunn.cam.ac.uk Wed Oct 26 04:07:53 2005 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Wed, 26 Oct 2005 09:07:53 +0100 Subject: [BiO BB] Digital Signal Processing on Genome sequences In-Reply-To: <2c66cc560510260043u6f665ebflc648352c1a2344c7@mail.gmail.com> References: <2c66cc560510260043u6f665ebflc648352c1a2344c7@mail.gmail.com> Message-ID: <435F3959.3080401@mrc-dunn.cam.ac.uk> Debaprasad Mukherjee wrote: > Dear Friends and Colleagues, > I am a graduate in electrical engineering and am working in > bioinformatics software development and evolutionary genomics. I have > a strong interest in the application of Digital Signal Processing to > genome sequences. > > I would like to request you all to kindly give me some pointers on the > types of work that is being done and some recent references. 
> > It would be very helpful if I can have some introductory discussions > with somebody who is working in this field. Is there any thesis > available on the net? I have been searching on this for quite sometime > now and have read up a few basic papers. > > I am looking to increase my collection of reading material on this topic. This query on pubmed; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pubmed&term=bioinformatics+wavelets Turns up 6 references (and two reviews). I am certainly no expert, but I think the idea of wavelets comes from signal processing. > Thank you very much. Eagerly awaiting your reply. All the best, Dan. > Regards, > Debaprasad Mukherjee > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From idoerg at gmail.com Wed Oct 26 12:48:46 2005 From: idoerg at gmail.com (Iddo Friedberg) Date: Wed, 26 Oct 2005 09:48:46 -0700 Subject: [BiO BB] Digital Signal Processing on Genome sequences In-Reply-To: <435F3959.3080401@mrc-dunn.cam.ac.uk> References: <2c66cc560510260043u6f665ebflc648352c1a2344c7@mail.gmail.com> <435F3959.3080401@mrc-dunn.cam.ac.uk> Message-ID: <435FB36E.5000407@burnham.org> Dan Bolser wrote: > Debaprasad Mukherjee wrote: > >> Dear Friends and Colleagues, >> I am a graduate in electrical engineering and am working in >> bioinformatics software development and evolutionary genomics. I have >> a strong interest in the application of Digital Signal Processing to >> genome sequences. >> >> I would like to request you all to kindly give me some pointers on the >> types of work that is being done and some recent references. >> >> It would be very helpful if I can have some introductory discussions >> with somebody who is working in this field. Is there any thesis >> available on the net? I have been searching on this for quite sometime >> now and have read up a few basic papers. 
>> >> I am looking to increase my collection of reading material on this >> topic. > > > This query on pubmed; > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pubmed&term=bioinformatics+wavelets > > > Turns up 6 references (and two reviews). I am certainly no expert, but > I think the idea of wavelets comes from signal processing. > Wavelets indeed do. But so does anything that has to do with Information Theory-- esp. the use of Shannon entropy "information content DNA sequence" turns up 600 papers Signal to noise ratio in sequences.... quite a bit too. > >> Thank you very much. Eagerly awaiting your reply. > > > All the best, > Dan. > >> Regards, >> Debaprasad Mukherjee >> _______________________________________________ >> Bioinformatics.Org general forum - >> BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 Tel: (858) 646 3100 x3516 Fax: (858) 713 9949 http://ffas.ljcrf.edu/~iddo From arthipesa at yahoo.com Thu Oct 27 10:24:25 2005 From: arthipesa at yahoo.com (Arti Hikmatullah Perbawana Sakti Buana) Date: Thu, 27 Oct 2005 14:24:25 +0000 Subject: [BiO BB] Birthday Book Message-ID: <20051027142504.B836C3682E5@primary.bioinformatics.org> Hi I am building a birthday book for myself and would appreciate some quick help from you. Just click on the link below and enter your birthday details. It's easy and you can keep your age secret!... 

http://www.birthdayalarm.com/bd2/56242253a703741135b807547278c214934074d905

Thanks
Arti Hikmatullah Perbana Sakti Buana

From dbeach at email.unc.edu Thu Oct 27 13:52:36 2005
From: dbeach at email.unc.edu (Dale Beach)
Date: Thu, 27 Oct 2005 13:52:36 -0400
Subject: [BiO BB] Digital Signal Processing on Genome sequences
In-Reply-To: <2c66cc560510260043u6f665ebflc648352c1a2344c7@mail.gmail.com>
References: <2c66cc560510260043u6f665ebflc648352c1a2344c7@mail.gmail.com>
Message-ID: <436113E4.9030602@email.unc.edu>

You might look at work by HJ Bussemaker.

db

Debaprasad Mukherjee wrote:

>Dear Friends and Colleagues,
>I am a graduate in electrical engineering and am working in
>bioinformatics software development and evolutionary genomics. I have
>a strong interest in the application of Digital Signal Processing to
>genome sequences.
>
>I would like to request you all to kindly give me some pointers on the
>types of work that is being done and some recent references.
>
>It would be very helpful if I can have some introductory discussions
>with somebody who is working in this field. Is there any thesis
>available on the net? I have been searching on this for quite some time
>now and have read up a few basic papers.
>
>I am looking to increase my collection of reading material on this topic.
>
>Thank you very much. Eagerly awaiting your reply.
>
>Regards,
>Debaprasad Mukherjee

From boris.steipe at utoronto.ca Thu Oct 27 14:32:50 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 27 Oct 2005 14:32:50 -0400
Subject: [BiO BB] Fwd: Please take the Gene Ontology survey
References:
Message-ID:

My apologies in case this reaches anyone more than once.

Context: the GO grant is up for competitive renewal in the new year, and
the volume of responses to this survey will help GO demonstrate the
degree to which it has been adopted by the community. And, as you know,
government support for computational biology infrastructure has been
insufficient in the recent past.

Boris

Begin forwarded message:

> From: Jane Lomax
> Date: 27 October 2005 13:20:14 GMT-04:00
> To: boris.steipe at utoronto.ca
> Subject: Please take the Gene Ontology survey
>
> Hello,
>
> The Gene Ontology (GO) is a system for functional annotation of genes
> and gene products. It enables classification of gene products according
> to molecular function, biological process, and cellular location of
> action.
>
> Please help us by taking part in our survey.
>
> The results of this survey will help us improve our services
> to our user community, and help direct our resources more effectively.
>
> It's a very straightforward set of questions, which should take a
> maximum of 10 minutes to complete. There's no requirement to submit
> your name or email address. To complete the survey, go to:
>
> http://www.AdvancedSurvey.com/default.asp?SurveyID=32355
>
> Please pass on to any friends or colleagues not on these lists.
>
> Many thanks for your time,
>
> The GO Consortium

From christoph.gille at charite.de Fri Oct 28 06:17:40 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 28 Oct 2005 12:17:40 +0200 (CEST)
Subject: [BiO BB] which blast server
Message-ID: <1109.141.20.65.223.1130494660.squirrel@webmail.charite.de>

I want my computer program to invoke a BLAST search in Swissprot+TREMBL
on a remote BLAST server and to fetch the BLAST result.

Due to firewall restrictions I can only use the HTTP port.

I would prefer XML-formatted output.

Which server would you recommend?
Many thanks

From idoerg at burnham.org Fri Oct 28 12:09:00 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri, 28 Oct 2005 09:09:00 -0700
Subject: [BiO BB] which blast server
In-Reply-To: <1109.141.20.65.223.1130494660.squirrel@webmail.charite.de>
References: <1109.141.20.65.223.1130494660.squirrel@webmail.charite.de>
Message-ID: <43624D1C.5090600@burnham.org>

Dr. Christoph Gille wrote:

>I want my computer program to invoke a BLAST search in
>Swissprot+TREMBL on a remote BLAST server
>and to fetch the BLAST result.
>
>Due to firewall restrictions I can only use the HTTP port.
>
>I would prefer XML-formatted output.
>
>Which server would you recommend?
>Many thanks

NCBI BLAST seems to fulfill all your requirements: port 80 and XML output.

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://ffas.ljcrf.edu/~iddo

From bioinfosm at gmail.com Fri Oct 28 17:08:10 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Fri, 28 Oct 2005 16:08:10 -0500
Subject: [BiO BB] promoter sequence analysis for TFBS .. using phylogeny
In-Reply-To: <435A1FC0.3050908@nuim.ie>
References: <726450810510210834l13b8088cs545010916599f248@mail.gmail.com>
	<43590D20.5090909@nuim.ie>
	<726450810510211438v2fc94390na1ee394e3da26721@mail.gmail.com>
	<435A1FC0.3050908@nuim.ie>
Message-ID: <726450810510281408i42eacf9dl856bc0e8cf8df696@mail.gmail.com>

Hi Thomas,

Sorry for the delay in responding, and thanks very much for the reply.

I have 2000 sets of orthologous sequences for these 8 species, and
their phylogenetic distances are quite well known.
So it would actually be more useful to use the phylogenetic information,
and then perform alignments to pick out transcription factor binding
sites (TFBS), or to be more confident of the TFBS found.

I hope I have explained it better. Would you have any suggestions in
this regard?

Cheers,
Samantha

On 10/22/05, Thomas Keane wrote:
>
> I'd advise downloading your own copy of phyml and running it on your
> computer - there is a link to download a copy somewhere on the website.
> You should note that phyml only recognises phylip format files. It's
> basically software for building phylogenetic trees from a group of
> alignments.
>
> I haven't used MEME - what we do in our lab is run Clustalw to align the
> sequences, then GBlocks to remove any badly aligned areas
> (http://molevol.ibmb.csic.es/Gblocks/Gblocks.html), then modelgenerator
> to get the ML model, then phyml to get our bootstrapped trees :-)
>
> Thomas
>
> Samantha Fox wrote:
>
> > Dear Thomas,
> > Thanks for the response. I could use the ModelGenerator, but the PHYML
> > online execution gave some error. I will also have to read their
> > paper to see what exactly the software does and what I should
> > expect as output.
> >
> > One simple thing I did was run MEME on each ortholog set, and the
> > blocks come out nicely conserved, sequentially as well. However, I
> > was not sure what to do next, and didn't go further.
> > Another task was to run clustalw on each ortholog set separately and
> > see what sequences come out conserved in the majority of sets, etc.
> > Most results were again repetition of what is known, so not really
> > interesting.
> >
> > Any comments on this, or some other ideas?
> >
> > Cheers,
> > Samantha
> >
> > On 10/21/05, *Thomas Keane* wrote:
> >
> > If you want to use maximum likelihood then I would suggest that you use
> > Phyml (http://atgc.lirmm.fr/phyml/) - you can download your own copy.
> > You will also need to find the optimal ML model first - you could use
> > Modelgenerator (http://bioinf.nuim.ie/software/modelgenerator) to do
> > this, as it creates scripts to start Phyml with the optimal model.
> >
> > Thomas
> >
> > Samantha Fox wrote:
> >
> > > Hi,
> > > I am looking for various options to analyze a set of orthologous
> > > sequences, each a few hundred bp long, from a group of
> > > phylogenetically related species. It's about 5-8 homologous
> > > promoter DNA sequences per gene, for around a thousand genes. The
> > > motivation is to get the conserved motifs which have remained
> > > constant under selection.
> > >
> > > What would be the best ways to do this analysis? I looked at some
> > > phylogenetic tools, but most have limitations, like being
> > > web-based or handling just 2 sequences.
> > >
> > > All suggestions are welcome, especially from anyone experienced
> > > with similar work.
> > >
> > > Thanks.
> > > Samantha

From bioinfosm at gmail.com Fri Oct 28 17:22:43 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Fri, 28 Oct 2005 16:22:43 -0500
Subject: Fwd: [BiO BB] No. of mismatches in dna sequence alignment
In-Reply-To: <20050827140128.99101.qmail@web30310.mail.mud.yahoo.com>
References: <726450810508261245296b83ba@mail.gmail.com>
	<20050827140128.99101.qmail@web30310.mail.mud.yahoo.com>
Message-ID: <726450810510281422m3dc79a49o3d1e5e3e04d086ae@mail.gmail.com>

Thanks to all for their replies.

I just wanted to share that I finally got to use align0 from the FASTA
package
for my purpose of finding the distance (or similarity) between small DNA
sequences. align0 does not penalize end gaps.

Samantha

On 8/27/05, Eric L. Cabot wrote:
>
> Samantha,
>
> I'm assuming that you aren't going to use mismatches determined this way
> for a phylogenetic analysis! If you are, there are "better" ways to go
> about getting distance measures.
>
> That being said, if you are willing to look at your sequences to determine
> where the end gaps are, and you are buying into using EMBOSS, then you
> could first filter your sequences with the EMBOSS program SeqRet,
> supplying the start and stop positions of interest.
>
> In your case, the region of interest spans positions 2 through 6.
> Here's a set of sample command lines:
>
> c:> seqret needle.msf -sbegin=2 -send=6 -osformat=msf -outseq=trunc.msf
>
> c:\Stuff>type trunc.msf
> !!NA_MULTIPLE_ALIGNMENT 1.0
>
> trunc.msf MSF: 5 Type: N 27/08/05 CompCheck: 2332 ..
>
> Name: one Len: 5 Check: 1166 Weight: 1.00
> Name: two Len: 5 Check: 1166 Weight: 1.00
>
> //
>
>        1   5
> one    cagtt
> two    cagtt
>
> c:> c:\Stuff>infoalign trunc.msf -weight=n -change=n -description=n -auto
> Warning: Sequence character string not found in ajSeqCvtKS
> # USA               Name  SeqLen  AlignLen  Gaps  GapLen  Ident  Similar  Differ
> msf::trunc.msf:one  one   5       5         0     0       5      0        0
> msf::trunc.msf:two  two   5       5         0     0       5      0        0
>
> Of course, years back, when I was working in Technical Support at GCG, I
> would have provided a GCG-centric solution, involving the program
> Reformat.
>
> If you don't want to read the sequences, then I could probably whip up a
> Perl script for a specific format of sequences to detect and/or remove
> end gaps. But hopefully, SeqRet/InfoAlign will do.
>
> Eric L. Cabot
> Genome Center
> University of Wisconsin
>
> --- Samantha Fox wrote:
>
> > Thanks for your response. Here's some conversation and discussion we
> > had, but still looking for a solution.
> >
> > ---------- Forwarded message ----------
> > From: Samantha Fox
> > Date: Aug 26, 2005 2:34 PM
> > Subject: Re: [BiO BB] No. of mismatches in dna sequence alignment
> > To: pfern at igc.gulbenkian.pt, "The general forum at Bioinformatics.Org"
> >
> > :) That's what it gives: 6-5 = 1. But for the tcagtt and cagttt pair I
> > want a value of 0, as their alignment gives no mismatch, just end gaps.
> >
> > tcagtt-
> > -cagttt
> >
> > This alignment also gives one difference. This is not the same as
> > the 0 mismatches that I expect!
> >
> > Is there something that gives the edit distance between DNA sequences?
> >
> > # USA                  Name    SeqLen  AlignLen  Gaps  GapLen  Ident  Similar  Differ  % Change   Weight    Description
> > msf::wf.needle:tcagtt  tcagtt  6       6         0     0       5      0        1       16.666666  1.000000
> > msf::wf.needle:cagttt  cagttt  6       6         0     0       5      0        1       16.666666  1.000000
> >
> > Hope I clarified what I desire.
> >
> > Basically the motivation is that I wish to use pairwise distances to
> > make groups of these small DNA sequences. So cagttt should be in the
> > same group as tcagtt, as it's just a sort of extension.
> >
> > Thanks.
> >
> > On 8/26/05, Pedro Fernandes wrote:
> > > Dear Samantha
> > >
> > > If you subtract Ident from AlignLen you get your result. Am I
> > > mistaken?
> >
> > Samantha
> >
> > from your last example
> >
> > ==================
> >
> > >one
> > tcagtt
> > >two
> > gcagtt
> >
> > ==================
> >
> > Run EMBOSS NEEDLE and get
> >
> > ==================
> >
> > !!NA_MULTIPLE_ALIGNMENT 1.0
> >
> > outfile MSF: 6 Type: N 27/08/05 CompCheck: 3229 ..
> >
> > Name: one Len: 6 Check: 1621 Weight: 1.00
> > Name: two Len: 6 Check: 1608 Weight: 1.00
> >
> > //
> >
> >        1    6
> > one    tcagtt
> > two    gcagtt
> >
> > ==================
> >
> > Then run EMBOSS INFOALIGN with this output and get
> >
> > ==================
> >
> > Name  AlignLen  Ident  Differ
> > one   6         5      1
> > two   6         5      1
> >
> > ==================
> >
> > Use the Differ column directly, or else just subtract: AlignLen - Ident
> >
> > Is this what you need?
> >
> > > Hope this helps
> > > Pedro
> > >
> > > Samantha Fox said:
> > > > Pedro, thanks for taking the time to run my example. What field
> > > > do you look at for the results?
> > > >
> > > > On 8/26/05, Pedro Fernandes wrote:
> > > >> Hi
> > > >>
> > > >> I tried INFOALIGN on your ALIGNED sequences and it does work!
> > > >> Maybe you are using the initial sequences, not the aligned ones.
> > > >>
> > > >> Pedro
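
A note on the end-gap question in this thread: below is a minimal sketch, in
Python, of counting mismatches between two short DNA sequences while leaving
overhanging ends unpenalized. It is not align0 and not an EMBOSS program. It is
a simplified illustration that slides one sequence along the other without
internal gaps and keeps the offset with the highest (matches - mismatches)
score. The function name best_overlap and the scoring rule are assumptions made
for this example only.

```python
def best_overlap(a, b):
    """Slide b along a with no internal gaps and free (unpenalized) end gaps.

    Returns (offset, matches, mismatches) for the offset that maximizes
    matches - mismatches. offset is the position in a where b[0] is placed;
    a negative offset means b starts before a.
    """
    a, b = a.lower(), b.lower()
    best = None
    for offset in range(-(len(b) - 1), len(a)):
        # Overlapping region, expressed in coordinates of a
        start = max(0, offset)
        end = min(len(a), offset + len(b))
        matches = sum(a[i] == b[i - offset] for i in range(start, end))
        mismatches = (end - start) - matches
        score = matches - mismatches
        # Strict '>' keeps the first offset found when scores tie
        if best is None or score > best[0]:
            best = (score, offset, matches, mismatches)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # The two pairs discussed in the thread
    print(best_overlap("tcagtt", "cagttt"))  # shifted by one base, overlap "cagtt"
    print(best_overlap("tcagtt", "gcagtt"))  # same length, first base differs
```

For the pair tcagtt / cagttt this reports zero mismatches over a five-base
overlap, which is the value asked for above; for tcagtt / gcagtt it reports one
mismatch. Sequences that differ by an internal insertion or deletion would need
a real alignment program with unpenalized end gaps instead of this sketch.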