[BiO BB] Gene prediction

Rob C insight.robin at gmail.com
Thu Dec 7 21:48:19 EST 2006

I am interested in this problem.  We are developing a tool to facilitate
cross-database data analysis, and hope the bioinformatics community can
get some benefit from it.  However, I am not an expert in bioinformatics and
have no clue how useful our tool is.

The basic idea is: we developed visual interfaces to help integrate Web data
from distributed sites.  In the simplest case, one can just record whatever
one did on the Web and repeat that Web exploration anytime later.  In a more
advanced scenario, one can visually program it to extract data from one site,
fill in form data on another site, compute data on the fly, and save the
extracted data.

An example is demonstrated in the following irobot file:
http://irobotsoft.com/robots/blastx.irb (it requires installation of the
irobot software, which is free).

If you open it and run it, it will do a BLAST search for a given protein,
retrieve the matching sequences from GenBank, search for conserved domains
in the CDART database, and save the results in an XML file, "save.xml".

I am wondering whether there are any real-world problems that require
integrating Web data in the way demonstrated above, and what additional
functionality might be required to solve larger problems.

On 12/7/06, Yannick Wurm <idh at poulet.org> wrote:
> Hi,
> we all know that gene predictions can be far from reliable. Which is
> why recent genome projects, such as the Honey bee, used several gene
> predictions: ab initio predictions, homology-based predictions, EST-based
> predictions...  They used GLEAN3 to combine the different
> predictions into one confident "Official Gene Set".
> I am exploring my favorite - currently unannotated - genome. Does
> anyone know of a web-tool which might do something similar on a much
> smaller scale? (I basically just want the CDS sequences for a small
> number of genes). If not, this could be a great opportunity for
> someone....
> Input:
>        my small genomic sequence of interest (maybe 10 or 20kb) + ESTs +
>        other homologous sequences I may have
> Program fetches more homologous sequences from GenBank.
> Program aligns everything it can to my sequence of interest.
> Program uses genscan or whatever to predict exons.
> Program figures out a "consensus" protein sequence and determines
> introns/exons.
> Any ideas? Comments?
> Cheers,
> yannick
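
As a rough illustration of the "consensus" step in the workflow quoted above,
here is a minimal Python sketch.  It assumes each evidence source (ab initio,
EST alignment, homology) has already been reduced to a list of exon
coordinates on the genomic sequence, and it keeps only the regions supported
by at least a given number of sources.  This is a simple vote, a much cruder
stand-in for what a combiner like GLEAN does, and the function names,
coordinate convention and example numbers are all hypothetical.

```python
from collections import Counter


def consensus_exons(predictions, min_support=2):
    """Return regions covered by at least `min_support` prediction sources.

    `predictions` is a list with one entry per source; each entry is a list
    of (start, end) exon intervals, 0-based and end-exclusive.  Exons within
    a single source are assumed not to overlap each other.
    """
    # Sweep line: +1 where an exon starts, -1 where it ends.
    events = Counter()
    for exons in predictions:
        for start, end in exons:
            events[start] += 1
            events[end] -= 1

    consensus = []
    depth = 0            # number of sources covering the current position
    region_start = None  # start of the currently open consensus region
    for pos in sorted(events):
        depth += events[pos]
        if depth >= min_support and region_start is None:
            region_start = pos
        elif depth < min_support and region_start is not None:
            consensus.append((region_start, pos))
            region_start = None
    return consensus


def splice(sequence, exons):
    """Concatenate the exon slices of `sequence` into one spliced string."""
    return "".join(sequence[start:end] for start, end in exons)


# Hypothetical exon coordinates from three evidence sources.
ab_initio = [(100, 250), (400, 520)]
est_based = [(120, 250), (400, 500), (800, 900)]
homology = [(110, 260), (410, 520)]

print(consensus_exons([ab_initio, est_based, homology], min_support=2))
# → [(110, 250), (400, 520)]
```

Raising min_support to 3 keeps only the regions on which all three sources
agree.  The sketch ignores strand, reading frame and splice-site signals, all
of which a real tool would have to check before reporting a CDS.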
