[BiO BB] Looking for researcher, to assist on blast-like invention

Sheng Wang bsmagic at gmail.com
Mon Feb 11 21:48:03 EST 2008


Maybe the BLAST package should be software for which users can develop
third-party add-ons.

On 2/12/08, Martin Gollery <marty.gollery at gmail.com> wrote:
>
> The first step is to implement it in C++ to see how fast it is. Once
> you have an executable, testing it will be relatively straightforward.
>
> Marty
>
>
> On Feb 11, 2008 8:21 AM, Theodore H. Smith <delete at elfdata.com> wrote:
> >
> > Hi everyone,
> >
> > So I've been working, on and off, on this algorithm for quite a while
> > now. It's basically an invention of mine. It is a "BLAST-like"
> > algorithm, in that it does "fuzzy lookup" operations across a database
> > of letters. I am designing this algorithm to be useful for
> > bioinformatics, which is the main field I am initially targeting.
> >
> > The database will be filled with protein sequences, and the search
> > across the database will be another protein sequence. The algorithm
> > has a "scoring matrix", which can accept different protein replacement
> > scores. The cost of inserting letters (protein letters) can be
> > configured also.
> >
> > In this sense, it's no different to Smith-Waterman. The same input,
> > the same output!
> >
> > The real difference from Smith-Waterman is its speed. I expect my
> > algorithm to be much faster, because I use many techniques to avoid
> > processing unnecessary parts of the Smith-Waterman matrix.
> >
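For reference, the baseline being discussed can be sketched in a few lines. This is only a minimal illustration of plain Smith-Waterman scoring with a pluggable scoring function and a linear gap cost; it is not the algorithm described in the post. The names smith_waterman_score and simple_score are made up for this example, and the toy match/mismatch values stand in for a real substitution matrix such as BLOSUM62.

```python
def smith_waterman_score(query, target, score, gap=-2):
    """Best local alignment score of query vs. target (linear gap cost)."""
    prev = [0] * (len(target) + 1)  # DP row for the previous query letter
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(q, t),  # match or substitution
                prev[j] + gap,              # query letter aligned to a gap
                curr[j - 1] + gap,          # target letter aligned to a gap
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def simple_score(a, b):
    """Toy scoring function (assumption): +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


print(smith_waterman_score("ACDEF", "ACDXF", simple_score))
```

Every cell of the matrix is filled in here, which is the full cost that any pruning technique would be trying to avoid.
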
> > I also use many tricks to reuse computations across various proteins.
> > For example, the matrix for protein "ABCDE" is identical, at first
> > anyhow, to the matrix for "ABCDEFG". This means that if I have both
> > proteins "ABCDE" and "ABCDEFG" in my protein database, I can test
> > both of them against the search query in almost half the time. My
> > algorithm also runs in logarithmic time with respect to the size of
> > the database. In other words, search time grows much more slowly than
> > the database does.
> >
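One common way to get this kind of reuse is to store the database sequences in a trie and compute one DP row per trie node, so that a shared prefix such as "ABCDE" is scored only once. The sketch below is an assumption about how prefix sharing could work, not the poster's actual method; build_trie, search_trie and simple_score are hypothetical names, and it makes no claim about logarithmic running time.

```python
def simple_score(a, b):
    """Toy scoring function (assumption): +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


def build_trie(sequences):
    """Build a prefix tree; each node records the sequences ending there."""
    root = {"children": {}, "ends": []}
    for name, seq in sequences.items():
        node = root
        for ch in seq:
            node = node["children"].setdefault(ch, {"children": {}, "ends": []})
        node["ends"].append(name)
    return root


def search_trie(query, sequences, score, gap=-2):
    """Local alignment score of query against every sequence, sharing DP rows.

    Returns (scores by sequence name, number of DP rows computed).
    """
    root = build_trie(sequences)
    results = {}
    rows_computed = 0
    # Each stack entry: (node, DP row for the prefix ending at that node,
    # best cell value seen anywhere along the path from the root).
    stack = [(root, [0] * (len(query) + 1), 0)]
    while stack:
        node, row, best = stack.pop()
        for name in node["ends"]:
            results[name] = best
        for ch, child in node["children"].items():
            new_row = [0]
            for j, q in enumerate(query, start=1):
                new_row.append(max(
                    0,
                    row[j - 1] + score(ch, q),  # match or substitution
                    row[j] + gap,               # database letter vs. gap
                    new_row[j - 1] + gap,       # query letter vs. gap
                ))
            rows_computed += 1
            stack.append((child, new_row, max(best, max(new_row))))
    return results, rows_computed


database = {"p1": "ABCDE", "p2": "ABCDEFG"}
scores, rows = search_trie("BCDEF", database, simple_score)
print(scores, rows)
```

With these two sequences the trie has 7 nodes below the root, so 7 rows are computed instead of the 12 that scoring each sequence separately would need.
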
> > I want to turn this algorithm into something useful for people. My
> > first challenge here is to answer the question "is this algorithm
> > faster or better than BLAST?" If it is not faster, my algorithm
> > has little use. But I have good hopes it will be faster! I am very
> > good with this sort of thing, you see :) Speed is my strong point.
> >
> > Currently, I do not know about the speed, because I haven't
> > implemented a C++ version of my algorithm, or a good speed testing
> > framework.
> >
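A speed-testing framework does not need to be elaborate to start with. The sketch below is one possible shape for it, under the assumption that each search implementation can be wrapped as a function taking (query, database) and returning comparable results. The names time_search, same_results, search_reference and search_candidate are hypothetical, and the two toy searches are stand-ins, not real aligners.

```python
import time


def time_search(search, queries, database, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` full runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            search(q, database)
        best = min(best, time.perf_counter() - start)
    return best


def same_results(search_a, search_b, queries, database):
    """Check that two implementations agree on every query."""
    return all(search_a(q, database) == search_b(q, database) for q in queries)


# Toy stand-ins (assumption): both count exact occurrences of the query.
def search_reference(query, database):
    return {name: seq.count(query) for name, seq in database.items()}


def search_candidate(query, database):
    return {name: len(seq.split(query)) - 1 for name, seq in database.items()}


database = {"p1": "ABCDE", "p2": "ABCDEFG"}
queries = ["BCD", "EFG", "XYZ"]
print(same_results(search_reference, search_candidate, queries, database))
print(time_search(search_reference, queries, database))
print(time_search(search_candidate, queries, database))
```

Checking agreement before timing matters: a faster search is only interesting if it returns the same hits as the reference on the same database.
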
> > I do, however, know that my algorithm is more accurate than BLAST:
> > it is just as accurate as SSEARCH, because mine uses the
> > Smith-Waterman algorithm, whereas BLAST uses a heuristic, which is
> > basically intelligent guesswork. A fine heuristic, but still a
> > heuristic. Mine is methodical, not heuristic-based.
> >
> > So here is what I am looking for!
> >
> > I am hoping that someone in the field will be able to offer me
> > guidance, interest, enthusiasm, suggestions and maybe even do some
> > testing for me.
> >
> > Perhaps a student doing a bio-informatics related degree, who would
> > like to write a paper on an alternative way of processing protein
> > databases. My invention could be an interesting subject for a paper.
> >
> > Or perhaps a researcher who just has an interest in this sort of
> > thing! Perhaps a researcher who feels there must be a better way of
> > doing these things. Or anyone really in this field with the time and
> > interest, who feels that helping me could help him (or her) too in
> > some way.
> >
> > I'd like someone I can ask a lot of questions, show my software to,
> > and explain what I hope to achieve with it.
> >
> > Basically, my first questions to you would be "how would I set this
> > up to be useful for someone?" and "how would I test its usefulness;
> > what would you need to know about my algorithm to decide to use it
> > over BLAST?"
> >
> > It's sort of a vague question from me, like "what do you need me to
> > do", but... well that's where I am right now. Sort of a bit on the
> > outside hoping someone on the inside will show me something.
> >
> > So it's an opportunity to tell me what you want, basically!! Tell me,
> > and I might just make it.
> >
> > Who knows? Maybe one day in a few years' time, everyone will be using
> > this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
> > might be part of something.
> >
> > Thanks to anyone who replies!
> >
> > --
> > http://elfdata.com/plugin/
> > "String processing, done right"
> >
> >
> >
> > _______________________________________________
> > BBB mailing list
> > BBB at bioinformatics.org
> > http://www.bioinformatics.org/mailman/listinfo/bbb
> >
>
>
>
> --
> Martin Gollery
> Senior Bioinformatics Scientist
> TimeLogic- a Division of Active Motif
> 775-833-9113
> 880 Northwood Blvd. Suite 7
> Incline Village, NV 89451
>
>



-- 
Best Regards
Sheng Wang


