[BiO BB] Looking for researcher, to assist on blast-like invention

Theodore H. Smith delete at elfdata.com
Mon Feb 11 11:21:23 EST 2008


Hi everyone,

So I've been working, on and off, on this algorithm for quite a while
now. It's basically an invention of mine. It is a "BLAST-like"
algorithm, in that it does "fuzzy lookup" operations across a database
of letters. I am designing this algorithm to be useful for
bio-informatics; this is the main field I am initially targeting.

The database will be filled with protein sequences, and the query
searched against the database will be another protein sequence. The
algorithm has a "scoring matrix", which can accept different protein
replacement scores. The cost of inserting letters (protein letters)
can also be configured.

In this sense, it's no different to Smith-Waterman. The same input,  
the same output!
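
For anyone who wants the inputs and output spelled out, here is a
minimal sketch of plain textbook Smith-Waterman in Python. It is not
the ElfDataFuzzy code, only an illustration of "a query, a target, a
replacement score and an insertion cost go in, a best local score
comes out". The function names and the toy scores (2 for a match, -1
for a replacement) are assumptions made up for this example, and a
single linear insertion cost is assumed; a real run would plug in
scores from a matrix such as BLOSUM62.

```python
def smith_waterman_score(query, target, substitution, gap_cost):
    """Best local alignment score: textbook Smith-Waterman, linear gap cost."""
    prev = [0] * (len(target) + 1)  # previous row of the matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            score = max(
                0,                                 # start a fresh local alignment
                prev[j - 1] + substitution(q, t),  # match or replacement
                prev[j] - gap_cost,                # letter of query left unpaired
                curr[j - 1] - gap_cost,            # letter of target left unpaired
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def toy_substitution(a, b):
    # Hypothetical toy scores, chosen only for this illustration.
    return 2 if a == b else -1


if __name__ == "__main__":
    print(smith_waterman_score("ABCDE", "ABDE", toy_substitution, 2))
```

Only two rows of the matrix are kept at a time, which is enough when
just the best score is wanted rather than the full alignment.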

The real difference from Smith-Waterman is its speed. I expect my
algorithm to be hugely faster, because I use many techniques to avoid
processing unnecessary parts of the Smith-Waterman matrix.

I also use many tricks to reuse computations across various proteins.
For example, the matrix for protein "ABCDE" is identical, at first
anyhow, to the matrix for "ABCDEFG". This means if I have both
proteins "ABCDE" and "ABCDEFG" in my protein database, I can test
both of them against the search query in almost half the time. My
algorithm also runs in logarithmic time with respect to the size of
the database. Basically, search time grows much more slowly than the
database does.
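
The "ABCDE" / "ABCDEFG" reuse can be sketched as follows. This is
only one way to get that effect, assuming the database is stored in a
prefix tree (trie) and the Smith-Waterman matrix is filled one row per
database letter; it is an illustration, not the ElfDataFuzzy
implementation, and it does not attempt the matrix-pruning or the
logarithmic-time behaviour. All names (build_trie, search_trie,
toy_substitution) and the toy scores are hypothetical.

```python
def toy_substitution(a, b):
    # Hypothetical toy scores, chosen only for this illustration.
    return 2 if a == b else -1


def next_row(prev_row, letter, query, substitution, gap_cost):
    """One Smith-Waterman row: extend the current prefix by one database letter."""
    curr = [0]
    for j, q in enumerate(query, start=1):
        curr.append(max(
            0,
            prev_row[j - 1] + substitution(letter, q),
            prev_row[j] - gap_cost,
            curr[j - 1] - gap_cost,
        ))
    return curr


def build_trie(sequences):
    """Store the sequences so that shared prefixes share trie nodes."""
    root = {"children": {}, "ids": []}
    for seq_id, seq in enumerate(sequences):
        node = root
        for letter in seq:
            node = node["children"].setdefault(
                letter, {"children": {}, "ids": []})
        node["ids"].append(seq_id)
    return root


def search_trie(sequences, query, substitution, gap_cost):
    """Score every sequence against query, computing each shared row once."""
    scores = [0] * len(sequences)
    rows_computed = 0
    # Each stack entry: (trie node, matrix row for that prefix, best score so far)
    stack = [(build_trie(sequences), [0] * (len(query) + 1), 0)]
    while stack:
        node, row, best = stack.pop()
        for seq_id in node["ids"]:
            scores[seq_id] = best  # a sequence ends at this node
        for letter, child in node["children"].items():
            child_row = next_row(row, letter, query, substitution, gap_cost)
            rows_computed += 1
            stack.append((child, child_row, max(best, max(child_row))))
    return scores, rows_computed


if __name__ == "__main__":
    print(search_trie(["ABCDE", "ABCDEFG"], "ABCDE", toy_substitution, 2))
```

With "ABCDE" and "ABCDEFG" in the database, this fills 7 matrix rows
instead of the 12 needed when each protein is scored separately.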

I want to turn this algorithm into something useful for people. My
first challenge here is to answer the question "is this algorithm
faster, or better, than BLAST?" If it is not faster, my algorithm
basically has little use. But I have good hopes it will be faster! I
am very good with this sort of thing, you see :) Speed is my
strong point.

Currently, I do not know about the speed, because I haven't  
implemented a C++ version of my algorithm, or a good speed testing  
framework.

I do, however, know that my algorithm is more accurate than BLAST,
because it is just as accurate as SSEARCH: mine uses the
Smith-Waterman algorithm, whereas BLAST uses a heuristic, intelligent
guesswork basically. A fine heuristic, but still a heuristic. Mine is
methodical and exact, not heuristic-based.

So here is what I am looking for!

I am hoping that someone in the field will be able to offer me
guidance, interest, enthusiasm, suggestions and maybe even do some
testing for me.

Perhaps a student doing a bio-informatics related degree, who would  
like to write a paper on an alternative way of processing protein  
databases. My invention could be an interesting subject for a paper.

Or perhaps a researcher who just has an interest in this sort of
thing! Perhaps a researcher who feels there must be a better way of
doing these things. Or really anyone in this field with the time and
interest, who feels helping me could help him (or her) too in some way.

I'd like someone I can ask a lot of questions, show my software to,
and explain what I hope to achieve with it.

Basically, my first questions to you would be "how would I set this up
to be useful for someone?" and "how would I test its usefulness; what
would you need to know about my algorithm before you would decide to
use it over BLAST?"

It's sort of a vague question from me, like "what do you need me to  
do", but... well that's where I am right now. Sort of a bit on the  
outside hoping someone on the inside will show me something.

So it's an opportunity to tell me what you want, basically!! Tell me,  
and I might just make it.

Who knows? Maybe one day, in a few years' time, everyone will be using
this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
might be part of something.

Thanks to anyone who replies!

--
http://elfdata.com/plugin/
"String processing, done right"





