[BiO BB] Looking for researcher, to assist on blast-like invention

Theodore H. Smith delete at elfdata.com
Tue Feb 12 11:13:03 EST 2008

Hi Andreu,

I am definitely making my source code available to everyone, under an
open source agreement. I am not going the commercial route. And while
I will protect my intellectual property, I am not going the patent
route. I am not a believer in the whole aggressive "stop people doing
stuff" idea.

I should have said at the start that I am making this open source.

The main reason I am delaying making it open source is that I
don't have a C++ version yet, so I have nothing to offer. Also, I
find SourceForge awkward to use and a waste of time compared to
just uploading the source code directly to my website with a
notice saying "this is open source".

As for writing a paper... I don't really have the university
background to write one, meaning it would take me a lot longer
than someone experienced in writing papers. And to be honest, I feel
it would distract me from my main goal, which is to spend my time
doing something productive. I would rather someone else write a paper
for me :) I think this would be a fair arrangement.

But I am happy to explain my algorithm.

I think I should write up a document explaining it, however. Maybe not
in an academic style, more in a software developer style.

Thanks for all the interest and suggestions, everyone. It's helping a lot.

On 12 Feb 2008, at 15:45, Andreu Alibés wrote:

> Why not make the code available to everybody in an Open Source
> repository like SourceForge?
> A
> On Feb 11, 2008 5:21 PM, Theodore H. Smith <delete at elfdata.com> wrote:
>> Hi everyone,
>> So I've been working, on and off, on this algorithm for quite a while
>> now. It's basically an invention of mine. It is a "blast-like"
>> algorithm, in that it does "fuzzy lookup" operations across a database
>> of letters. I am designing this algorithm to be useful for
>> bio-informatics; this is the main field I am initially targeting.
>> The database will be filled with protein sequences, and the search
>> across the database will be another protein sequence. The algorithm
>> has a "scoring matrix", which can accept different protein replacement
>> scores. The cost of inserting letters (protein letters) can be
>> configured also.
>> In this sense, it's no different to Smith-Waterman. The same input,
>> the same output!
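
For reference, the baseline being matched here (a scoring function for
letter replacements plus a configurable insertion cost) can be sketched
as plain Smith-Waterman local alignment. This is only a minimal
illustration, not the ElfDataFuzzy code, which has not been released.
The names smith_waterman_score and simple_score, the toy match/mismatch
scores and the linear gap cost are all assumptions made for the example.

```python
def simple_score(a, b):
    """Toy replacement score (an assumption): +2 for a match, -1 otherwise.
    A real search would look the pair up in a matrix such as BLOSUM62."""
    return 2 if a == b else -1


def smith_waterman_score(query, target, score=simple_score, gap=-2):
    """Return the best local alignment score between query and target.

    Uses a linear gap cost (the same penalty for every inserted letter)
    and keeps only two rows of the matrix in memory at a time.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            cell = max(
                0,                          # start a fresh local alignment
                prev[j - 1] + score(q, t),  # replace or match one letter
                prev[j] + gap,              # letter of query left unpaired
                curr[j - 1] + gap,          # letter of target left unpaired
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ABCDE", "ABDE"))
```

Every cell of the matrix is filled in here, which is the cost that any
faster exact method would have to avoid.
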
>> The real difference from Smith-Waterman is its speed. My algorithm
>> will be hugely faster. This is because I use many techniques to avoid
>> processing unnecessary parts of the Smith-Waterman matrix.
>> I also use many tricks to reuse computations across various proteins.
>> For example, the matrix for protein "ABCDE" is identical, at first
>> anyhow, to the matrix for "ABCDEFG". This means if I have both
>> proteins "ABCDE" and "ABCDEFG" in my protein database, I can test
>> both of them against the search query in almost half the time. My
>> algorithm also runs in logarithmic time with respect to the size of
>> the database. Basically, bigger databases run disproportionately
>> faster.
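
The "ABCDE" / "ABCDEFG" example can be sketched as follows. This is a
guess at one way to share work between database entries, not the actual
ElfDataFuzzy method: the database sequences are stored in a prefix tree
(trie), and one matrix row is computed per trie node, so a shared prefix
is scored against the query only once. The names build_trie and
search_shared_prefixes, the toy scores and the linear gap cost are
assumptions. The sketch does not demonstrate the logarithmic-time claim.

```python
def simple_score(a, b):
    """Toy replacement score (an assumption): +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


def build_trie(sequences):
    """Store the sequences in a prefix tree; each node lists the ids of
    the sequences that end exactly at that node."""
    root = {"children": {}, "ids": []}
    for seq_id, seq in enumerate(sequences):
        node = root
        for letter in seq:
            node = node["children"].setdefault(
                letter, {"children": {}, "ids": []})
        node["ids"].append(seq_id)
    return root


def search_shared_prefixes(query, sequences, score=simple_score, gap=-2):
    """Smith-Waterman score of query against every sequence.

    Matrix rows follow the database letters, so a row depends only on
    the prefix read so far and can be reused by every sequence that
    starts with that prefix. Returns (scores, rows_computed).
    """
    best_scores = [0] * len(sequences)
    rows_computed = 0
    # Each stack entry: (trie node, matrix row at that node, best so far).
    stack = [(build_trie(sequences), [0] * (len(query) + 1), 0)]
    while stack:
        node, prev, best = stack.pop()
        for seq_id in node["ids"]:
            best_scores[seq_id] = best
        for letter, child in node["children"].items():
            curr = [0]
            row_best = best
            for j, q in enumerate(query, start=1):
                cell = max(0,
                           prev[j - 1] + score(letter, q),
                           prev[j] + gap,
                           curr[j - 1] + gap)
                curr.append(cell)
                row_best = max(row_best, cell)
            rows_computed += 1
            stack.append((child, curr, row_best))
    return best_scores, rows_computed


if __name__ == "__main__":
    print(search_shared_prefixes("ABCDE", ["ABCDE", "ABCDEFG"]))
```

For the two proteins in the example, 7 matrix rows are computed instead
of the 12 that scoring each sequence separately would need, which is
roughly the "almost half the time" saving described above.
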
>> I want to turn this algorithm into something useful for people. My
>> first challenge here is to answer the question "is this algorithm
>> faster, or better, than BLAST?". If it is not faster, my algorithm
>> basically has little use. But I have good hopes it will be faster! I
>> am very good with these sorts of things, you see :) Speed is my
>> strong point.
>> Currently, I do not know about the speed, because I haven't
>> implemented a C++ version of my algorithm, or a good speed testing
>> framework.
>> I do however know that my algorithm is more accurate than BLAST,
>> because it is just as accurate as SSEARCH, as mine uses the
>> Smith-Waterman algorithm, whereas BLAST uses a heuristic: intelligent
>> guesswork, basically. A fine heuristic, but still a heuristic. Mine is
>> methodical, not heuristic-based.
>> So here is what I am looking for!
>> I am hoping that someone in the field will be able to offer me
>> guidance, interest, enthusiasm, suggestions and maybe even do some
>> testing for me.
>> Perhaps a student doing a bio-informatics related degree, who would
>> like to write a paper on an alternative way of processing protein
>> databases. My invention could be an interesting subject for a paper.
>> Or perhaps a researcher who just has an interest in these sorts of
>> things! Perhaps a researcher who feels there must be a better way of
>> doing these things. Or anyone really in this field with the time and
>> interest, who feels helping me could help him (or her) too in some
>> way.
>> I'd like someone I can ask a lot of questions, show my software to,
>> and explain my hopes for what I can achieve with it.
>> Basically, my first questions to you would be "how would I set this
>> up to be useful for someone?" and "how would I test its usefulness,
>> and what would you need to know about my algorithm before you would
>> decide to use it over BLAST?"
>> It's sort of a vague question from me, like "what do you need me to
>> do", but... well that's where I am right now. Sort of a bit on the
>> outside hoping someone on the inside will show me something.
>> So it's an opportunity to tell me what you want, basically!! Tell me,
>> and I might just make it.
>> Who knows? Maybe one day in a few years' time, everyone will be using
>> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
>> might be part of something.
>> Thanks to anyone who replies!
>> --
>> http://elfdata.com/plugin/
>> "String processing, done right"
>> _______________________________________________
>> BBB mailing list
>> BBB at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/bbb
> -- 
> Andreu Alibés, PhD
> Systems Biology Program - Center for Genomic Regulation
> c/ Dr. Aiguader 88, 08003 Barcelona, Spain
> Phone: +34 93 316 0258
> http://aalibes.googlepages.com/

"String processing, done right"
