[BiO BB] Linear Bioinformatics workflow?

Tue Oct 4 10:40:48 EDT 2005

Amir,

There is another major category of workflow that I've seen:

"I have a [protein sequence | genomic region | compound | mass spec  
output] and I want to find out everything in the entire world about  
it.  Ideally, all the result values would be converted to a common  
vocabulary, format, and normalization.  I would rather not go to  
every website in the world, or even know about all those websites.   
Can you help me?"

This is a "wide" rather than a "deep" process.
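
As a rough sketch of that "wide" pattern: fan one query out to every
known source, then fold the answers into one vocabulary.  Everything
named here (the two lookup functions, FIELD_MAP, the Annotation
record) is a hypothetical stand-in, not a real service or API; real
sources would need network calls and much messier normalization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """One fact about the query, expressed in a shared vocabulary."""
    source: str  # which resource reported it
    term: str    # normalized field name, e.g. "organism"
    value: str   # normalized value


# Hypothetical stand-ins for real resources. Each returns its own ad hoc
# field names, the way independent websites tend to.
def lookup_source_a(query: str) -> Dict[str, str]:
    return {"Organism": "Homo sapiens", "Len": str(len(query))}


def lookup_source_b(query: str) -> Dict[str, str]:
    return {"species": "homo sapiens", "function": "kinase"}


# Assumed mapping from each source's field names to one common vocabulary.
FIELD_MAP = {
    "Organism": "organism",
    "species": "organism",
    "Len": "length",
    "function": "function",
}

SOURCES: Dict[str, Callable[[str], Dict[str, str]]] = {
    "source_a": lookup_source_a,
    "source_b": lookup_source_b,
}


def wide_lookup(query: str) -> List[Annotation]:
    """Ask every registered source about the query and normalize the answers."""
    results = []
    for name, lookup in SOURCES.items():
        for field, value in lookup(query).items():
            term = FIELD_MAP.get(field, field.lower())
            results.append(Annotation(name, term, value.strip().lower()))
    return results
```

The user only ever calls wide_lookup(); the list of sources and the
per-source quirks stay hidden behind it.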

As to your original question - my personal opinion is that interface  
design is really, really hard, and that if someone were going to come  
up with a good, generic way to put that sort of power in the hands of  
non-programmer types, it would have happened by now.

That said, if you narrow the problem enough that it doesn't have to
do everything in the world, things get a lot simpler.  Each of the
tools that people have mentioned has its strengths and weaknesses.
None will solve every problem.

I'm not aware of a really killer solution for your specific use case:
   - let users explore a limited set of tools, and dynamically build  
up a protocol
   - save that protocol in a personal workspace for future (personal)  
re-use and possible sharing
   - but keep it totally limited and simple so as not to intimidate  
non-programmers
   - yet stay flexible enough to handle large-ish batches of data
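
To make those requirements concrete, here is a minimal sketch, assuming
a protocol is nothing more than an ordered list of tool names plus
parameters.  The tool menu, the JSON file layout, and every function
name below are illustrative assumptions, not taken from any existing
workflow engine.

```python
import json
from typing import Callable, Dict, List

# A deliberately small, fixed menu of tools (hypothetical stand-ins for
# real programs). Users pick from this menu instead of writing code.
TOOLS: Dict[str, Callable[..., str]] = {
    "uppercase": lambda seq: seq.upper(),
    "reverse": lambda seq: seq[::-1],
    "trim": lambda seq, n=0: seq[n:],
}


def run_protocol(protocol: List[dict], item: str) -> str:
    """Apply each step of the protocol, in order, to one input item."""
    for step in protocol:
        tool = TOOLS[step["tool"]]
        item = tool(item, **step.get("params", {}))
    return item


def run_batch(protocol: List[dict], items: List[str]) -> List[str]:
    """Run the same protocol over a whole batch of inputs."""
    return [run_protocol(protocol, item) for item in items]


def save_protocol(protocol: List[dict], path: str) -> None:
    """Store the protocol as JSON so it can be re-used or shared."""
    with open(path, "w") as handle:
        json.dump(protocol, handle, indent=2)


def load_protocol(path: str) -> List[dict]:
    """Read back a protocol written by save_protocol."""
    with open(path) as handle:
        return json.load(handle)
```

Because the protocol is plain data, the "personal workspace" could be
as simple as a directory of these files, and sharing one amounts to
copying it.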

Most of the commercial and free workflow engines will do this, but it  
sounds like the overhead of learning to use them is a bit much for  
your users?

-Chris Dwan

Amir Karger wrote:

> Several people mentioned 2-D graphical workflow tools in a
> "Bioinformatics workflow?" thread on bioclusters. (I'm redirecting my
> non-cluster-y question here.) While still a newbie, I'm getting the
> impression that many bioinformatics workflows are mostly linear, with
> obvious important exceptions like conditions and loops. For example, I
> had a client last week who wanted to script:
>
> 1 blast [sequence=..., program=...] > blast.out
> 2 get hits from blast.out > blast.hits
> 3 find hits with 50-70% sequence identity from blast.hits > blast.good_hits
> 4 download/fastacmd sequences for IDs in blast.good_hits > hits.fasta
> 5 clustalw hits.fasta > publishable_result (OK, not really)
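
The identity filter in the quoted script (hits with 50-70% sequence
identity) is easy to sketch if one assumes the BLAST run wrote tabular
output (blastall -m 8), where each line is tab-separated with the
subject ID in column 2 and percent identity in column 3.  The function
name and the de-duplication of IDs are choices made here, not part of
the original request.

```python
from typing import Iterable, List


def good_hits(lines: Iterable[str], low: float = 50.0, high: float = 70.0) -> List[str]:
    """Return subject IDs whose percent identity lies within [low, high].

    Assumes tab-separated BLAST tabular output: query ID, subject ID,
    percent identity, then further columns that are ignored here.
    """
    ids = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        identity = float(fields[2])
        # Keep each subject once, in order of first appearance.
        if low <= identity <= high and subject_id not in seen:
            seen.add(subject_id)
            ids.append(subject_id)
    return ids
```

The returned IDs are what the next step (fetching sequences for
blast.good_hits) would consume.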