On 02-Dec-03, Donald Becker wrote:

> Back to the core point: to checkpoint a pipeline the in-pipe data has to
> be throttled and drained, or
> extracted and stored
> This goes beyond checkpointing a single process. And a pipeline
> spanning machines is even more interesting.

Careful, there are two meanings of pipeline being used here. One is the
traditional Unix pipeline, 'foo | bar | baz', which I suspect is what Don
is saying is hard to checkpoint, because of pipeline buffers and so on.
The other is what most genomics people would think of as a pipeline, which
is a set of analysis jobs which may depend on each other, and which
normally use some mechanism other than Unix pipes to pass the data from
one part of the pipeline to another (in the case of Ensembl such status is
held in the pipeline MySQL database).

It was this second sort of pipeline that I was talking about, and they are
not too difficult to checkpoint, in theory, especially if you are willing
to re-run an individual blast job or whatever, so you only need to
checkpoint between individual analysis phases.

Tim

-- 
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
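
A minimal sketch of the second kind of pipeline described above, where the only checkpoint is a per-phase status record and a failed phase is simply re-run. Everything here is an assumption made for illustration: SQLite stands in for the MySQL database, and the table name `phase_status`, the function `run_pipeline`, and the phase names are hypothetical. It is not the Ensembl pipeline schema or code.

```python
import sqlite3


def run_pipeline(conn, phases):
    """Run phases in order, skipping any already recorded as done.

    `phases` is a list of (name, callable) pairs. The status table is a
    hypothetical stand-in for a real pipeline database.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS phase_status "
        "(name TEXT PRIMARY KEY, state TEXT)"
    )
    executed = []
    for name, job in phases:
        row = conn.execute(
            "SELECT state FROM phase_status WHERE name = ?", (name,)
        ).fetchone()
        if row is not None and row[0] == "done":
            continue  # checkpointed on an earlier run, so skip it
        job()  # if this raises, nothing is recorded and the phase re-runs later
        conn.execute(
            "INSERT OR REPLACE INTO phase_status (name, state) "
            "VALUES (?, 'done')",
            (name,),
        )
        conn.commit()  # the checkpoint: one commit between analysis phases
        executed.append(name)
    return executed


def demo():
    """Simulate a failure in the middle phase, then restart the pipeline."""
    conn = sqlite3.connect(":memory:")
    attempts = {"blast": 0}

    def repeatmask():
        pass

    def blast():
        attempts["blast"] += 1
        if attempts["blast"] == 1:
            raise RuntimeError("simulated node failure")

    def genewise():
        pass

    phases = [("repeatmask", repeatmask), ("blast", blast), ("genewise", genewise)]
    try:
        run_pipeline(conn, phases)
    except RuntimeError:
        pass  # first run dies partway through the blast phase
    # On restart only the unfinished phases are executed.
    return run_pipeline(conn, phases)
```

On the restart, the first phase is skipped because its status row already exists, and the failed phase is run again from scratch. No in-flight data has to be drained or saved, which is why this is easier than checkpointing a Unix pipe.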