[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

Aaron Darling darling at cs.wisc.edu
Thu Feb 2 04:39:14 EST 2006


I had the pleasure of meeting David Lipman (director of NCBI) at an NIH 
conference last summer and suggested to him that NCBI run a BitTorrent 
server.  He had just given a talk about NCBI in which he boasted that 
they move terabits of data every day, and have the capacity to move 
more.  What he didn't seem to understand was that it wouldn't matter how 
much bandwidth NCBI has if I'm downloading data on the other side of the 
planet (or New Mexico in this case) and there's a slow link somewhere in 
between us.  Both NCBI and Los Alamos have very fat network pipes, but 
for some reason I can only download FASTA files at around 300 KB/s to 
LANL.  At UW-Madison I get 1.5 MB/s.
Anyway, my point is that it would be great if NCBI would set up a 
BitTorrent server themselves, but I wouldn't hold my breath waiting for 
their leadership on the issue.
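
To put rough numbers on that, here is a back-of-the-envelope sketch in 
Python.  The 10 GB download size is purely an assumption for 
illustration (not a measured database size); the two rates are the ones 
quoted above, and decimal units are used throughout.

```python
def transfer_hours(size_gb: float, rate_kb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_kb_per_s kilobytes/second."""
    size_kb = size_gb * 1000 * 1000  # decimal units: 1 GB = 1,000,000 KB
    return size_kb / rate_kb_per_s / 3600


def bottleneck(*link_rates_kb_per_s: float) -> float:
    """End-to-end rate is capped by the slowest link on the path."""
    return min(link_rates_kb_per_s)


SIZE_GB = 10.0  # hypothetical download size, for illustration only

lanl_hours = transfer_hours(SIZE_GB, 300)   # 300 KB/s observed at LANL
uw_hours = transfer_hours(SIZE_GB, 1500)    # 1.5 MB/s observed at UW-Madison

print(f"LANL: {lanl_hours:.1f} h, UW-Madison: {uw_hours:.1f} h")
```

The bottleneck() helper just restates the argument: however fat the 
pipes are at either end, the slowest hop in between sets the rate.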

Practically speaking, we'd need a reasonably large number of people 
seeding these files for it to work.  While I'm not qualified to set up 
and administer a BitTorrent server, I'm willing to contribute some resources by 
running a client.  Anybody else?

-Aaron


Steve O wrote:

> Hi,
> After messing around for a while trying to optimize the NCBI 
> downloads, I realized that, independent of the rsync costs, simply 
> FTPing the formatted files was the best bet.  Even that was a bit 
> tricky since the files might change while you're downloading them.  I 
> no longer have to deal with this problem, but what I wished at the 
> time was that some brave soul, perhaps bio-mirror, would offer a 
> BitTorrent of, say, Monday's snapshot of a file.  Then every subsequent 
> day, a BitTorrent would be offered of the diffs of the unpacked file 
> vs. the previous unpacked version.  Sites that wished to keep up with 
> daily changes could apply the patches, while less critical 
> applications could just get the weekly distribution.  Running a 
> BitTorrent client would require server resources, but the actual load on 
> your servers should be reduced significantly.
>
> -steve
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
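
The snapshot-plus-daily-diffs scheme Steve describes could be sketched 
roughly as below.  This is only an illustration under stated 
assumptions: make_patch and apply_patch are hypothetical helpers built 
on Python's difflib, the sequence records are made up, and a real 
mirror would more likely use diff/patch or a binary delta tool on the 
unpacked files.  The BitTorrent distribution of each patch is not 
shown.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Return only the changed regions needed to turn old into new.

    Each entry is (start, end, replacement): old[start:end] is replaced
    by the replacement lines. Unchanged regions are not stored.
    """
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_patch(old, patch):
    """Rebuild the new version from the old lines plus a patch."""
    out, pos = [], 0
    for start, end, replacement in patch:
        out.extend(old[pos:start])  # copy unchanged lines before the edit
        out.extend(replacement)     # lines that replace old[start:end]
        pos = end
    out.extend(old[pos:])           # copy the unchanged tail
    return out


# Made-up records standing in for the unpacked database file.
monday = [">seq1", "ACGT", ">seq2", "GGCC"]
tuesday = monday + [">seq3", "TTAA"]
wednesday = [">seq1", "ACGT", ">seq2", "GGCA", ">seq3", "TTAA"]

# The distributor publishes Monday's full snapshot, then one patch per day.
daily_patches = [make_patch(monday, tuesday), make_patch(tuesday, wednesday)]

# A mirror starts from the snapshot and applies each day's patch in order.
mirror = monday
for patch in daily_patches:
    mirror = apply_patch(mirror, patch)

print(mirror == wednesday)
```

Patches have to be applied in order and against exactly the version 
they were made from, which is why the weekly full snapshot is still 
useful as a resynchronization point.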



