[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

Don Gilbert gilbertd at bio.indiana.edu
Thu Feb 2 11:14:52 EST 2006


The FTP and Rsync protocols are different. Rsync makes possible
partial updates, or 'delta transmission'.  FTP doesn't offer
this.  However, as I noted, the Rsync partial-update mechanism
comes at a cost on the server side, such that it can be slower to use.
That depends on how distant, network-wise, your client is from
the server.  This is another reason to have more Bio-Mirror sites, as it could
substantially reduce the cost of these daily updates of bio-data.
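
To make the 'delta transmission' idea concrete, here is a rough sketch in
Python. It is not Rsync's actual algorithm: it compares fixed-offset blocks
by hash, whereas Rsync uses a rolling checksum so that it can match blocks at
any offset. The function names and the tiny block size are my own
assumptions, for illustration only. It does show where the extra cost comes
from: the file has to be read and checksummed before any data is saved on
the wire.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of data (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of blocks in `new` that differ from `old` or are appended.

    Only these blocks would need to be sent in a partial update;
    a plain FTP fetch would resend the whole file.
    """
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [
        i for i, h in enumerate(new_h)
        if i >= len(old_h) or h != old_h[i]
    ]


if __name__ == "__main__":
    old = b"AAAABBBBCCCC"
    new = b"AAAAXBBBCCCCDD"
    print(changed_blocks(old, new))
```

With these inputs, the second block was modified and a fourth block was
appended, so only those two blocks would be transferred.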

In Jeremy's tests, you will need to change the file after the first
fetch; otherwise Rsync and FTP can see it as the same file and do
no data transport beyond checking.  I used touch to change dates
for this.
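
For anyone scripting such a test, the same date change can be sketched in
Python. This assumes the mirroring tool decides whether to refetch from the
file's modification time (and size), which is Rsync's default quick check.
The helper name `touch` and the example path are hypothetical.

```python
import os
import time
from typing import Optional


def touch(path: str, when: Optional[float] = None) -> None:
    """Set the access and modification times of an existing file.

    `when` is seconds since the epoch; it defaults to the current time,
    like the shell `touch` command on an existing file.
    """
    if when is None:
        when = time.time()
    os.utime(path, (when, when))


if __name__ == "__main__":
    # Hypothetical local copy left behind by the first fetch.
    local_copy = "testfile.dat"
    if os.path.exists(local_copy):
        touch(local_copy)
```

Run this on the local copy between the first and second fetch so the second
fetch is not skipped as already up to date.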

We've considered Bittorrent, but it doesn't quite have what would 
make it good for bio-mirroring. 

>From Markus.Buchhorn at anu.edu.au  [ on Bittorrent and like p2p systems ]
> I've done a fair bit of investigation with a range of p2p systems,
> wearing my data-technology research hat. They all have some benefits,
> but none really yet suit the biomirror needs.
...
> This works very well for a community of hundreds of parallel downloaders, 
> for content that is popular for a short period but less so for a small circle 
> of longer-term caching mirrors.
> 
> Most of the p2p systems aren't designed for hierarchical environments, 
> nor optimise their transfers based on network topology.
> 
> What would be really nice is an outline of what we'd like the bio-mirror network 
> to look like. So things like
>  - fast data transfers between mirrors
>  - bidirectional data movement (automated)
>  - ability for new groups to publish into the bio-mirror cloud, at any node
>  - maintenance of access control requirements for data across the cloud
>  - new kinds of data?
>  - mirroring of (or references to external) value-add (web) services?
>  - ability to identify a 'nearest' or 'least-cost' server
>  - lots more good ideas...

I've looked at Micah Beck's work, and others, such as SDSC's Storage
Resource Broker, which has good foundations at TeraGrid and other
Grid centers.

Having more Bio-Mirror servers would be a practical step. FTP from a
nearby server, network-wise, is much faster than any protocol
when everyone is going to one central network source.

-- Don
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

