[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

Jeremy Mann jeremy at biochem.uthscsa.edu
Wed Feb 1 15:05:50 EST 2006

Don, we exclude the FASTA/ subfolder when we mirror the BLAST databases. We
use rsync because of its --no-whole-file option, which sends only the parts
of a file that have changed. Why download the entire 50+ GB database every
night when all we need are the changes? Is there an FTP client that can
transfer just the changed parts of a file?
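
To illustrate the idea of transferring only changes, here is a minimal
Python sketch. It is not rsync's actual algorithm: rsync uses a rolling
checksum so it can match blocks at any offset, whereas this simplified
version only compares blocks at the same position. The names block_digests
and changed_blocks and the tiny block size are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one MD5 hex digest per fixed-size block of `data`."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indices of blocks in `new` that differ from the
    same-position block in `old`, or that lie past the end of `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]


# Hypothetical data: one block modified in place, one block appended.
old_copy = b"AAAABBBBCCCC"
new_copy = b"AAAAXXXXCCCCDDDD"
to_send = changed_blocks(old_copy, new_copy)
```

Only the blocks listed in to_send would need to cross the network; the
rest can be reused from the local copy. A plain FTP mirror has no way to do
this per-block comparison, so it re-fetches the whole file whenever the
file changes at all.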

Don Gilbert said:
> How high is demand for mirroring the FASTA/ subfolder of
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/ ?   I'll be happy
> to consider adding it to bio-mirror.net.   On the other hand,
> it is a large data chunk, and will add to network copy load
> which now is stretched for the blast-format tar files.
> We have almost continuous ftp copying from ncbi:/blast/db/ now
> due to the almost daily data turnover, ftp timeouts, and such.
> Those who want source data could instead use the
> Genbank dataset -> fasta, at a lower cost. E.g. only
> a few Genbank/WGS subsets are updated daily, whereas
> the whole 18 GB blast wgs.fasta is updated daily.
> Rsync is a nice tool, but has a much higher server side
> CPU cost than FTP. Those of you running into rsync errors at NCBI
> would probably have better luck using an FTP mirroring
> tool.
> - Don Gilbert
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters

Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
Phone: (210) 567-2672
