<html><head><style type="text/css"><!-- DIV {margin:0px} --></style></head><body><div style="font-family:times new roman, new york, times, serif;font-size:12pt"><div style="font-family: times new roman,new york,times,serif; font-size: 12pt;"><div>It seems much of this could be addressed by a svn repository. I know I'd sure appreciate typing 'svn update nt'. What was in your prototype?<br><br>----- Original Message ----<br>From: Joe Landman &lt;landman@scalableinformatics.com&gt;<br>To: "Clustering, compute farming &amp; distributed computing in life science informatics" &lt;bioclusters@bioinformatics.org&gt;<br>Sent: Sunday, June 4, 2006 10:33:40 PM<br>Subject: Re: [Bioclusters] Versioning databases<br><br><div>Sounds nice. I had thought of also (somehow) saving diffs in a db so <br>you could generate the test db you used previously. Don't know if there <br>is interest in this, but we had a prototype of this a few years ago.<br><br>Joe<br><br>Michael
James wrote:<br>> Some biological databases actually come in versions;<br>> for example, we are up to the TIGR4 rice genome and<br>> UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.<br>> <br>> Others just change daily: NCBI:nr, NCBI:nt, etc.<br>> <br>> All this effort creates a problem for repeatability:<br>> the BLAST results you get next week<br>> won't quite be the ones you got today.<br>> <br>> It seems to me that the situation would be improved<br>> by tagging results "BLAST against ncbi.nih.gov nr 2006-06-05 000"<br>> <br>> This means we need to come up with a versioning scheme,<br>> and for anything without one, I'd suggest something as simple as<br>> issuing_authority database date 3_digit_release_number<br>>
e.g. ncbi.nih.gov nr 2006-06-05 000<br>> <br>> For uniqueness, use the internet name for issuing_authority.<br>> <br>> The database is the filename stripped of all qualifiers.<br>> Remove things like .gz and .00.tar.gz<br>> <br>> The date is in ISO format!<br>> <br>> 3 more digits to ensure uniqueness.<br>> <br>> <br>> Such a scheme would also be<br>> a big win for us database administrators.<br>> We could start to weave it through the tangled web<br>> of different providers and formats<br>> so we actually know the original issuing authority<br>> for the file we are downloading.<br>> <br>> What do you think?<br>> michaelj<br>> <br>> <br><br>-- <br>Joseph Landman, Ph.D<br>Founder and CEO<br>Scalable
Informatics LLC,<br>email: landman@scalableinformatics.com<br>web : <a target="_blank" href="http://www.scalableinformatics.com">http://www.scalableinformatics.com</a><br>phone: +1 734 786 8423<br>fax : +1 734 786 8452<br>cell : +1 734 612 4615<br>_______________________________________________<br>Bioclusters maillist - Bioclusters@bioinformatics.org<br><a target="_blank" href="https://bioinformatics.org/mailman/listinfo/bioclusters">https://bioinformatics.org/mailman/listinfo/bioclusters</a><br></div></div></div></div></body></html>