[Bioclusters] Versioning databases
Mike Cariaso
cariaso at yahoo.com
Sun Jun 4 23:02:45 EDT 2006
It seems much of this could be addressed with an svn repository. I know I'd sure appreciate being able to type 'svn update nt'. What was in your prototype?
----- Original Message ----
From: Joe Landman <landman at scalableinformatics.com>
To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
Sent: Sunday, June 4, 2006 10:33:40 PM
Subject: Re: [Bioclusters] Versioning databases
Sounds nice. I had also thought of (somehow) saving diffs in a db so
you could regenerate the test db you used previously. I don't know if there
is interest in this, but we had a prototype of it a few years ago.
Joe
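Joe's idea of saving diffs so that a previously used test database can be regenerated can be sketched with Python's standard difflib module. This is a minimal illustration, not a description of Joe's prototype. It assumes a database is a list of text lines (e.g. a FASTA file) and uses a reverse-delta layout: the newest version is kept in full, and each older version is stored only as the edits needed to rebuild it from its successor. SnapshotStore, make_delta and apply_delta are hypothetical names.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical delta format: a list of ops, either ("copy", i1, i2), which
# reuses lines new[i1:i2], or ("insert", lines), which supplies lines that
# exist only in the older version.
Delta = list


def make_delta(new: list[str], old: list[str]) -> Delta:
    """Record how to rebuild `old` from `new` (a reverse delta)."""
    ops = []
    matcher = SequenceMatcher(a=new, b=old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", old[j1:j2]))
        # "delete": lines present only in `new` are simply not copied
    return ops


def apply_delta(new: list[str], delta: Delta) -> list[str]:
    """Replay a reverse delta against `new` to recover the older version."""
    out: list[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class SnapshotStore:
    """Hypothetical store: newest version in full, older ones as reverse deltas."""

    def __init__(self) -> None:
        self.order: list[str] = []           # version tags, oldest first
        self.latest: list[str] = []          # full text of the newest version
        self.deltas: dict[str, Delta] = {}   # tag -> delta rebuilding it from its successor

    def commit(self, tag: str, lines: list[str]) -> None:
        if self.order:
            # Replace the previous full copy with a delta from the new version.
            self.deltas[self.order[-1]] = make_delta(lines, self.latest)
        self.order.append(tag)
        self.latest = list(lines)

    def checkout(self, tag: str) -> list[str]:
        """Regenerate a version by walking back from the newest one."""
        lines = self.latest
        for older in reversed(self.order[self.order.index(tag):-1]):
            lines = apply_delta(lines, self.deltas[older])
        return list(lines)
```

Committing each download under its tag and later calling checkout(tag) returns the exact lines of that older release. SequenceMatcher compares line by line and is far too slow for something the size of nr, so a real system would lean on a dedicated delta tool or a version control system such as the svn repository suggested above.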
Michael James wrote:
> Some biological databases actually come in versions;
> for example, we are up to the TIGR4 rice genome and
> Swiss-Prot (UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006).
>
> Others just change daily: NCBI:nr, NCBI:nt, etc.
>
> All this updating creates a problem for repeatability:
> the BLAST results you get next week
> won't quite match the ones you got today.
>
> It seems to me that the situation would be improved
> by tagging results "BLAST against ncbi.nih.gov nr 2006-06-05 000"
>
> This means we need to come up with a versioning scheme,
> and for anything without one, I'd suggest something as simple as
> issuing_authority database date 3_digit_release_number
> e.g. ncbi.nih.gov nr 2006-06-05 000
>
> For uniqueness, use the internet name for issuing_authority.
>
> The database is the filename stripped of all qualifiers:
> remove things like .gz and .00.tar.gz.
>
> The date goes in ISO format (YYYY-MM-DD)!
>
> Then 3 more digits (000, 001, ...) to ensure uniqueness.
>
>
> Such a scheme would also be
> a big win for us database administrators.
> We could start to weave it through the tangled web
> of different providers and formats
> so we actually know the original issuing authority
> for the file we are downloading.
>
> What do you think?
> michaelj
>
>
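The tag scheme Michael proposes is simple enough to generate mechanically. Below is a minimal Python sketch, assuming that "stripped of all qualifiers" means dropping the directory and everything from the first dot onward, and that the release number is simply the next unused one for the same authority, database and date. The function names (database_name, version_tag, next_release) are hypothetical and not taken from any existing tool.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Iterable


def database_name(filename: str) -> str:
    """Drop the directory and every dot-qualifier: 'db/nr.00.tar.gz' -> 'nr'.

    Assumption: everything after the first dot counts as a qualifier.
    """
    return PurePosixPath(filename).name.split(".", 1)[0]


def version_tag(authority: str, filename: str, released: date, release: int = 0) -> str:
    """Build 'issuing_authority database date 3_digit_release_number'."""
    if not 0 <= release <= 999:
        raise ValueError("release number must fit in three digits")
    return f"{authority} {database_name(filename)} {released.isoformat()} {release:03d}"


def next_release(existing: Iterable[str], authority: str, filename: str, released: date) -> int:
    """Return the first unused release number for this authority/database/date.

    Hypothetical helper: assumes `existing` holds tags built by version_tag.
    """
    prefix = f"{authority} {database_name(filename)} {released.isoformat()} "
    used = [int(tag[len(prefix):]) for tag in existing if tag.startswith(prefix)]
    return max(used, default=-1) + 1
```

For example, version_tag("ncbi.nih.gov", "nr.00.tar.gz", date(2006, 6, 5)) returns "ncbi.nih.gov nr 2006-06-05 000", matching the example in the message. Splitting on the first dot is a blunt rule; a database whose real name contains a dot would need an explicit list of suffixes to strip instead.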
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters