[Bioclusters] Versioning databases

Mike Cariaso cariaso at yahoo.com
Sun Jun 4 23:02:45 EDT 2006


It seems much of this could be addressed by an svn repository. I'd certainly appreciate being able to type 'svn update nt'. What was in your prototype?

----- Original Message ----
From: Joe Landman <landman at scalableinformatics.com>
To: "Clustering,  compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
Sent: Sunday, June 4, 2006 10:33:40 PM
Subject: Re: [Bioclusters] Versioning databases

Sounds nice.  I had thought of also (somehow) saving diffs in a db so 
you could generate the test db you used previously.  Don't know if there 
is interest in this, but we had a prototype of this a few years ago.

Joe
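Joe's idea of saving diffs so that an earlier test database can be regenerated might look something like the minimal Python sketch below. It is not the prototype he mentions. It assumes each release is modeled as a dict mapping accession to sequence (real nr/nt entries carry more than that), and the names `make_delta`, `apply_delta`, and `rebuild` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: one release = {accession: sequence}.
Snapshot = Dict[str, str]
# A delta = (records added or changed, accessions removed).
Delta = Tuple[Dict[str, str], Set[str]]


def make_delta(old: Snapshot, new: Snapshot) -> Delta:
    """Record what changed going from release `old` to release `new`."""
    upserts = {acc: seq for acc, seq in new.items() if old.get(acc) != seq}
    removed = set(old) - set(new)
    return upserts, removed


def apply_delta(snap: Snapshot, delta: Delta) -> Snapshot:
    """Return a new snapshot with `delta` applied; `snap` is left unchanged."""
    upserts, removed = delta
    result = {acc: seq for acc, seq in snap.items() if acc not in removed}
    result.update(upserts)
    return result


def rebuild(base: Snapshot, deltas: List[Delta], release: int) -> Snapshot:
    """Regenerate release number `release` (0 = base) by replaying deltas in order."""
    snap = dict(base)
    for delta in deltas[:release]:
        snap = apply_delta(snap, delta)
    return snap


if __name__ == "__main__":
    v0 = {"A1": "MKT", "A2": "GGA"}
    v1 = {"A1": "MKTL", "A3": "CCA"}
    v2 = {"A1": "MKTL", "A3": "CCA", "A4": "TTG"}
    # Only the base and the (usually small) deltas need to be kept.
    deltas = [make_delta(v0, v1), make_delta(v1, v2)]
    print(rebuild(v0, deltas, 1) == v1)  # True
```

Storing only the base snapshot and the deltas keeps the archive small for databases like nr that change a little every day, at the cost of replaying deltas to reach an old release.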

Michael James wrote:
> Some biological databases actually come in versions,
>  for example: we are up to the TIGR4 rice genome and
>  UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.
> 
> Others just change daily: NCBI:nr, NCBI:nt, etc.
> 
> All this activity creates a problem for repeatability:
>  the BLAST results you get next week
>  won't quite match the ones you got today.
> 
> It seems to me that the situation would be improved
>  by tagging results "BLAST against ncbi.nih.gov nr 2006-06-05 000"
> 
> This means we need to come up with a versioning scheme,
>  and for anything without one, I'd suggest something as simple as
>    issuing_authority  database  date    3_digit_release_number
> eg  ncbi.nih.gov           nr  2006-06-05          000
> 
> For uniqueness, use the internet name for issuing_authority.
> 
> The database is the filename stripped of all qualifiers;
>  remove things like  .gz  and  .00.tar.gz
> 
> The date goes in ISO format (YYYY-MM-DD)!
> 
> The 3-digit release number ensures uniqueness.
> 
> 
> Such a scheme would also be
>  a big win for us database administrators.
> We could start to weave it through the tangled web
>  of different providers and formats
>  so we actually know the original issuing authority
>  for the file we are downloading.
> 
> What do you think?
> michaelj
> 
> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Bioclusters maillist  -  Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
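The version tag Michael James proposes (issuing authority, database, ISO date, 3-digit release number) could be generated by a small helper along the lines of this Python sketch. The function names `database_name` and `version_tag` are hypothetical, and the sketch assumes "stripped of all qualifiers" means dropping everything from the first dot of the filename onward, so `nr.00.tar.gz` becomes `nr`; database names that themselves contain dots would need a different rule.

```python
import datetime
import os


def database_name(filename: str) -> str:
    """Drop any directory, then everything from the first dot: 'nr.00.tar.gz' -> 'nr'.

    Assumption: the database name itself contains no dots.
    """
    return os.path.basename(filename).split(".", 1)[0]


def version_tag(authority: str, filename: str,
                date: datetime.date, release: int = 0) -> str:
    """Build 'issuing_authority database YYYY-MM-DD NNN'."""
    if not 0 <= release <= 999:
        raise ValueError("release number must fit in 3 digits")
    # date.isoformat() yields YYYY-MM-DD; :03d zero-pads the release number.
    return f"{authority} {database_name(filename)} {date.isoformat()} {release:03d}"


if __name__ == "__main__":
    print(version_tag("ncbi.nih.gov", "nr.00.tar.gz", datetime.date(2006, 6, 5)))
    # ncbi.nih.gov nr 2006-06-05 000
```

The release counter only matters when the same authority issues the same database more than once on a single date; otherwise it stays at 000.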




