[Bioclusters] Versioning databases

Joe Landman landman at scalableinformatics.com
Sun Jun 4 23:06:10 EDT 2006


It was just a simple PostgreSQL store of compressed deltas with a simple
front end.  SVN wasn't popular at the time, and CVS didn't look like it
could handle it.  Even SVN might spend a lot of time calculating diffs.
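A minimal sketch of the idea described above: store each release as a compressed delta against the previous one, so any earlier release can be rebuilt. This is not the original prototype. Python's sqlite3 stands in for PostgreSQL, difflib opcodes serve as an assumed delta format, and the table layout and function names (make_delta, apply_delta, save_version, checkout) are hypothetical.

```python
import difflib
import json
import sqlite3
import zlib

# Sketch only: sqlite3 stands in for the PostgreSQL store, and difflib
# opcodes stand in for whatever delta format the original prototype used.


def make_delta(old_lines, new_lines):
    """Encode new_lines as edits against old_lines, zlib-compressed."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])  # reuse a run of old lines
        elif j2 > j1:
            ops.append(["lines", list(new_lines[j1:j2])])  # literal new lines
        # pure deletions emit nothing: those old lines are simply not copied
    return zlib.compress(json.dumps(ops).encode("utf-8"))


def apply_delta(old_lines, blob):
    """Rebuild the newer version from old_lines plus a stored delta."""
    out = []
    for op in json.loads(zlib.decompress(blob).decode("utf-8")):
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    # Hypothetical schema: one row per release, each a delta on the previous.
    db.execute(
        "CREATE TABLE IF NOT EXISTS versions "
        "(tag TEXT PRIMARY KEY, seq INTEGER UNIQUE NOT NULL, delta BLOB NOT NULL)"
    )
    return db


def _rebuild(db, seq):
    """Replay deltas 0..seq in order, starting from an empty file."""
    lines = []
    rows = db.execute(
        "SELECT delta FROM versions WHERE seq <= ? ORDER BY seq", (seq,)
    ).fetchall()
    for (blob,) in rows:
        lines = apply_delta(lines, blob)
    return lines


def save_version(db, tag, lines):
    """Append a release, stored as a delta against the latest one."""
    (last,) = db.execute("SELECT MAX(seq) FROM versions").fetchone()
    seq = 0 if last is None else last + 1
    previous = _rebuild(db, last) if last is not None else []
    db.execute(
        "INSERT INTO versions (tag, seq, delta) VALUES (?, ?, ?)",
        (tag, seq, make_delta(previous, lines)),
    )
    db.commit()


def checkout(db, tag):
    """Regenerate the exact file that was saved under `tag`."""
    row = db.execute("SELECT seq FROM versions WHERE tag = ?", (tag,)).fetchone()
    if row is None:
        raise KeyError(tag)
    return _rebuild(db, row[0])
```

Usage: call `save_version(db, "ncbi.nih.gov nr 2006-06-05 000", lines)` for each download, then `checkout(db, tag)` to get back the file a given BLAST run used. Replaying every delta from the first release keeps the sketch simple; a real store would likely keep periodic full snapshots. The difflib documentation describes its matcher as quadratic in the expected case and cubic in the worst case, which echoes the concern above about diff time on files the size of nr.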

Mike Cariaso wrote:
> 
> It seems much of this could be addressed by an svn repository. I know I'd
> sure appreciate typing 'svn update nt'. What was in your prototype?
> 
> ----- Original Message ----
> From: Joe Landman <landman at scalableinformatics.com>
> To: "Clustering, compute farming & distributed computing in life science 
> informatics" <bioclusters at bioinformatics.org>
> Sent: Sunday, June 4, 2006 10:33:40 PM
> Subject: Re: [Bioclusters] Versioning databases
> 
> Sounds nice.  I had also thought of (somehow) saving diffs in a db so
> you could regenerate the test db you used previously.  Don't know if
> there is interest in this, but we had a prototype of this a few years ago.
> 
> Joe
> 
> Michael James wrote:
>  > Some biological databases actually come in versions;
>  >  for example, we are up to the TIGR4 rice genome and
>  >  UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.
>  >
>  > Others just change daily: NCBI:nr, NCBI:nt, etc.
>  >
>  > All this effort creates a problem for repeatability:
>  >  the BLAST results you get next week
>  >  won't quite be the ones you got today.
>  >
>  > It seems to me that the situation would be improved
>  >  by tagging results "BLAST against ncbi.nih.gov nr 2006-06-05 000"
>  >
>  > This means we need to come up with a versioning scheme,
>  >  and for anything without one, I'd suggest something as simple as
>  >     issuing_authority  database  date        3_digit_release_number
>  > eg  ncbi.nih.gov       nr        2006-06-05  000
>  >
>  > For uniqueness, use the internet name for issuing_authority.
>  >
>  > The database is the filename stripped of all qualifiers:
>  >  remove things like  .gz  .00.tar.gz
>  >
>  > The date in ISO format!
>  >
>  > 3 more digits to ensure uniqueness.
>  >
>  >
>  > Such a scheme would also be
>  >  a big win for us database administrators.
>  > We could start to weave it through the tangled web
>  >  of different providers and formats
>  >  so we actually know the original issuing authority
>  >  for the file we are downloading.
>  >
>  > What do you think?
>  > michaelj
>  >
>  >
> 
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
> 
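Michael's proposed tag (issuing authority, database, ISO date, 3-digit release number) can be generated mechanically from a downloaded file. A minimal sketch, assuming the "qualifiers" to strip are an optional volume number such as .00, an optional .tar and a compression suffix; the regex and the helper names database_name and version_tag are hypothetical illustrations, not an existing tool.

```python
import os
import re
from datetime import date

# Hypothetical pattern for the "qualifiers" to strip: an optional volume
# number (".00"), an optional ".tar", and an optional compression suffix.
_NAME = re.compile(r"^(.*?)(\.\d+)?(\.tar)?(\.(?:gz|bz2|Z|zip))?$")


def database_name(filename):
    """Strip directory and qualifiers: 'nr.00.tar.gz' -> 'nr'."""
    return _NAME.match(os.path.basename(filename)).group(1)


def version_tag(authority, filename, released, release_number=0):
    """issuing_authority database ISO-date 3-digit-release, space separated."""
    if not 0 <= release_number <= 999:
        raise ValueError("release number must fit in 3 digits")
    return (
        f"{authority} {database_name(filename)} "
        f"{released.isoformat()} {release_number:03d}"
    )


# Reproduces the example tag from the proposal.
print(version_tag("ncbi.nih.gov", "nr.00.tar.gz", date(2006, 6, 5)))
```

Using the internet name as the authority and the ISO date keeps tags sortable and unique per provider; the release number only matters when one authority issues the same database more than once on the same day.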


