[Bioclusters] Versioning databases
J.W. Bizzaro
jeff at bioinformatics.org
Mon Jun 5 15:25:17 EDT 2006
Dan Bolser once suggested the use of a software packaging system like RPM for
providing updates to DBs containing multiple flat files. It's especially
appealing if it's done in combination with a downloader like yum, and I think
it's something that Bioinformatics.Org might pursue. It may be relevant to
your suggestion, since package managers are aware of version numbers and can
revert an installed package to an old version. Large DBs contained in a single
file would be problematic, though.
Cheers,
Jeff
Michael James wrote:
> Some biological databases actually come in versions.
> For example, we are up to the TIGR4 rice genome and
> UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.
>
> Others just change daily: NCBI nr, NCBI nt, etc.
>
> All this effort creates a problem for repeatability,
> the blast results you get next week
> won't quite be the ones you got today.
>
> It seems to me that the situation would be improved
> by tagging results "BLAST against ncbi.nih.gov nr 2006-06-05 000"
>
> This means we need to come up with a versioning scheme,
> and for anything without one, I'd suggest something as simple as
> issuing_authority database date 3_digit_release_number
> e.g. ncbi.nih.gov nr 2006-06-05 000
>
> For uniqueness, use the internet name for issuing_authority.
>
> The database is the filename stripped of all qualifiers:
> remove things like .gz and .00.tar.gz
>
> The date in ISO format!
>
> 3 more digits to ensure uniqueness.
>
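A minimal sketch of the scheme quoted above, in Python, in case it helps make the proposal concrete. The function names (database_name, version_tag, parse_version_tag) and the exact set of suffixes treated as "qualifiers" are assumptions made for illustration; the proposal itself only says to remove things like .gz and .00.tar.gz.

```python
import re
from datetime import date

# Assumption: "qualifiers" are trailing compression/archive extensions
# and numeric volume parts such as ".00". Adjust the list as needed.
_QUALIFIER = re.compile(r"\.(gz|bz2|Z|zip|tar|tgz|\d+)$")


def database_name(filename):
    """Strip trailing qualifiers one at a time until none remain."""
    name = filename
    while True:
        stripped = _QUALIFIER.sub("", name)
        if stripped == name:
            return name
        name = stripped


def version_tag(authority, filename, issued, release=0):
    """Build 'issuing_authority database date 3_digit_release_number'."""
    if not 0 <= release <= 999:
        raise ValueError("release number must fit in three digits")
    return "{} {} {} {:03d}".format(
        authority, database_name(filename), issued.isoformat(), release
    )


def parse_version_tag(tag):
    """Split a tag back into (authority, database, date, release)."""
    authority, database, issued, release = tag.split()
    return authority, database, date.fromisoformat(issued), int(release)


tag = version_tag("ncbi.nih.gov", "nr.00.tar.gz", date(2006, 6, 5))
print("BLAST against " + tag)
```

Because the four fields are separated by single spaces and none of them contains a space, a tag can be split back into its parts without any escaping, which is what parse_version_tag relies on.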
>
> Such a scheme would also be
> a big win for us database administrators.
> We could start to weave it through the tangled web
> of different providers and formats
> so we actually know the original issuing authority
> for the file we are downloading.
>
> What do you think?
> michaelj
>
>
--
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600