[Bioclusters] Versioning databases
Michael James
Michael.James at csiro.au
Sun Jun 4 22:29:39 EDT 2006
Some biological databases come in explicit versions;
for example, we are up to the TIGR4 rice genome and
UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.
Others, such as NCBI:nr and NCBI:nt, just change daily.
All this updating creates a problem for repeatability:
the BLAST results you get next week
won't quite be the ones you got today.
It seems to me that the situation would be improved
by tagging results with the database version they were run against,
e.g. "BLAST against ncbi.nih.gov nr 2006-06-05 000".
This means we need to come up with a versioning scheme,
and for anything without one, I'd suggest something as simple as
    issuing_authority database date 3_digit_release_number
e.g. ncbi.nih.gov nr 2006-06-05 000
For uniqueness, use the internet name for issuing_authority.
The database is the filename stripped of all qualifiers:
remove things like .gz and .00.tar.gz.
The date is in ISO format (YYYY-MM-DD)!
Three more digits ensure uniqueness.
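
To make the rules above concrete, here is a rough Python sketch of how such
a tag might be built. The function names (database_name, version_tag) are
made up for illustration, and it assumes that everything after the first
dot in the filename is a qualifier, which will not hold for every provider.
It also assumes the release number starts at 000.

```python
import os
from datetime import date


def database_name(filename: str) -> str:
    """Strip the directory and all dotted qualifiers from a filename.

    Assumption: everything after the first dot is a qualifier,
    so "nr.00.tar.gz" and "nr.gz" both become "nr".
    """
    base = os.path.basename(filename)
    return base.split(".", 1)[0]


def version_tag(authority: str, filename: str, issued: date, release: int = 0) -> str:
    """Build 'issuing_authority database date 3_digit_release_number'."""
    if not 0 <= release <= 999:
        raise ValueError("release number must fit in three digits")
    # date.isoformat() gives the ISO YYYY-MM-DD form.
    return f"{authority} {database_name(filename)} {issued.isoformat()} {release:03d}"


print(version_tag("ncbi.nih.gov", "nr.00.tar.gz", date(2006, 6, 5)))
# prints: ncbi.nih.gov nr 2006-06-05 000
```

A results file could then carry a header line such as "BLAST against " plus
the tag, so the run can be traced back to the exact database it used.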
Such a scheme would also be
a big win for us database administrators.
We could start to weave it through the tangled web
of different providers and formats
so we actually know the original issuing authority
for the file we are downloading.
What do you think?
michaelj
--
Michael James michael.james at csiro.au
System Administrator voice: 02 6246 5040
CSIRO Bioinformatics Facility fax: 02 6246 5166
No matter how much you pay for software,
you always get less than you hoped.
Unless you pay nothing, then you get more.