BIRCHv3.20
From Bioinformatics.Org Wiki
BIRCH
getbirch
- During update, birch_install.py fails to generate a new localstats.tsv file.
- On update, changed default so that getbirch does NOT generate a backup copy of the current install.
BioLegato
- Bugzilla #1207 Errors in writing GenBank files
  This turns out to be a bug in readseq. I sent a note to Don Gilbert to fix this.
  As a workaround, the following BioLegato menus have been modified:
  - bldna - Export Foreign Format, ViewSeq
birchadmin
birchadmin script - convert to use OK box to display an error message if launched by someone other than birchadmin
blncbi
- It looks like importing TSV files using Import from csv will cause empty fields to be omitted, shifting cells in a row to the left. This does not appear to happen using File --> Open. This was seen importing sample.tsv from PLNT4610, Assignment 5.
- There may be a way to create a query term to search for a particular molecule type. In the Entrez Advanced search, the index list for Properties includes a subset with labels such as 'biomol crna', 'biomol genomic', 'biomol mrna' and so forth. This is probably what we see in the Summary in column 3. It should be possible to create a query term in blncbi that will have this biomol subset.
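The query-term idea above might be sketched roughly as follows. This is only an illustration: the function names `biomol_term` and `add_biomol` are hypothetical, and the `"biomol mrna"[Properties]` form is an assumption derived from the index labels seen in the Entrez Advanced search. The exact syntax should be checked against Entrez before anything like this goes into blncbi.

```python
def biomol_term(biomol):
    """Build a Properties term for one molecule type, e.g. 'mrna'.

    ASSUMPTION: the term has the form "biomol <type>"[Properties], modelled
    on the labels shown in the Entrez Advanced search index list.
    """
    biomol = biomol.strip().lower()
    if not biomol:
        raise ValueError("molecule type must not be empty")
    return '"biomol {}"[Properties]'.format(biomol)


def add_biomol(query, biomol):
    """AND a molecule-type restriction onto an existing query string."""
    return "({}) AND {}".format(query, biomol_term(biomol))


if __name__ == "__main__":
    print(add_biomol("Arabidopsis[ORGN]", "mRNA"))
```

Keeping the term builder separate from the query would let a blncbi menu offer the molecule types as a simple chooser.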
BLAST
- blastdbkit
  - if dblist is the word 'all', update all installed databases
  - change the names of the following options (BioLegato, blastdbkit, blastdbkit.sh, documentation):
    - --addfiles to --add
    - --updatedb to --update
    - --deletefiles to --delete
  - consider whether blastdbkit.sh should be able to use --configure, --reportlocal, --reportftp
  - need an equivalent to the --showall function in update_blastdb.pl. All this does is get a list of all remote files and extract the database names from the .tar.gz files. We do just about all of this anyway, so why not have a function?
  - for --reportlocal and --reportftp, can we launch LibreOffice using a wrapper? script/bgwrapper.sh
  - get report to include stats on BLASTDB filesystem - use os.statvfs(path)
    See https://stackoverflow.com/questions/4260116/find-size-and-free-space-of-the-filesystem-containing-a-given-file
  - blastdbkit.sh - popup window telling how much additional disk space an add or update will require, or whether or not space will be exceeded. This would add a substantial layer of complexity, and no real flexibility. You can actually do more in the spreadsheets, so it's best to leave it at that.
  - parameter to set a default FTP site, or as an alternative, figure out the closest site
  - change name of script to blastdbkit
  - change name in BioLegato and in documentation and in getbirch
  - birchadmin --> UpdateAddInstall --> Configure
    - move change BIRCH directory to Settings? No. Makes more sense where it is.
    - If we do that, can we get rid of this menu item entirely? For example, it might be nice to be able to run --configure if you want to force re-reading of the .list files and re-writing of menus etc.
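Two of the blastdbkit items above, the --showall equivalent and the filesystem stats for the report, can be sketched in a few lines of Python. This is a minimal sketch, not the blastdbkit code: the function names are hypothetical, and the archive naming pattern (`<db>.tar.gz` or `<db>.<NN>.tar.gz`, with NN a volume number) is an assumption about how the remote files are named.

```python
import os
import re


def database_names(remote_files):
    """Return sorted, unique database names from a list of remote file names.

    ASSUMPTION: archives are named <db>.tar.gz or <db>.<NN>.tar.gz, where NN
    is a volume number. Anything else (e.g. .md5 files) is ignored.
    """
    names = set()
    for fname in remote_files:
        if fname.endswith(".tar.gz"):
            base = fname[:-len(".tar.gz")]
            # Drop a trailing volume number such as ".00", if present.
            names.add(re.sub(r"\.\d+$", "", base))
    return sorted(names)


def filesystem_stats(path):
    """Return total and available bytes for the filesystem containing path."""
    st = os.statvfs(path)  # Unix only
    return {
        "total_bytes": st.f_frsize * st.f_blocks,
        # f_bavail counts blocks available to non-root users.
        "available_bytes": st.f_frsize * st.f_bavail,
    }


if __name__ == "__main__":
    print(database_names(["nt.00.tar.gz", "nt.01.tar.gz", "vector.tar.gz"]))
    print(filesystem_stats("."))
```

Both values from `filesystem_stats` could be written as extra cells in the --reportlocal spreadsheet, which fits the decision above to leave the space arithmetic to the spreadsheet rather than a popup.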
- BLASTDB.list
  - change to a dictionary format? Definitely use an internal dictionary for clarity, but it's probably simpler to keep the datafiles as lists. Re-casting the datafiles as dictionaries will be easier anyway if the data are represented internally as dictionaries.
  - add a field listing the decompression ratio
  - add a field listing names of sub-database files? Mainly for the special cases of human_genomic_transcripts and mouse_genomic_transcripts. Since these are a special case, it might be more straightforward to keep most of the code as is, and to add specific code to handle these two cases.
  - method for summarizing disk usage for each section of the database
  - on the linux-intel VM as user BIRCHBINDEV, BLupdate_blastdb.pl won't download files. It connects to the NCBI FTP site, but just gives the connect message 'Connected to ftp.ncbi.nlm.nih.gov'. If you try to download a database file (eg. vector) it says 'vector not found, skipping'. THIS MAY BE FIXED NOW. CHECK.
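The "internal dictionary, list datafile" idea for BLASTDB.list could look something like the sketch below. The actual column layout of BLASTDB.list is not described on this page, so the field names in `FIELDS` are purely hypothetical, as are the function names and the sample values in the usage lines. The decompression ratio is derived on load rather than stored.

```python
# HYPOTHETICAL column layout; the real BLASTDB.list fields may differ.
FIELDS = ("name", "compressed_bytes", "decompressed_bytes")


def rows_to_dict(rows):
    """Index list-style rows by database name and derive the ratio."""
    dbs = {}
    for row in rows:
        rec = dict(zip(FIELDS, row))
        rec["compressed_bytes"] = int(rec["compressed_bytes"])
        rec["decompressed_bytes"] = int(rec["decompressed_bytes"])
        if rec["compressed_bytes"] > 0:
            rec["decompression_ratio"] = (
                rec["decompressed_bytes"] / rec["compressed_bytes"]
            )
        else:
            rec["decompression_ratio"] = None  # size unknown
        dbs[rec["name"]] = rec
    return dbs


def dict_to_rows(dbs):
    """Re-cast the internal dictionary as list-style rows for the datafile."""
    return [[str(rec[field]) for field in FIELDS] for rec in dbs.values()]


if __name__ == "__main__":
    # Made-up sizes, for illustration only.
    dbs = rows_to_dict([["vector", "100", "400"]])
    print(dbs["vector"]["decompression_ratio"])
    print(dict_to_rows(dbs))
```

Special cases such as human_genomic_transcripts could then be handled by adding an extra key to just those records, without changing the list layout for the rest.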
- Report doesn't appear, but blnfetch/blpfetch do. This is not a problem with Gedit, but does happen when Nedit is the editor. Also doesn't affect local BLAST, only NCBI BLAST.
- Update binaries to 2.3.0
  - linux-x86_64
  - linux-intel
  - osx-x86_64 - NCBI binaries won't run on albacore (OSX 10.6.x) and peacock (OSX 10.7.5 Lion). On peacock, we get "Illegal instruction:4". On albacore we get "Illegal instruction". This is described at http://stackoverflow.com/questions/14268887/what-is-the-illegal-instruction-4-error-and-why-does-mmacosx-version-min-10
Obtained a patch from Aaron Ucko [ucko@ncbi.nlm.nih.gov] at NCBI:
--- dbapi_impl_context.cpp (revision 490682)
+++ dbapi_impl_context.cpp (working copy)
@@ -393,6 +393,11 @@
             return CDBConnParamsDelegate::GetParam(key);
         }
     }
+
+private:
+    // Non-copyable.
+    CDBConnParams_Unpooled(const CDBConnParams_Unpooled& other);
+    CDBConnParams_Unpooled& operator =(const CDBConnParams_Unpooled& other);
 };
 
 CDB_Connection*
@@ -499,8 +504,8 @@
 #endif
 
     if (params.GetParam("pool_allow_temp_overflow") == "true") {
-        return MakePooledConnection
-            (CDBConnParams_Unpooled(params));
+        CDBConnParams_Unpooled unpooled_params(params);
+        return MakePooledConnection(unpooled_params);
     } else {
         return NULL;
     }
Patch was applied on albacore as follows:
- file saved as patch2.diff in the directory /Users/birchbindev/BIRCHBINDEV/install/ncbi-blast-2.3.0+-src/c++/src/dbapi/driver
- patch < patch2.diff
  creates a new dbapi_impl_context.cpp
- go back to ncbi-blast-2.3.0+-src/c++
- type 'make' to make the binaries. Next, you have to type 'make install' to install the binaries in the bin directory. Binaries are now ready to copy to bin-osx-x86_64.
blnfetch, blpfetch
It should be possible to immediately type in gi numbers or accession numbers to fetch. Right now, if you launch these programs from the command line, all you can do is import a csv file. Probably the right way to do this is to add a File --> New rows item. This would run a script that creates a TSV file with a specified number of rows and columns and imports them into the current window. We might also consider running this script when blnfetch and blpfetch start up, so that there is at least one row.
Quick fix: added a simplistic Add Row function to blnfetch and blpfetch. The function reads in an empty file, which forces the addition of two rows.
The right way to do this is to add built-in Add Rows and Add Columns functions to BioLegato. This has been added to the BioLegato To Do list.
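Until BioLegato has built-in functions, the File --> New rows script described above could be as small as the following sketch. The function names are hypothetical, and how BioLegato would import the resulting file is not covered here.

```python
def blank_tsv(rows, cols):
    """Return TSV text with the given number of empty rows and columns."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must both be at least 1")
    # An empty row with N columns is N-1 tab characters.
    line = "\t" * (cols - 1)
    return "".join(line + "\n" for _ in range(rows))


def write_blank_tsv(path, rows, cols):
    """Write the blank table to path, ready to be imported."""
    with open(path, "w", newline="") as handle:
        handle.write(blank_tsv(rows, cols))


if __name__ == "__main__":
    # One empty row of three columns, shown with the tabs made visible.
    print(repr(blank_tsv(1, 3)))
```

Note that a single-column blank row is just an empty line, so whether an importer keeps it would need to be tested.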
Applications
Sequence Features
We need to update the list of sequence features supported, based on the NCBI Feature Table definition. For example, it looks like NCBI has retasked promoter features to a more general 'regulatory' feature key, that is further described by qualifiers like 'promoter'. Affected programs include:
- FEATURES/getob
- bldna
- blncbi
Some GenBank entries contain an 'assembly_gap' feature key. That key is not listed in the web Entrez search list, yet it appears in some entries. Searching using 'assembly_gap' as a [FKEY] returns 0 results. This may need to be reported to NCBI, although it is likely they know about it.
New Keys: assembly_gap, gap, gene, mobile_element, ncRNA, regulatory, telomere, tmRNA
There is a disconnect between the official NCBI/EBI/DDBJ Feature Keys Definition and the feature keys allowed on the NCBI Entrez web site. As of Dec. 2015, there were 53 feature keys allowed in GenBank entries. However, the web Entrez search builder allows 201 FKEY items.