BIRCHv3.70
From Bioinformatics.Org Wiki
BIRCH
- User settings - We need to have a directory for each user to store their personalized settings for BIRCH.
- Location - Both Linux and OSX use $HOME/.local to store application-specific files for each user. This standard has been around for many years, so let's put it there.
The question is where? Many applications seem to put everything in .local/share, or .local/share/applications, but we need to find out exactly what the convention is. Whatever the directory, let's create a subdirectory called 'birch'.
- Files - There should be a BIRCH.settings file, exactly as found in local/admin. We already have scripts for reading these files, although a simple shell script could get any parameter and set it as an environment variable upon login.
- Implementation
newuser - newuser will check whether the directory exists and whether it is up to date. If not, it will update that directory.
So far, I can see no compelling reason to run this in newuser. Let's avoid this bit of complexity unless some really good reason comes up later.
- Running from cshrc.source and profile.source: this probably has to do the same thing as newuser each time we log in.
blncbi, blnfetch and blpfetch now prompt the user to set BL_EMAIL, if it is not set.
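The per-user settings lookup described above could look something like the following minimal sketch. It assumes the XDG Base Directory convention (user data under $XDG_DATA_HOME, which defaults to $HOME/.local/share), the 'birch' subdirectory proposed above, and a BIRCH.settings file made of simple NAME=value lines. The file format and all function names here are assumptions for illustration, not the existing BIRCH scripts.

```python
import os
import shlex
from pathlib import Path


def birch_user_dir(env=None):
    """Return the per-user BIRCH directory (assumed: <data home>/birch)."""
    env = os.environ if env is None else env
    # XDG_DATA_HOME defaults to $HOME/.local/share when unset or empty.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(
        env.get("HOME", str(Path.home())), ".local", "share"
    )
    return Path(data_home) / "birch"


def read_settings(path):
    """Parse NAME=value lines (assumed format); skip blanks and # comments."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def export_lines(settings, csh=False):
    """Render settings as commands a login script could source.

    sh-style for profile.source, csh-style for cshrc.source.
    Quoting uses POSIX rules, which is only an approximation for csh.
    """
    template = "setenv {} {}" if csh else "export {}={}"
    return [
        template.format(name, shlex.quote(value))
        for name, value in sorted(settings.items())
    ]
```

A Python process cannot change the environment of the login shell that started it, so in this sketch the login script would source the output of export_lines() rather than rely on the Python code to set variables directly.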
NCBI API keys for Eutils
Background: To better regulate traffic, Eutils requests sent without an API key are not processed if there are more than 3 requests in any 1-second window from a given IP address. Individual users can get an API key by creating a MyNCBI account. See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities
Why and how should I get an API key to use the E-utilities?
Problem: How do we make it easy for BIRCH users to get an API key, and how do we get BIRCH to use that key?
- Quick answer - nothing will break for users that don't have an API key. However, in particular for use of Eutils in a classroom setting, the 3 req/sec limitation might prevent users from getting any results.
- Need to set an environment variable, e.g. NCBI_ENTREZ_KEY.
- Best way to do that is to have a BIRCH mechanism for user-specified settings. This is long overdue anyway.
- Maybe we need a BioLegato interface for setting user preferences, or a menu item in the BIRCH launcher.
- In scripts, the best way to implement is to have the script look for the NCBI_ENTREZ_KEY. If it isn't set, don't submit a key with requests. If it is set, include the key with requests. That way not having a key will be pretty transparent to most users.
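The "include the key only if it is set" behaviour described above can be sketched as follows. NCBI_ENTREZ_KEY is the variable name proposed in this section and BL_EMAIL is the existing BIRCH variable; api_key and email are the E-utilities request parameters. The function name eutils_url is hypothetical, and this is only an illustration of the idea, not the code used in the BIRCH scripts.

```python
import os
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility, params, env=None):
    """Build an E-utilities URL, adding api_key and email only when they are set."""
    env = os.environ if env is None else env
    query = dict(params)
    key = env.get("NCBI_ENTREZ_KEY", "").strip()
    if key:
        # Only users who have a key send one; everyone else is unaffected.
        query["api_key"] = key
    email = env.get("BL_EMAIL", "").strip()
    if email:
        query["email"] = email
    return EUTILS_BASE + utility + ".fcgi?" + urlencode(query)


if __name__ == "__main__":
    print(eutils_url("efetch", {"db": "nuccore", "id": "BC013447", "rettype": "fasta"}))
```

With this approach a script never needs to know whether the user has a key: requests without one are simply subject to the lower rate limit.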
Which 3rd party applications might be affected? A: anything that uses Eutils.
- BioPython - Doesn't require an NCBI key, but you can set it using Entrez.api_key.
- Ugene - no provision for NCBI key, but doesn't appear to need it.
- BioConductor
- Artemis - no provision for NCBI key, but doesn't appear to need it.
birchadmin
Add "birchadmin" to the bitmap image for birchadmin (above Administration Tool)?
For consistency with other BioLegato applications, just added "birchadmin" to the title bar.
Add database codes to Add, Install/Update, Delete menus?
bldna, blprotein
Update MAFFT to latest version, April 2020.
Let's get rid of BL_CORES entirely, which means we have to be thorough in checking whether it is used by other programs. We could have done this in v3.60, but it's safer to do it in v3.70: if it breaks anything, we have the luxury of finding out during the testing stage.
FASTA: TSV output in blnfetch and blpfetch fails to appear for some databases. Example: search mouse catalase (BC013447) protein against mouse_genome. The HTML report appears, but the report never pops up in blnfetch. This problem appears to occur when FASTA reports several hits for the same library sequence, so it is probably a bug in how blfastaout.py processes .tsv output. The TSV file can be seen as a bio*.tmp file, but never gets to blnfetch. There is no problem with databases such as uniprot or refseq_rna, in which multiple hits aren't seen.
blprotein - remove Patterns --> Consensus matrix (and probably Patterns menu unless we have something new to put there).
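For the BL_CORES removal discussed above, one way to be thorough is to scan the BIRCH tree for every remaining mention of the variable before deleting it. The following is a minimal sketch; the function name and the idea of scanning from a single root directory are assumptions, and the actual directories to check would need to be chosen by whoever does the cleanup.

```python
import os


def find_references(root, name="BL_CORES"):
    """Return sorted (relative path, line number) pairs for lines mentioning `name`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="ignore" lets the scan pass over binary files.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if name in line:
                            hits.append((os.path.relpath(path, root), lineno))
            except OSError:
                continue  # unreadable file (permissions, broken symlink): skip it
    return sorted(hits)


if __name__ == "__main__":
    import sys

    for rel_path, lineno in find_references(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{rel_path}:{lineno}")
```

An empty result for the scripts and menu definition directories would be reasonable evidence that the variable can be dropped safely.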
BLAST+
We need to have a table somewhere with the database codes and the corresponding full names of the databases. Maybe this could be part of the database reports?
Definitely NOT the database report. Adding an extra column would be counterproductive. The codes now automatically appear in the bldna and blprotein search menus too, so there doesn't seem to be much need for this table.
Phylogeny
We need a way to add phylogenetic information, and maybe other information, to phylogenetic trees, based on Accession number.
One approach is to use blastdbcmd, e.g.
{neptune:/home/psgendb/temp}blastdbcmd -entry AB005234 -db nt -outfmt %S,%N,%K
Arabidopsis thaliana,Arabidopsis thaliana,Eukaryota
The output from something like this could be used to add this information to a phyloXML file using the phylogeny decorator from the forester package. One problem with blastdbcmd is that by default it reads info from a local copy of a BLAST database. If you want to get the information from NCBI, you need an RID from a previous BLAST search. Also, blastdbcmd needs to be told which database to get the information from.
It may also be possible to do this using NCBI query. Is that true of Eutils?
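As a rough illustration of the blastdbcmd approach, the sketch below wraps the command shown above and turns its one-line output into named fields that could later feed a tree annotation step. The function and field names are hypothetical, the field meanings follow the blastdbcmd help text and should be verified against the installed version, and splitting on commas assumes that no name contains a comma.

```python
import subprocess

# Same format string as the command above: %S scientific name,
# %N scientific names for leaf-node taxids, %K taxonomic super kingdom.
OUTFMT = "%S,%N,%K"
FIELDS = ("scientific_name", "leaf_node_names", "super_kingdom")


def blastdbcmd_args(accession, db="nt"):
    """Build the blastdbcmd argument list for one accession."""
    return ["blastdbcmd", "-entry", accession, "-db", db, "-outfmt", OUTFMT]


def parse_taxonomy(line):
    """Split one output line into a dict keyed by FIELDS."""
    parts = line.strip().split(",")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} comma-separated fields: {line!r}")
    return dict(zip(FIELDS, parts))


def lookup(accession, db="nt"):
    """Run blastdbcmd against a local BLAST database and parse the first line."""
    result = subprocess.run(
        blastdbcmd_args(accession, db), capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    if not lines:
        raise LookupError(f"no output for {accession} in {db}")
    return parse_taxonomy(lines[0])
```

Calling lookup("AB005234") requires BLAST+ and a local copy of the nt database, as noted above; parse_taxonomy() can be used on its own with output captured some other way.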
Jalview
Current version won't work for users who don't have write access to Jalview home directory. Question has been posted to jalview-discuss@jalview.org asking if there is a workaround.
Solution: In addition to the installer, Jalview can also be obtained from their GitHub as a single JAR file containing the complete package. This is platform-independent, although the current version requires a Java 8 JRE. Since many systems now default to Java 11, the solution is to keep a universal copy of the JAR file in $BIRCH/java, and a platform-specific Java 8 JRE in $BIRCH/lib-$BIRCH_PLATFORM. Jalview is then run from a script that finds the JRE, and everything works. Future versions will be Java 11 compliant, and we should be able to do away with the JRE at that point.
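The launcher logic described above might be sketched like this. The directory layout ($BIRCH/java for the JAR, $BIRCH/lib-$BIRCH_PLATFORM for the JRE) comes from the text; the file name jalview.jar, the subdirectory name jre8 and the function name are placeholders, since the actual names are not given here.

```python
import os
from pathlib import Path


def jalview_command(env=None, jar_name="jalview.jar", jre_dir="jre8"):
    """Return the command that runs the shared Jalview JAR with the bundled JRE.

    jar_name and jre_dir are hypothetical names used for illustration.
    """
    env = os.environ if env is None else env
    birch = Path(env["BIRCH"])
    java = birch / ("lib-" + env["BIRCH_PLATFORM"]) / jre_dir / "bin" / "java"
    jar = birch / "java" / jar_name
    if not java.is_file():
        raise FileNotFoundError(f"no bundled JRE found at {java}")
    if not jar.is_file():
        raise FileNotFoundError(f"Jalview JAR not found at {jar}")
    return [str(java), "-jar", str(jar)]


if __name__ == "__main__":
    import subprocess
    import sys

    # Pass any command-line arguments straight through to Jalview.
    sys.exit(subprocess.call(jalview_command() + sys.argv[1:]))
```

Because the JRE is located relative to $BIRCH rather than taken from the user's PATH, the system default Java version does not matter, and users need no write access to the Jalview installation.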
TESTING
host | system | status |
CCL - BIRCHDEV | RHEL7 | completed |
brassica - Fedora31-2 VM | Fedora 31 | completed |
brassica | MATE Ubuntu 16 LTS | completed |
peacock | MacOSX | completed |
maui | Ubuntu MATE 18.0.4 LTS | ND |
wotan | Scientific Linux 7 | completed |
flamingo | Ubuntu MATE 18.0.4 LTS | completed |
triticum | Ubuntu MATE 18.0.4 LTS | completed |
CCL - psgendb | RHEL7 | completed |