From Bioinformatics.Org Wiki
- Fedora seems to come with headless Java as the default. Oddly, if you type java -version, the message gives no indication that it is headless.
- Java does have a way to set headless to false: the system property java.awt.headless, e.g. System.setProperty("java.awt.headless", "false") in code, or -Djava.awt.headless=false on the java command line.
- /home/psgendb - Failure to install is often due to stale NFS handles, whose filenames are of the form ".nfs.....". These have to be removed by sysadmins. With v3.70 I tried running an uninstall before installing the new version of BIRCH. The uninstall failed because of the stale NFS handles. I then wasn't able to complete the install either, because getbirch tries to find the uninstall script through a symbolic link in public_html, which I had already deleted manually. Getbirch should be modified to get all files directly from the FTP site, rather than going through /home/psgendb.
IMPORTANT - Are the stale NFS handles coming from BIRCHDEV each and every time I run framework.csh? Better check this.
Hmmmm... It seems that simply moving the directories containing the stale handles causes the handles to be deleted by the system after a while. Still, we may want to check for stale handles before updating BIRCH in /home/psgendb. Maybe makeframework.csh should exclude those as well.
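A minimal Python sketch of the pre-update check suggested above: walk a directory tree (for example /home/psgendb) and list any ".nfs" entries. The names find_nfs_files and skip_nfs are hypothetical, not part of BIRCH. skip_nfs only shows how such files could be excluded if a copy were done with shutil.copytree; makeframework.csh itself would need its own exclusion.

```python
import os
import shutil


def find_nfs_files(root):
    """Return sorted paths under root whose basename starts with '.nfs'.

    These are the '.nfs.....' files that block installs and uninstalls.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # .nfs entries are normally files, but check directory names too
        for name in filenames + dirnames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Hypothetical ignore callable for shutil.copytree() that skips .nfs* entries
skip_nfs = shutil.ignore_patterns(".nfs*")
```

A wrapper script could refuse to update BIRCH whenever find_nfs_files() returns a non-empty list.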
- We need to test whether the existing launchers work in KDE. If not, find out where KDE expects the launchers to go.
- Check programs for compliance with 64-bit GI numbers in NCBI databases.
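One way to read "compliance" here: a program must not store GI numbers in a signed 32-bit integer, whose maximum is 2,147,483,647. The sketch below is an illustration only. It flags GIs in legacy "gi|...|" FASTA deflines that exceed that limit, since a program that parses them into a 32-bit field would overflow. The function name oversized_gis is hypothetical.

```python
import re

INT32_MAX = 2**31 - 1  # 2,147,483,647: limit of a signed 32-bit integer

# Legacy NCBI deflines carry GIs as "gi|<number>|"
GI_PATTERN = re.compile(r"gi\|(\d+)")


def oversized_gis(text):
    """Return GI numbers in text that overflow a signed 32-bit integer."""
    return [int(m) for m in GI_PATTERN.findall(text) if int(m) > INT32_MAX]
```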
- Mail server - Need documentation on how to set up a mail server for email notification.
- The PHYLIP web site mentions "A new release of PHYLIP, version 3.698", which fixes a consensus tree bug. No date is given. We should check whether we have this version and, if not, upgrade.
- chooseviewer.py should have a way to view Markdown (.md) files. Well... that turns out to be easier said than done. You'd think programs like Evince or LibreOffice Writer could do that, but it turns out not to be the case. Actually, it's very hard to make a legible PDF file from Markdown. We can look at this, but there is no simple answer.
- Get BioLegato to recompile with Java 11
- Update documentation on adding local components to BioLegato
- Remote execution - It might be almost trivial to add to BioLegato the capability to run jobs on remote servers. We could run the command with sshcc, but put sshcc in an environment variable, so the PCD would look something like
shell "$BL_REMOTE blastp ...."
where $BL_REMOTE would be something like sshcc, or whatever command on your system sends a job to a remote host. This would only work on a clustered system where all hosts share a common file system, e.g. NFS. If $BL_REMOTE is blank, the command just runs on the local host.
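The same prefix idea, sketched in Python for the case where a BIRCH script rather than a PCD file launches the job. build_command is a hypothetical helper. It assumes the remote-submission command (e.g. sshcc) accepts the program and its arguments directly after it, which should be checked for each site.

```python
import os
import shlex
import subprocess


def build_command(command, env=None):
    """Prefix command with $BL_REMOTE if it is set and non-blank."""
    env = os.environ if env is None else env
    prefix = env.get("BL_REMOTE", "").strip()
    # shlex.split("") == [], so a blank prefix leaves the command unchanged
    return shlex.split(prefix) + shlex.split(command)


def run(command):
    # Blocks until the (local or remote) command has finished
    return subprocess.call(build_command(command))
```

If BL_REMOTE is unset or blank, the argument list is unchanged and the job runs on the local host.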
- Automated sequence renaming - Need to be able to rename sequences using some sort of regular expression substitution. SeqKit may be able to do this.
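A minimal sketch of regex-based renaming in Python, assuming FASTA input where the sequence name is the first word of each header line. rename_fasta is a hypothetical helper; SeqKit's replace subcommand may already cover the same ground from the command line.

```python
import re


def rename_fasta(lines, pattern, replacement):
    """Apply a regex substitution to the name on each FASTA header line.

    Only the first word after '>' (the sequence name) is changed; any
    description after the first space is kept as-is.
    """
    regex = re.compile(pattern)
    for line in lines:
        if line.startswith(">"):
            name, sep, desc = line[1:].partition(" ")
            line = ">" + regex.sub(replacement, name) + sep + desc
        yield line
```

For example, list(rename_fasta(lines, r"^seq_0*", "S")) turns ">seq_001 human" into ">S1 human" and leaves sequence lines alone.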
- How hard would it be to revise BioLegato to always use Accession numbers, rather than LOCUS names? Virtually no software uses LOCUS. This is moot except for very old sequences, since NCBI decided long ago to make LOCUS and Accession identical. However, if you do get an older sequence, it would be good not to have to deal with this problem.
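A sketch of an accession-first rule, assuming plain GenBank flat-file text: take the first identifier on the ACCESSION line (the primary accession) and fall back to the LOCUS name only when there is no ACCESSION line. genbank_id is a hypothetical name; BioLegato's own parser is not shown here.

```python
def genbank_id(record_text):
    """Return the primary accession of a GenBank record, else its LOCUS name."""
    locus = None
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                locus = fields[1]
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                # The first accession listed is the primary one
                return fields[1]
    return locus
```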
- [Bugzilla 1223 https://www.bioinformatics.org/support/index.php?func=detail_ticket&bug_id=1223&group_id=543] - Edit --> Change case - If you change case, you lose all annotation. After changing case, if you try to use File --> View file, a pseudo-GenBank file is shown that is missing almost all annotation. This seems to be a problem with BioLegato, since there is no "Change case" .blmenu file.
- It might be useful to be able to go from sequences to Neighbors or Links. Two possible ways:
- From bldna or blprotein, export sequences to blncbi based on accession numbers. This may need a script something like GenBank2Entrez, which would probably just run Eutils.post.
- Have a script that directly sends output to blncbi, so that you're not only running Eutils.post, but also running elink.
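Either route ends in an E-utilities request. The sketch below only builds an ELink URL and does not fetch anything. The base URL and the dbfrom, db, id and cmd parameters are the public E-utilities ones; elink_url is a hypothetical helper, and passing accession numbers as id values is an assumption to verify. Setting dbfrom and db to "protein" would cover protein sequences.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(accessions, dbfrom="nuccore", db="nuccore", cmd="neighbor"):
    """Build (but do not request) an ELink URL for a list of accessions."""
    params = {
        "dbfrom": dbfrom,
        "db": db,
        "cmd": cmd,
        "id": ",".join(accessions),  # urlencode escapes the commas as %2C
    }
    return EUTILS + "elink.fcgi?" + urlencode(params)
```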
- Time to revisit Genome Browsers. To consider:
- Genome Workbench - We could have an export function from bldna that extracts Accession numbers from entries and then loads them into Genome Workbench.
- UCSC Genome Browser
- bltree -> ConfAdd - Add box to paste in bootstrap tree file
- How about email notification for long-running jobs?
- Archaeopteryx - Update documentation database to point to current docs, apparently on Google Docs as archaeopteryx.js. They seem to be focusing on the web version, and not so much on the standalone application.
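A minimal sketch of the notification itself, using only the Python standard library. The helper names and the subject-line wording are hypothetical, and send() needs a working mail server on host, which ties this item to the mail-server documentation item above.

```python
import smtplib
from email.message import EmailMessage


def job_done_message(job, status, to_addr, from_addr):
    """Build a notification email for a finished job."""
    msg = EmailMessage()
    msg["Subject"] = f"BIRCH job {job} finished: {status}"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(f"Job {job} finished with status {status}.")
    return msg


def send(msg, host="localhost"):
    # Requires a mail server accepting SMTP connections on host
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```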
- bltree - open trees in text editor for pasting into menus
- All phylogeny scripts (dnadist.py, dnapars.py, dnaml.py etc.) call the main program using Popen, but call later steps such as bootstrapping, consense, and uniqid using subprocess.call. On some systems, it looks like one of these steps is called before the previous step has completed, resulting in a "No such file or directory" message and empty output. Rerunning the program with no changes then gives the expected output. Would changing all of these calls to Popen calls be more consistent? I have a feeling we've done this before, so tread carefully to avoid swapping one problem for another. The advantage of Popen is that we can do a p.wait() after every call.
- We need to somehow integrate taxfetch.py into blnalign and blpalign, so that they can get taxonomic information from accession numbers. This is trickier than it might first appear. For example, BioLegato exports sequences using GenBank LOCUS names, and it is not clear at what step in the process the lookup should be done. The goal is to be able to generate a phyloXML file for Archaeopteryx to read. Also, it would be nice if we could get alphabetic taxonomy codes like those used by Pfam, as opposed to the numeric taxids from NCBI.
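One detail worth checking before changing anything: subprocess.call() already waits for its command to finish, whereas a Popen object keeps running in the background until wait() or communicate() is called. So a plausible, unconfirmed cause of the race is a Popen call (e.g. for the main program) that is never waited on. Below is a minimal sketch of running every step with Popen followed by p.wait(), stopping at the first non-zero exit status. run_steps is a hypothetical helper, and the real scripts may also need stdin/stdout redirection for the PHYLIP programs.

```python
import subprocess


def run_steps(steps):
    """Run each command to completion, in order, stopping on the first failure.

    steps is a list of argument lists, one per program to run.
    """
    for args in steps:
        p = subprocess.Popen(args)
        returncode = p.wait()  # block until this step has finished
        if returncode != 0:
            raise subprocess.CalledProcessError(returncode, args)
```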
- When doing bootstrapping, the treefile and outfile don't get decoded, so they have the names from uniqid.py, rather than the original names.
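A sketch of the missing decoding step for the treefile, assuming the temporary-to-original name mapping from uniqid.py has already been loaded into a dict (its file format is not described here, so reading it is left out). decode_newick is hypothetical. It rewrites only leaf labels, so branch lengths and bootstrap values are untouched; original names containing Newick special characters such as parentheses, commas or colons would additionally need quoting. The outfile is plain text and would need a separate, word-level substitution.

```python
import re

# A leaf label follows "(" or "," and runs up to ":", ",", ")" or ";"
LABEL = re.compile(r"(?<=[(,])([^(),:;\s]+)")


def decode_newick(tree, names):
    """Replace temporary leaf names in a Newick string with original names.

    names maps each temporary ID (as written by uniqid.py) to its
    original sequence name; labels not in the map are left unchanged.
    """
    return LABEL.sub(lambda m: names.get(m.group(1), m.group(1)), tree)
```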
- $doc/BIRCH/birchadmin/blastdb/BLASTDB-Considerations.html - Add some stats on search times for various databases on different platforms.
- add NCBI datasets, dataformat - easy command line tools that should complement ncbiquery.py
- add einfo
- rename blncbi to blentrez?
- It looks like Related and Link only work for nucleotide sequences. This needs to work for proteins as well.
- update to latest HISAT
- There was a post on Biostars indicating that the latest release of rnaSPAdes no longer does error correction on reads. We had better look into this, because error-correction programs like rcorrector can also eliminate unpaired reads (I'm pretty sure). At the very least, the tutorials have to be changed to reflect the change in rnaSPAdes.
- Transcriptome Assembly Tools - scripts for cleaning up reads, e.g. uncorrectable reads, overrepresented sequences, etc.
- Update SPAdes to v3.14.1
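As one example of such a cleanup script, here is a sketch that drops read pairs an error corrector has flagged as uncorrectable. It assumes the corrector marks such reads in the FASTQ header line; Rcorrector is understood to use the tag "unfixable_error", but that should be confirmed for the installed version. Both mates are dropped together so the two files stay in sync and no unpaired reads are left behind. The function names are hypothetical.

```python
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records as lists of strings."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def filter_pairs(r1, r2, out1, out2, tag="unfixable_error"):
    """Copy read pairs to out1/out2, dropping a pair if either header has tag.

    Returns the number of pairs removed.
    """
    removed = 0
    for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
        if tag in rec1[0] or tag in rec2[0]:
            removed += 1
            continue
        out1.writelines(rec1)
        out2.writelines(rec2)
    return removed
```

The sketch assumes both input files hold the same number of records, since zip() stops silently at the shorter one.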
- Transrate is no longer supported by the developer, and has a number of known bugs/issues. Potential alternatives:
- Try fastp as an alternative for trimming reads.
- revise menus as done for blpalign, blnalign
- add support for other file formats
- last-dotplot requires python-pil. It doesn't look like it will be easy to package that in the lib-xxx-xxx/python hierarchy, so it should be documented as a dependency which has to be installed by the user.
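A sketch of a dependency check that a wrapper around last-dotplot could run, so users get an install hint instead of a traceback. Pillow is the package that provides the PIL module (packaged as python-pil or similar on some distributions); missing_modules and check_last_dotplot_deps are hypothetical helpers.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_last_dotplot_deps():
    """Print an install hint if Pillow (module PIL) is missing; return True if OK."""
    missing = missing_modules(["PIL"])
    if missing:
        print("last-dotplot needs Pillow (python-pil); please install it "
              "with your package manager or pip.", file=sys.stderr)
    return not missing
```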