BIRCH/Release To Do list

From Bioinformatics.Org Wiki


Future Releases

Installation and Updating

getbirch


These changes can be put into operation without waiting for a new release, since they don't affect the stable release files.
The question is, how?

Test whether JDK is headless

dpkg -l | grep openjdk
ii  openjdk-8-jre-headless:amd64          8u171-b11-0ubuntu0.18.04.1            amd64        OpenJDK Java runtime, using Hotspot JIT (headless)

Solution: Document on the GetBirch site that the JDK MUST be the full JDK, and not headless. If getbirch.jar won't run, install the full JDK.

Debian/Ubuntu:

sudo apt-get remove openjdk-8-jre-headless
sudo apt-get install openjdk-8-jre
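The dpkg check above could be automated along these lines. This is only a sketch: the function name jdk_is_headless_only is made up for illustration, and it assumes Debian/Ubuntu package names of the form openjdk-N-jre or openjdk-N-jdk with an optional -headless suffix.

```python
import re

# Matches installed-package lines from "dpkg -l", such as
# "ii  openjdk-8-jre-headless:amd64  8u171-...  amd64  OpenJDK ... (headless)".
# Group 1 is the package name, group 2 is "-headless" or empty.
_OPENJDK = re.compile(
    r"^ii\s+(openjdk-\d+-(?:jre|jdk)(-headless)?)(?::\S+)?\s", re.M
)


def jdk_is_headless_only(dpkg_listing):
    """Return True if openjdk packages are installed but all are headless.

    Returns False if at least one full (non-headless) package is listed,
    or if no openjdk package is listed at all.
    """
    packages = _OPENJDK.findall(dpkg_listing)
    if not packages:
        return False
    return all(suffix == "-headless" for _name, suffix in packages)
```

The listing would come from the output of dpkg -l, for example via subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout on Python 3.7 or later.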

Updating


Desktop setup

local files

no longer supported, comparable to NewLocalFiles.list

Dependencies for 3rd party programs

First, we need to begin a list of programs that have special dependencies:

Program    Dependency
Weblogo    numpy
Weblogo    ghostscript (for formats other than eps)

It's probably a good idea to put this into birchdb, once the necessary fields for the table become stable.

Next, we need a way to detect each dependency during an install or update, and to somehow inform the user of the dependency.

Ideally, there would be a way to install the dependency during install or update, if the user is the BIRCH administrator. However, that would be a lot of work, and may not be stable. There could also be version dependencies, such as needing Python 3, or Java 8 or greater.
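One way to detect dependencies during an install or update might look like the sketch below. The DEPENDENCIES table and the function name are hypothetical (the real list would eventually live in birchdb), and it assumes that the ghostscript executable is called gs.

```python
import importlib.util
import shutil

# Hypothetical dependency table mirroring the list above:
# program -> list of (kind, name), where kind is "module" (a Python module)
# or "command" (an executable expected on the PATH).
DEPENDENCIES = {
    "weblogo": [("module", "numpy"), ("command", "gs")],
}


def missing_dependencies(program, table=DEPENDENCIES):
    """Return the names of dependencies of program that cannot be found."""
    missing = []
    for kind, name in table.get(program, []):
        if kind == "module":
            # find_spec locates the module without importing it
            found = importlib.util.find_spec(name) is not None
        else:
            found = shutil.which(name) is not None
        if not found:
            missing.append(name)
    return missing
```

The install or update script could then print the returned names to inform the user, without attempting to install anything itself.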

BIRCH

More modern looking web site

Requirements

Strategy

Home Page Layout

We want a lot of links on the front page to make it easy to find things in a few clicks. You don't want to make the user explore a bunch of hierarchical menus.

Settings for each User

We need to have a directory within the $HOME directory where settings for each user can be stored.

Contents

Since we set these things anyway, maybe it's best to implement this as a bash source file. Such a file can be sourced directly by the shell and is also easy to read, since bash variables are in the form VAR=VALUE.

Variables in $HOME/.config/BIRCH/BIRCHsettings.source

Variable name      Description
BL_EMAIL           email address for notifications
BIRCH_PROMPT       Y or N - tells whether to use the BIRCH command line prompt. Y overrides prompt set in .bashrc file.
BL_TextEditor      text editor for BioLegato
BL_PDFViewer       PDF viewer for BioLegato
BL_PSViewer        PostScript viewer for BioLegato
BL_ImageViewer     bitmap image viewer for BioLegato
BL_Document        Word Processor for BioLegato
BL_Spreadsheet     Spreadsheet for BioLegato
BL_Browser         Web browser for BioLegato
BL_Terminal        Terminal program for BioLegato
NCBI_ENTREZ_KEY    API Key for Eutils


Where do we put it?

OS/desktop        Directory
RHEL7/Xfce        $HOME/.config
RHEL7/GNOME       $HOME/.config
fedora31/GNOME    $HOME/.config
Ubuntu 18/MATE    $HOME/.config
Mac OSX           $HOME/.config

https://specifications.freedesktop.org/basedir-spec/latest/ar01s03.html defines environment variables that specify where files for applications are stored. Config files are usually in a directory within $HOME/.config.
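A small sketch of how a script might locate the BIRCH settings directory while honouring the freedesktop convention: use XDG_CONFIG_HOME if it is set and non-empty, otherwise fall back to $HOME/.config. The function name birch_config_dir is an assumption, as is the choice of BIRCH as the subdirectory name (taken from the settings path used elsewhere on this page).

```python
import os


def birch_config_dir(environ=None):
    """Return the directory that would hold the BIRCH settings.

    environ is a mapping of environment variables; it defaults to
    os.environ and is a parameter only to make the function testable.
    """
    env = os.environ if environ is None else environ
    home = env.get("HOME") or os.path.expanduser("~")
    # Per the basedir spec, an unset or empty XDG_CONFIG_HOME means $HOME/.config
    base = env.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return os.path.join(base, "BIRCH")
```

On all of the platforms in the table above this resolves to $HOME/.config/BIRCH unless the user has deliberately set XDG_CONFIG_HOME.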

OSX: https://eclecticlight.co/2019/08/28/preference-settings-where-to-find-them-in-mojave/ tells us that the place for preferences on the OSX desktop is ~/Library/Preferences. This directory contains a .plist file for each Mac application, and the plist file is binary. Since BIRCH will be using a text file for settings, my feeling is that we should avoid this directory, especially because BIRCH is not an app, but in many respects more like an operating system.

Implementation

Strategy

1. Create a script that creates this directory if necessary, and populates it with settings
2. Get newuser to run it.
3. Get BioLegato to run it
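Step 1 might be sketched as follows. The function name ensure_settings and the default values are placeholders for illustration; only the file name BIRCHsettings.source and the variable names come from this page. An existing file is never overwritten, so newuser and BioLegato could both call it safely.

```python
import os

# Placeholder defaults. The variable names are from the settings table on
# this page; the values are assumptions, not decided BIRCH defaults.
DEFAULTS = [
    ("BL_EMAIL", ""),
    ("BIRCH_PROMPT", "N"),
]


def ensure_settings(config_dir, defaults=DEFAULTS):
    """Create config_dir and BIRCHsettings.source if they do not exist.

    Returns the path of the settings file. An existing file is left alone.
    """
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "BIRCHsettings.source")
    if not os.path.exists(path):
        with open(path, "w") as handle:
            for name, value in defaults:
                # VAR="VALUE" lines can be sourced directly by bash
                handle.write('{}="{}"\n'.format(name, value))
    return path
```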

Python

The original plan was a common Python directory for installed Python packages: we already have $BIRCH/python, planned to add $BIRCH/local/python, and added $BIRCH/local-generic/python. Forget the idea of a common Python directory. Because many Python modules such as numpy require compilation of C code, libraries cannot be assumed to be portable across platforms. We will phase out use of $BIRCH/python and $BIRCH/local/python in favor of lib-linux-x86_64/python and lib-osx-x86_64/python.

Python libraries

Python3

Is there a way, on a script by script basis, to force use of Python3? That way, as we progress, we can focus on developing for Python3, and do 2to3 conversions that are not backward compatible with Python2.

By now, we can count on Python3 being available, but not necessarily being the system default. We need to explicitly call Python3 in those cases where Python3 is required.
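One script-by-script approach is to name python3 in the shebang line and, as a safety net, check the interpreter version at startup. A minimal sketch follows; the function name check_python is made up, and the minimum of 3.5 is only an assumption based on the oldest version in the machine list on this page.

```python
#!/usr/bin/env python3
import sys

# Assumed minimum; 3.5 is the oldest Python3 in the machine table.
MINIMUM = (3, 5)


def check_python(minimum=MINIMUM, current=None):
    """Exit with a readable message if the interpreter is too old.

    current defaults to the running interpreter's (major, minor) version
    and is a parameter only so that the check can be tested.
    """
    current = sys.version_info[:2] if current is None else tuple(current[:2])
    if current < tuple(minimum):
        raise SystemExit(
            "This script requires Python {}.{} or later; found {}.{}. "
            "Run it with python3.".format(
                minimum[0], minimum[1], current[0], current[1]
            )
        )
    return True
```

The shebang covers the case where the script is executed directly; the check covers the case where someone runs it as "python script.py" on a system where python still means Python 2, provided the file itself still parses under Python 2.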

Ubuntu 22: There is no 'python' command on Ubuntu 22. The default install provides a 'python3' command, but you have to explicitly say 'python3'. If you want Python 2, you install the python2 package and type 'python2'. You can get 'python' to give you Python 3 if you install the python-is-python3 deb package.

Two possibilities:


Machine     Python3 version
flamingo    3.6.9
brassica    3.5
triticum    3.6.9
peacock     3.7.2
CCL         3.6.8
maui        3.6.9
fedora31    3.7
wotan       3.6.8


Compatibility issues


BIRCH Python compatibility

admin

  • newuser.py
  • nobirch.py

install-scripts

Python2&3 compliant:

  • birchhome.py
  • UNINSTALL-birch.py
  • Update_birch.py
  • update_local.py

Not yet compliant:

  • birch_install.py - needs urllib.request (try fixing with six)
  • setplatform.py - needs urllib.request (try fixing with six)
  • test.py - imports urllib but doesn't directly call it. Do we need this declaration?
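For the scripts that need urllib.request, one option that avoids an extra dependency is a guarded import. The six package offers the same thing as six.moves.urllib.request, but six would then have to be installed everywhere. The helper name fetch below is hypothetical and is only there to show the import in use.

```python
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib2 import urlopen


def fetch(url, timeout=30):
    """Return the body of url as bytes, under either Python 2 or Python 3."""
    response = urlopen(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```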

scripts

Python2&3 compliant:

3rd Party Python3 compatibility

How to set PYTHONPATH

The pip command installs Python packages from repositories. By default, they are installed system-wide, but we want to install them within $BIRCH. For example, to install the package gffutils, we type

pip3 install --install-option="--prefix=$birch/lib-$BIRCH_PLATFORM/python" gffutils

All packages installed in this manner will be in $BIRCH/lib-$BIRCH_PLATFORM/python.

We would have to add the PYTHONPATH environment variable to profile.source, cshrc.source etc.

PYTHONPATH=$birch/lib-$BIRCH_PLATFORM/python/lib/python3.5/site-packages
export PYTHONPATH

To install a package in $BIRCH/local

pip3 install --install-option="--prefix=$birch/local/lib-$BIRCH_PLATFORM/python" gffutils

Platform-dependent Python
In some Python packages (eg. cutadapt), platform-specific libraries (eg. C, C++) are part of the package, usually as .o files. These can be buried several directories down in the package, but they are there.

For such cases, we install in platform-specific python directories:

Linux-x86_64

pip3 install --install-option="--prefix=$birch/lib-linux-x86_64/python"

Mac-OSX

pip3 install --install-option="--prefix=$birch/lib-osx-x86_64/python"

Setting PYTHONPATH then becomes

PYTHONPATH=$birch/lib-$BIRCH_PLATFORM/python/lib/python3.5/site-packages
export PYTHONPATH
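The PYTHONPATH lines above hard-code python3.5, which will not match machines running 3.6 or 3.7. A sketch of computing the directory from the running interpreter instead is shown below. The function name is hypothetical, and it assumes the usual lib/pythonX.Y/site-packages layout under the prefix, which some distributions change.

```python
import os
import sys


def site_packages_dir(prefix, version=None):
    """Return prefix/lib/pythonX.Y/site-packages for the given version.

    version is a (major, minor) tuple and defaults to the interpreter
    that is running this code.
    """
    major, minor = sys.version_info[:2] if version is None else version
    return os.path.join(
        prefix, "lib", "python{}.{}".format(major, minor), "site-packages"
    )
```

For example, site_packages_dir("/opt/birch/lib-linux-x86_64/python", (3, 5)) gives the same kind of path as the hard-coded one above, with /opt/birch standing in for the BIRCH directory.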

Portability

One of the big drawbacks with Python is that packages are expected to be installed in the root hierarchy with root privileges. pip3 often fails when trying to install in local directories. Worse, it seems that each package must be wedded to a particular version of Python eg. 3.5, 3.6, 3.7...

These problems occur even within a platform, e.g. from one Linux distro to another.

Potential solutions:

python4 not in the cards

There will be no python 4. Hooray!
https://www.techrepublic.com/article/programming-languages-why-python-4-0-will-probably-never-arrive-according-to-its-creator/

Java

Log4j2 vulnerability

At present there are no known vulnerabilities with the Java programs distributed as part of BIRCH. To a large extent, this is due to the intrinsic design of BIRCH. BIRCH does not run web servers or peer-to-peer functions. All applications run with end-user privileges only. Some applications do post requests to web services, either through URL queries or a Java API. Some applications do logging using log4j, but output is written to local files owned by the user.

Out of an abundance of caution, we will recompile existing applications where possible, or obtain updated jar files from the authors. Scans will be done that can detect applications using log4j2, including jar files (which are really just zipped archives, so they can easily be scanned.)
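Because jar files are zip archives, a scan can be as simple as listing their entries. The sketch below looks for classes from the log4j2 core package (org.apache.logging.log4j.core); the function name is made up. It is deliberately naive: it will not see jars nested inside other jars, or copies of log4j2 that were repackaged under a different package name.

```python
import zipfile

# Classes of log4j2's core library live under this path inside a jar.
LOG4J2_CORE_PREFIX = "org/apache/logging/log4j/core/"


def log4j2_entries(jar_path):
    """Return the log4j2 core entries found in a jar (empty list if none)."""
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name for name in jar.namelist()
            if name.startswith(LOG4J2_CORE_PREFIX)
        ]
```

Running this over every .jar file under the BIRCH directory would give a first pass at filling in the table of applications on this page.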

Progress on applications:

Package           Comments                                                                  Status
Archaeopteryx
ArrayNorm
artemis           uses log4j2; Need to get update from EBI
axis2
BioLegato         needs to be recompiled
birchutils        doesn't use log4j2                                                        OK
blrevcomp         doesn't use log4j2                                                        OK
Blastviewer
BRIG-0.95-dist
Cytoscape         will be upgraded when a patched version is released
eutils
FastQC
genographer
getbirch          needs to be recompiled
Jalview           upgraded to version 2.11.1.5-j1.8                                         OK
mauve
Mesquite
MWCalculator
readseq
shuffle           doesn't use log4j2                                                        OK
TM4
Trimmomatic
Trinity           has some Java tools; need to check with Trinity about updates, if any

GNU Parallel

We should investigate ways in which GNU Parallel can be used to speed up programs. This looks like an amazing and versatile program that has a lot of ways to speed up serial code without changing the programs themselves. There is a good discussion on BioStars.

Libraries

Delete old libraries

especially those associated with GDE. The best way is to rename a library using the .old extension. The libraries to try are:

Testing: The main programs of concern are acedb and treetool.

OSX Dependencies

Documentation

MacOSX M1 (aka Apple Silicon) support

At present (June 2022) Apple is switching its lower end devices such as MacBook and Mini to the ARM SoC (system on a chip) architecture. The current high end Mac Pro is still x86-64. There is a lot of discussion on the Internet that Apple is designing a RISC chip for servers, so the Mac Pro will probably switch to that chip at some point. There is speculation that Apple will try to get back into the server market, and the selling point would be lower power/heat, and in many cases better performance.

For the immediate future, the best and most economical strategy is to buy a Mac Mini ARM and support that platform. BIRCH users who want to use the MacOS will probably also use low-end machines. For higher end work, people will use Linux anyway. My guess is that bioinformatics on high-end Macs will be a very small market share for the next 2 or 3 years.

Maybe the other thing to consider is that the Apple ARM processors are still undergoing some evolution, so you don't want to spend a lot of money now on high end hardware.

Finally, it's not just a question of getting the Macs out, but the applications (eg. Java, LibreOffice etc.) have to adapt and get polished on that platform. THEN we have another lag period as the bioinformatics applications catch up.

BONUS: Once we get a MacOS-ARM64 machine, we can use the same machine to produce a Linux-Arm version of BIRCH using Linux VMs.

binaries

How do we do binaries? For some time, there will be binaries that are easy to re-compile on arm64, and others that may never recompile, or for which we have to wait for developers to create binaries.

While it is not obvious how to deal with this problem, one way is to download two sets of binaries on macos-arm64: bin-macos-arm64 and bin-osx-x86_64. If both are in the path, and bin-macos-arm64 is first in the path, then if the arm version is available it will run, but if it's not found there, the shell will use the x86_64 binaries under rosetta2. This lets us create a release soon, and over time eliminate more and more of the legacy x86_64 code. We would have to modify the install scripts to work with both sets of binaries on arm64 systems.
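The fallback behaviour can be checked without a Mac, since it is just ordinary search-path ordering. In this sketch the function name resolve_binary is hypothetical; it asks which copy of a program would be found when bin-macos-arm64 precedes bin-osx-x86_64.

```python
import os
import shutil


def resolve_binary(name, birch_dir):
    """Return the path of name, preferring the arm64 directory.

    Searches birch_dir/bin-macos-arm64 first and birch_dir/bin-osx-x86_64
    second, the same order the shell would use if PATH listed them that way.
    Returns None if neither directory contains an executable called name.
    """
    search_path = os.pathsep.join([
        os.path.join(birch_dir, "bin-macos-arm64"),
        os.path.join(birch_dir, "bin-osx-x86_64"),
    ])
    return shutil.which(name, path=search_path)
```

A program present only in bin-osx-x86_64 resolves there; once an arm64 build is dropped into bin-macos-arm64, it takes over with no other change.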

From this discussion, it is likely that we will have to make rosetta2 a requirement for birch on macos-arm64 platforms.

BioLegato

Need mechanism for BioLegato to run commands in the background

At present, there is no way for PCD shell commands to run jobs in the background. That is, the Java Virtual Machine cannot terminate until every shell command has terminated. Even if the command ends with an ampersand, it must terminate before the JVM will terminate. That is an annoyance when we want displayed output to persist even after a BioLegato job has terminated, and a potentially major problem if we want to launch long-running or resource-intensive jobs from BioLegato.

It's probably best to write a short demo program to experiment with different approaches.

Hints:

Links:

Solution: In BioLegato 1.0.3, CommandThread.java has been modified so that if a command line ends in '&', it will be run in the background.
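For experimenting outside of Java, the same idea can be sketched in Python. This is not the code in CommandThread.java, only an illustration of the behaviour described: a trailing '&' means launch the command and do not wait for it. The function names are made up.

```python
import subprocess


def split_background(command_line):
    """Return (command, background) for a shell command line.

    A single trailing '&' marks a background job and is stripped.
    A trailing '&&' is left alone.
    """
    stripped = command_line.rstrip()
    if stripped.endswith("&") and not stripped.endswith("&&"):
        return stripped[:-1].rstrip(), True
    return stripped, False


def run_command(command_line):
    """Run a command through the shell.

    Foreground commands are waited for and their exit status is returned.
    Background commands are started in their own session, so they can
    outlive the caller, and None is returned immediately.
    """
    command, background = split_background(command_line)
    process = subprocess.Popen(
        command,
        shell=True,
        stdin=subprocess.DEVNULL,
        start_new_session=background,
    )
    if background:
        return None
    return process.wait()
```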

Remaining issues:

Development

GetInfo - Colourmask: new colours don't display

Bugzilla #1201

Hints:

The Update action is contained in the SequenceWindow. My guess is that we need to pass the SequenceTextArea to the SequenceWindow so that it can call the repaint function for SequenceTextArea. It is worthy of note that there are numerous calls to repaint in SequenceTextArea that specify the area to repaint. This may be for efficiency during actions like select and scroll, and may not be necessary here.

system command appears to have no effect

Bugzilla #1204

Hints:

get rid of wrappers for text editors

The BioLegato scripts call choose_edit_wrapper.sh, which in turn calls either nedit_wrapper.sh for nedit or gedit_wrapper.sh for gedit.

Output to console

We need to decide on a standard way to run programs so that we see the progress as the program runs. Currently this is done using the command stored in $GDE_TERM, but that is not necessarily platform independent. Some possibilities include:

Table Canvas

chooseviewer.py

blsort.py

BLHelper.py

birchadmin

birchadmin is a birch system administration tool.

birchdb

The problem is that failure of birchdb to launch Xace or tace has been inconsistent. It works on some days, and not on others. It is as if something keeps getting set or unset.

Although error messages aren't consistent, here's one (on jupiter):

Gtk-WARNING **: Failed to load module "libgail.so": libgail.so: cannot open shared object file: No such file or directory
Gtk-WARNING **: Failed to load module "libatk-bridge.so": libatk-bridge.so: cannot open shared object file: No such file or directory

Other times, this script gives a Segmentation Fault error. Once again, the only place I've had this trouble is on CCL.

As well, there is a GUI front end called RazorSQL which may be all that we need to manage birchdb.

Quick and dirty patch/addon mechanism

We need a way to apply patches to an existing BIRCH install. This should be a very simple mechanism to start with, which will also teach us some things about exactly what it is that we want it to do. Initially, it should probably be nothing more than running a script that downloads a file and untars it, so that the files just go where they are supposed to go with permissions already set.

We need a mechanism to record in $BIRCH/local which addons are installed. This way, when a BIRCH update is installed, we can make sure to re-install any addons.

Definition of an add-on

An add-on includes:

xxxxx.addon.d
    payload.tar
    install.py
    addon_spec.csv      

An add-on can either be something new that is installed, or a patch that overwrites existing files, or even a script that runs and changes something. For example, a patch might be as simple as a script that changes important permissions, or changes the name of a file, or does a string substitution to correct an error.

Algorithm

get list of available addons/patches
user selects one or more
foreach addon selected
    cd $BIRCH
    download addon
    gunzip addon.tar.gz
    tar xvfp addon.tar
    cd xxxxx.addon.d
    mv payload.tar $BIRCH
    cd $BIRCH
    tar xvfp payload.tar
    cd xxxxx.addon.d
    python install.py
    cat addon_spec.csv >> $BIRCH/local/admin/addons.csv
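The inner part of the loop, for one add-on that has already been downloaded, might look like the sketch below. It assumes the archive is a gzipped tar whose first member is the xxxxx.addon.d directory, and it trusts the archive contents. The function name install_addon is hypothetical, and selecting and downloading add-ons are left out.

```python
import os
import subprocess
import sys
import tarfile


def install_addon(archive, birch_dir):
    """Install one add-on archive into birch_dir and record it."""
    # Unpack xxxxx.addon.d into $BIRCH
    with tarfile.open(archive, "r:gz") as outer:
        addon_name = outer.getnames()[0].split("/")[0]
        outer.extractall(birch_dir)
    addon_dir = os.path.join(birch_dir, addon_name)

    # Unpack the payload relative to $BIRCH so files land where they belong
    with tarfile.open(os.path.join(addon_dir, "payload.tar")) as payload:
        payload.extractall(birch_dir)

    # Run the add-on's own install script from inside the add-on directory
    subprocess.check_call([sys.executable, "install.py"], cwd=addon_dir)

    # Record the add-on so it can be re-installed after a BIRCH update
    admin_dir = os.path.join(birch_dir, "local", "admin")
    os.makedirs(admin_dir, exist_ok=True)
    spec_path = os.path.join(addon_dir, "addon_spec.csv")
    with open(spec_path) as spec:
        with open(os.path.join(admin_dir, "addons.csv"), "a") as registry:
            registry.write(spec.read())
    return addon_name
```

Unlike the pseudocode, this reads payload.tar in place rather than moving it to $BIRCH first; the files end up in the same locations either way.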

FSAP, XYLEM

Convert FSAP and XYLEM to Free Pascal?

Free pascal appears to still be supported. They have builds for Linux and MacOS, both on x86_64 and ARM64. There are even RPM and DEB packages.

GNU Pascal looks like it hasn't been supported since 2005. GNU Pascal has a great many features beyond the Jensen & Wirth standard, including support for most Borland features, and even abstract object types and methods. The main improvement would be that we could leave behind p2c. This should be done with great care and a lot of testing, because there could be surprises hiding in the implementation. See http://www.gnu-pascal.de/gpc/h-index.html

BLAST+

In Python:

import multiprocessing
multiprocessing.cpu_count()

Algorithm:

if BLASTDB not set
   prompt for directory (default $BIRCH/GenBank)
read list of database divisions currently installed
read list of database divisions to be installed
uninstall those not in the list from previous step
install all divisions in the install list
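The bookkeeping part of this algorithm is two set operations. A sketch follows, with hypothetical function names; the actual download and removal of database files is not shown, and the prompt for a directory is replaced by simply falling back to the default.

```python
import os


def blastdb_dir(environ, birch_dir):
    """Return $BLASTDB if set, otherwise the default birch_dir/GenBank."""
    return environ.get("BLASTDB") or os.path.join(birch_dir, "GenBank")


def plan_changes(installed, wanted):
    """Return (to_remove, to_install) as sorted lists of division names.

    Divisions that are installed but no longer wanted are removed.
    Every wanted division is (re)installed, as in the algorithm above,
    which also refreshes divisions that are already present.
    """
    to_remove = sorted(set(installed) - set(wanted))
    to_install = sorted(set(wanted))
    return to_remove, to_install
```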

Could do this as:

    • shell script with BioLegato front end
    • Python script with BioLegato front end. This could be implemented by adapting BLHelper.py. The menu layout would look something like:
          Nucleotide (nt)            Installed        Install O    Delete O
          Protein (nr)               Installed        Install O    Delete O
          RefSeq RNA (refseq_rna)    Not installed    Install O    Delete O
    • Java application

Blast output viewers

Phylogeny

bldna

blfeatures

How about a BioLegato that displays a GenBank features table? It would be an output option from the Features program. blfeatures would use the table canvas to display feature information:

Accession    FeatureKey    Location    Qualifiers....

You could do the usual scan/sort/extract operations to get a narrowed-down list of features. Then retrieve the features you want from the GenBank files.
This might be far more useful than one might originally think.
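To get a feel for what the table rows would contain, here is a rough sketch that turns the lines of a GenBank feature table into (Accession, FeatureKey, Location, Qualifiers) rows. It relies on the flat-file layout in which the feature key starts in column 6 and the location or qualifier text starts in column 22. The function name is hypothetical, and continuation lines are simply joined with a space, which is not right for every qualifier (eg. /translation).

```python
def parse_features(accession, feature_lines):
    """Return a list of (accession, key, location, qualifiers) tuples.

    feature_lines are the lines between the FEATURES header and ORIGIN.
    Qualifiers of one feature are joined into a single string with "; ".
    """
    rows = []
    for line in feature_lines:
        key = line[5:20].strip()       # columns 6-20: feature key
        text = line[21:].strip()       # column 22 onward: location/qualifier
        if key:
            rows.append([accession, key, text, []])
        elif not rows or not text:
            continue                   # nothing to attach this line to
        elif text.startswith("/"):
            rows[-1][3].append(text)   # start of a new qualifier
        elif rows[-1][3]:
            rows[-1][3][-1] += " " + text   # continuation of a qualifier
        else:
            rows[-1][2] += text        # continuation of a long location
    return [
        (acc, key, location, "; ".join(qualifiers))
        for acc, key, location, qualifiers in rows
    ]
```

Each tuple corresponds to one line of the table canvas, so the usual scan/sort/extract operations would apply directly.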

blprotein

blncbi

bltable

blreads

blpandas

The Pandas API seems ideally suited for a BioLegato front end. The data paradigm seems to be the data frame (df). Pandas does an operation on data in a data frame, and the output is another data frame. Sound familiar? See http://pandas.pydata.org

Here's how to do this:

  1. Break out BioLegato as a standalone project, perhaps in a Git repo.
  2. Create a demo blpandas
  3. Advertise blpandas on the Pandas Stack Overflow forum. Solicit collaborators from the Pandas community.

Multiple Alignment

High throughput multiple alignment programs

MAFFT

Grishin Lab Software

The Grishin Lab at HHMI has a lot of publications and tools related to protein evolution, structure and multiple alignment. The Grishin scoring matrix is one of the ones used in NCBI BLAST. See http://prodata.swmed.edu/Lab/Software.htm

MstatX

Calculates statistics for multiple sequence alignments. Output includes various scores for multiple alignment. This should be a good way for comparing the quality of alignments based on different methods or parameters.

https://github.com/gcollet/MstatX

GUIDANCE2

GUIDANCE2 seems to be a comprehensive package for evaluating multiple alignments. It is more polished than MstatX, and gives some pretty good output. The downside is it requires BioPerl and BioRuby modules, which may be an annoyance to install.

TCOFFEE

Replace TCOFFEE!!!

On MacOSX, t_coffee is v8.14. It has not been possible so far to get later versions to run on albacore. It was possible to compile the generic version but that also generates errors. It is not certain whether this is a problem with albacore specifically, or MacOSX in general. I ONCE installed TCOFFEE in an account on OSX, and the binary didn't work. Nonetheless, I was unable to run any previous version of TCOFFEE. Even after removing all of the TCOFFEE environment variables from all .rc files, and from the .MacOSX directory, every time I tried to run 8.14, it would create a new ~/tcoffee directory with the new version in it! This thing is like a virus. You just can't get rid of it. Somewhere in this account, there is a tcoffee script or settings lurking in a file.

Fortunately, the problem is limited to a single account.

There is now a Clustal Omega, which the authors claim is "The last alignment program you'll ever need". Maybe.

blnalign, blpalign

Multiple Alignment Tutorial

blnfetch, blpfetch

bltree

blmarker

blcont

BioLegato for continuous data. This would be an implementation of bltable, targeted at data expressed in real numbers, such as phenotypic data. We would start out with the appropriate programs from Phylip:

mauve

DONE
The latest release of mauve is from 2015. This works fine with Java8, but will not work with Java11.

Short term fix: Run under JDK8

https://edwards.sdsu.edu/research/running-mauve-with-java-10/

Modify mauve script to look for Java8. If found, use it. If not found, pop up a message saying to install Java8.
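The version test for such a launcher could be sketched as below. The function name is made up. It parses the text printed by "java -version", which uses the old numbering for Java 8 (eg. version "1.8.0_171") and the new numbering from Java 9 onward (eg. version "11.0.2").

```python
import re


def java_major_version(version_output):
    """Return the major Java version from "java -version" output.

    Returns 8 for 'version "1.8.0_171"', 11 for 'version "11.0.2"',
    and None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))   # legacy scheme: 1.8 means Java 8
    return first
```

Note that java -version writes to standard error rather than standard output, so the launcher has to capture that stream.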

primer3

mrtrans

DONE
It may be time to replace mrtrans with something better

  • mrtrans only uses universal genetic code
  • mrtrans has bizarre problems with both input and output

Possible replacements:

Basic Genomics Tools

It should be possible to identify a set of basic genomics tools that are used by common 3rd party packages.

Examples:

BIRCHv3.90 (Current Development Version, UNSTABLE)

BIRCHv3.87 (Current Production Version, STABLE)

BIRCHv3.86

BIRCHv3.85

BIRCHv3.80

BIRCHv3.71

BIRCHv3.70

BIRCHv3.60

BIRCHv3.50

BIRCHv3.40

BIRCHv3.30

BIRCHv3.20

BIRCHv3.15

BIRCHv3.10

BIRCHv3.00

BIRCHv2.9

BIRCHv2.8
