BIRCH/Meeting/Nov172011
From Bioinformatics.Org Wiki
Overview
This meeting will discuss winbirch and demonstrate progress on git-kit.
Status
Gitkit is now up and running, and has been tested on several platforms. You can get it from github.
I have created a birch-build repository as well, which contains premake binaries for every platform.
I have started a birch-native-src repository, which I am currently building fasta and phylip in. Fasta looks like it won't be too bad, just have to untangle the mess of included make files. Maybe we can send the developer our make scripts - he might like them!
Update: Phylip is building and going well. Fasta is coming along nicely. I am trying to contact the original developers to see if they would be interested in using my build system.
I have come up with a few ideas:
- Use mingw to target windows, so that we can build by cross compilation (build windows binaries on linux!)
- Build and link to static libraries wherever possible, so that they are just embedded into the executables. This will eliminate shared object hell. We will need to make a birch-lib repository to house these.
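The two ideas above can be sketched together as a small helper that picks a compiler per target and asks for static linking. This is only an illustration: the function name is hypothetical, and the cross-compiler name `x86_64-w64-mingw32-gcc` is an assumption about how mingw is installed on the build machine (other distributions use a different prefix).

```python
def compile_command(source, output, target="linux"):
    """Return the compiler argv for building one C source file."""
    # Assumption: the mingw cross-compiler is installed under this name.
    compilers = {
        "linux": "gcc",
        "windows": "x86_64-w64-mingw32-gcc",
    }
    if target not in compilers:
        raise ValueError("unknown target: %s" % target)
    # Windows executables conventionally carry an .exe suffix.
    if target == "windows" and not output.endswith(".exe"):
        output += ".exe"
    # -static asks gcc to link libraries into the executable itself.
    return [compilers[target], "-static", "-o", output, source]
```

The returned list can be passed straight to `subprocess.run` on the linux build host, so the same script produces both linux and windows binaries.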
Git-kit
Gitkit is available at github. To use git-kit simply clone it with:
git clone git@github.com:daleha/git-kit.git
Then, edit "gitkit.cfg" to specify which repos you would like to sync. I am currently working on a "git-kit setup" command that will make this simpler, but for now this must be done manually.
I would like to set up the birchdev on psgendb with gitkit, so that you can start to use git-kit to sync with me.
Git-kit is self-hosted: git-kit is used to sync git-kit. This has been invaluable in developing git-kit, as it can simply test new features on itself. Git-kit can therefore be used by default to update itself, and in fact does, with the "git-kit update" command.
As I am the only one who has been using/developing git-kit, I am certain that there are still some bugs. All output written by git-kit is recorded in a ".gitkit.log" file, so if there are any bugs you can simply email this file to me, and then run a "git-kit update" once it is fixed.
I have spent a fair amount of time on submodule support for git-kit. I would like to get Graham to use git-kit as soon as possible for biolegato, to make it a module of birchdev, and also add getbirch, the framework folders, and the binaries as modules of birchdev.
Currently, gitkit uses github, but any hosting site can be used. We can even sync with the UofM's SVN repositories (git and svn can inter-operate), if we really want to keep it in-house. I think, however, that the best strategy is to use github for now. We can also mirror to bioinformatics.org's SVN repositories if we so choose.
Action items
- find out what the competitors are doing that is similar to Git-kit
- When Git-kit has matured, we probably want some sort of web site. Should be reader-friendly, not just a page on a Git-repo
- Think about where we might publish a short paper on Git-kit (eg. Linux Journal or similar)
- Let's write an outline of what such a paper would look like.
BIRCHDEV/Git
Winbirch To do:
Binaries
Reality Check
- We try to minimize use of binaries by using Java or Python programs, but we are still stuck with some packages that we have to include, that are written in C/C++:
- NCBI programs - Sequin, Cn3D, netblast(blastcl3)
- FSAP (Fristensky Pascal/C code)
- Fasta - currently, fasta source code is compiled within install-birch. Why, do you ask? Because the U of Virginia "license" doesn't explicitly allow redistribution of the programs. So the original solution was, as part of the BIRCH install process, to have scripts that downloaded and compiled FASTA. However, lots of places redistribute FASTA so this is really a moot point. Now, I just distribute the fasta binaries that I compile and nobody has complained yet. So I will move FASTA into BIRCHDEV/install.
- Blast
- Phylip
- sometimes, source code is so old that it can't be recompiled
- mapmaker
- ACeDB
- almost always, source or Makefile will have to be manually tweaked, which may take some detective work each time you install.
- Can't always count on system libraries to be present, so we have to find out which ones need to go along and put them into lib-xxx-xxx.
- EMBOSS - during compilation, their Makefiles hard-code paths into the binaries. This means that there is no way to make binaries that will run anywhere else except on your system in a specific directory.
- Current model for how we build binaries:
- download tar archive to BIRCHDEV/install and recreate directory tree
- Compile source for each platform. (Ideally, this could all be done in BIRCHDEV/install, but at present we can only do it for the platforms that IST supports: solaris-sparc, solaris-amd64, linux-x86_64. Others such as osx-x86_64 and linux-intel have to be built on different machines.) DALE SAYS: try 'linux32' to see if you can compile 32-bit linux on gaia/moon etc. THANKS DALE!
- Create a subdirectory called bin-xxx-xxx to put the compiled binaries into.
- Test. This can sometimes be done within the install directory, but in other cases, it may be easier to install binaries in $birch/local and test there.
- Copy tested binaries into BIRCHDEV/bin-xxx-xxx
- Run BIRCHDEV/build/makebin.csh to create new Development version of binaries.
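The "copy tested binaries" step in the list above could be scripted along these lines. This is a minimal sketch, not the existing BIRCH tooling: the function name and arguments are hypothetical, and it assumes that "bin-xxx-xxx" expands to "bin-" followed by a platform name such as linux-x86_64.

```python
import os
import shutil

def stage_binaries(install_dir, birchdev_dir, platform, names):
    """Copy tested binaries from install_dir/bin-<platform> to
    birchdev_dir/bin-<platform> and return the names that were copied."""
    src_dir = os.path.join(install_dir, "bin-" + platform)
    dest_dir = os.path.join(birchdev_dir, "bin-" + platform)
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in names:
        src = os.path.join(src_dir, name)
        # Stop early if an expected binary was never built.
        if not os.path.isfile(src):
            raise FileNotFoundError(src)
        # copy2 preserves the permission bits, including the executable bit.
        shutil.copy2(src, os.path.join(dest_dir, name))
        copied.append(name)
    return copied
```

Failing on a missing file, rather than skipping it, keeps a partial build from silently reaching the Development version.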
I tend to keep around the previous version of a directory for reference for the next time I build a newer version. Usually, there will be a README file with notes telling what bizarre things I had to do to get it to compile.
Possible solutions
Currently, winbirch needs only to have the binaries compiled and bundled into the getbirch installer, as getbirch is already able to install the birch framework and run biolegato. The main reason why this has not been straightforward is that many binaries require the cygwin dll to run or be compiled. *We also need to take care to ensure that the same version of each binary is used on each platform*. See further details at BIRCH/Winbirch/Binaries.
Additionally, this would mean creating another platform based fork of the BIRCH binaries, which is more code to maintain and build.
Rather than separately maintaining the binaries for each platform, I recommend that we use a technology called premake to define the build configurations for each platform for each of the sources in the binary builds (starting of course with the windows configuration). From there we will have a means of **automatically** generating the build configurations for each platform (targeting Xcode for OSX, gmake for linux/solaris, and Microsoft Visual Studio for windows). This would allow us to maintain a very small, simple set of scripts that will build all of the birch binaries from source. For binaries that we are unable to obtain source code for, we will of course simply have to store a version in our repositories.
I can stand behind premake, as I have played a role in developing it and have even landed some patches in the source tree! Hopefully, once we have birch cleaned up and widely used on various open-source sites and communities, other developers will do the same for it.
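The platform-to-generator mapping described above (Xcode for OSX, gmake for linux/solaris, Visual Studio for windows) might be driven by a wrapper like the one below. It is a sketch under assumptions: the action names `gmake`, `xcode3` and `vs2010` are the ones I believe premake 4.x accepts, the choice of those particular Xcode and Visual Studio versions is mine, and `premake_command` is a hypothetical helper.

```python
# Assumed premake 4.x action names for each BIRCH platform family.
PREMAKE_ACTIONS = {
    "linux": "gmake",
    "solaris": "gmake",
    "osx": "xcode3",
    "windows": "vs2010",
}

def premake_command(platform, premake="premake4"):
    """Return the argv that generates build files for one platform."""
    if platform not in PREMAKE_ACTIONS:
        raise ValueError("no premake action for platform: %s" % platform)
    return [premake, PREMAKE_ACTIONS[platform]]
```

Keeping the mapping in one table means that adding a platform is a one-line change, which is the main attraction of generating the build configurations rather than maintaining them by hand.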
In order to get started, I will need to find out (from Brian) how the binaries have been built/obtained for OS-X, linux, and solaris (where the binaries and/or sources have been obtained). Then, I propose making a "common-src" repository, as well as one for each platform's dependent sources and/or binaries. This will ensure that all ports of birch (windows, solaris, os-x, and linux) are using the *same* version. Finally, I will need access to the biolegato source code, to make appropriate winbirch related patches (for cygwin integration).
Action item (Dale): Proof of concept using Phylip and FASTA packages.
Shell
Currently: requires CygWin.
- may not be well-supported
- adds additional dependencies on Windows
- performance hit
Possible solutions:
1. Java shell (Graham)
2. Python daemon (Dale)
- low-level command processor
- each job connects to daemon through sockets
The two ideas may have some synergy. For example, we have also talked about BIRCH/BioLegato running as an applet, or as a client/server application talking to a remote server.
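The Python daemon idea (a low-level command processor that each job reaches through a socket) could look roughly like this. It is a sketch only: the one-line JSON request with an "argv" field and the reply fields are hypothetical, nothing here reflects an agreed protocol, and the server has no authentication, so it binds to localhost.

```python
import json
import socket
import socketserver
import subprocess

class CommandHandler(socketserver.StreamRequestHandler):
    """Handle one job: read a JSON request line, run it, reply with JSON."""

    def handle(self):
        request = json.loads(self.rfile.readline().decode("utf-8"))
        # "argv" (hypothetical field) is the command as a list of strings,
        # so no shell is needed to split or interpret it.
        result = subprocess.run(request["argv"], stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        reply = {"returncode": result.returncode,
                 "output": result.stdout.decode("utf-8", "replace")}
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))

def make_server(host="127.0.0.1", port=0):
    """Create the daemon; port 0 lets the operating system pick a free port."""
    return socketserver.ThreadingTCPServer((host, port), CommandHandler)

def send_job(address, argv):
    """Client side: connect, send one job, and return the decoded reply."""
    with socket.create_connection(address) as sock:
        sock.sendall((json.dumps({"argv": argv}) + "\n").encode("utf-8"))
        with sock.makefile("rb") as stream:
            return json.loads(stream.readline().decode("utf-8"))
```

Because the command travels as a list rather than a shell string, the daemon itself does not need CygWin to launch a native binary. A Java client would only need to open a socket and write one line of JSON.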
What do we need to do to have a working WinBIRCH?
Major problems:
- biolegato that is Windows-aware. eg. "If Windows then...". Probably mostly with respect to how shell commands are sent to the CygWin shell.
- biolegato commands are sometimes longer than the Windows (DOS) shell allows.
- Have to be able to install CygWin, need admin permissions.
- compile binaries
- Windows launchers
- update all install scripts with Windows in the appropriate case/if statements etc.
- Need to update some scripts that have platform-specific case/if statements
- Update config files (mostly in $birch/admin) to handle Windows
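The first two problems in the list above (the "If Windows then..." branch and the command length limit) can be illustrated with a short sketch. The helper names are hypothetical, the default CygWin bash path is an assumption about a standard install, and the 8191 character figure is the documented cmd.exe limit for Windows XP and later.

```python
import sys

# Documented maximum command-line length for cmd.exe on Windows XP and later.
CMD_MAX_CHARS = 8191

def wrap_command(command, platform=None, bash=r"C:\cygwin\bin\bash.exe"):
    """Return an argv that runs a shell command string on the given platform."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        # On Windows, hand the command to the CygWin bash (assumed path).
        return [bash, "-c", command]
    return ["/bin/sh", "-c", command]

def fits_cmd_limit(command):
    """Return True if the command string is short enough for cmd.exe."""
    return len(command) <= CMD_MAX_CHARS
```

A check like `fits_cmd_limit` would let biolegato detect an over-long command before sending it, and fall back to writing the command to a temporary script instead.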
Action items:
- Need real hardware machines that run XP and Windows7
- Git sharing
- Dale needs permissions for 'birchdev' group. Also a cron job that keeps group membership to birchdev and group rwx permissions.
- Brian - copy FASTA, FSAP and XYLEM into BIRCHDEV/install.
Compromises
- No ACeDB
- require Administrator privileges to install CygWin, and maybe for other parts of the install process
Long term To do's:
Birch command server: The framework used to create git-kit can be recycled, and used to create a birch-server that biolegato can communicate with (probably through sockets, sending commands, and receiving output). Biolegato can then launch an appropriate 3rd party application (such as a viewer for pdfs, a web browser, etc), to handle this output if biolegato does not support it. I suggest that all configuration options be stored in JSON format, so that biolegato can easily read/write it.
This would eliminate the need for cygwin, and would be a huge step towards eliminating platform-dependent quirks.
Also, I agree with Graham that we should use javascript to generate the documentation dynamically. This would cut out the need for acedb to generate documentation, and I think it is the cleanest solution to providing dynamic client-side content.
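As an illustration of the JSON suggestion, a configuration file could be written and read back as below. The keys shown ("server", "viewers") are hypothetical placeholders, not an agreed schema.

```python
import json

# Hypothetical default settings; the real option names are still to be decided.
DEFAULT_CONFIG = {
    "server": {"host": "127.0.0.1", "port": 0},
    "viewers": {"pdf": "", "html": ""},
}

def save_config(path, config):
    """Write the configuration as indented, key-sorted JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)

def load_config(path):
    """Read a configuration file back into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Sorted, indented output keeps the file readable and produces small diffs under version control, and the same file can be parsed from Java on the biolegato side.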
Assessment of priorities
When we last met, you said that the highest priority was "to get git working". I think that git-kit is the answer to that, and once it has been field-tested we will *finally* be able to close the book on all of this version control infrastructure building.
I think that the winbirch binaries are now the obvious highest priority, so that we can finally roll out a release of winbirch. Is there anything further that needs to be done with git, or should I focus my attention entirely on winbirch now?