[Bioclusters] Java Vs C++(Qt) for Bioinformatics
Aaron Darling
darling at cs.wisc.edu
Thu May 24 12:21:54 EDT 2007
Hello Aijaz,
Mr. Syed Aijaz wrote:
> Hello All,
>
> Just wondering what the bioinformatics community thinks is best to use:
> 1. Java Swing (1.6+)
> 2. C++ Qt (4.0)
>
> My visualization tool requires accessing data on the order of a few
> hundred MB, and we expect this to reach GBs soon. I am planning not to
> hold all of the data in memory. However, I will have to hold some of it
> (a few hundred thousand (O(100,000)) data entities, each costing around
> 60 bytes). As the tool is supposed to be interactive, which is the better
> choice: Java or C++? I am leaning towards Java, for these reasons:
> 1. Comprehensive GUI
> 2. Java is not as slow as they say!
> 3. Huge API: DBMS, XML, DRMAA, ...
> 4. No deployment pain, although a little application-specific
> deployment may be required, for example preference files
> 5. Automated garbage collection, so less trouble managing memory.
> Although it has a little overhead, can that be reduced by efficient
> handling of data?
> 6. Efficient multithreading rather than system-level fork, etc.?
> 7. Java has a growing number of bioinformatics applications
>
Having implemented several C++ programs and a bioinformatics data viz
tool in Java (Mauve), I would agree with your logic for favoring
Java. In my experience, it's not necessarily Java itself that's slow;
it's often how Java is used that makes it slow.

Garbage collection and object allocation are relatively expensive, so if
you have to repeatedly allocate an object for each of your 60-byte data
entities, performance will likely suffer. Often that can be avoided by
pre-allocating all necessary storage up front, so there are no
allocation penalties at runtime during interactive use.

The other issue with allocating huge numbers of small objects is the
tremendous memory overhead required to track them: each object carries
its own header, plus a reference from whatever container holds it. This
isn't really a Java-specific problem; I think it exists in any high-level
language. In any case, if your collection of 60-byte data objects can
somehow be flattened into arrays of an integral type such as int or
long, much of that overhead can be avoided, and if the data is accessed
in linear scans, cache performance may improve as well. For huge data
sets, memory-mapped I/O may help, depending on the I/O pattern.
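To make the flattening and pre-allocation advice concrete, here is a
minimal Java sketch (assumes Java 8 or later). The class
`FlatRecordStore` and its layout are hypothetical: each ~60-byte entity
is assumed to fit in 15 int fields (15 x 4 = 60 bytes), and all records
share one `int[]` allocated once, up front. For 100,000 records that is a
single array of 6,000,000 bytes (about 6 MB) instead of 100,000 separate
objects, each with its own header and a reference from some collection.

```java
/**
 * Hypothetical flat store: instead of one small object per ~60-byte entity,
 * every record lives in a single int[] that is allocated once, up front.
 */
final class FlatRecordStore {
    /** Illustrative layout: 15 ints x 4 bytes = 60 bytes per record. */
    static final int FIELDS_PER_RECORD = 15;

    private final int[] data;
    private final int capacity;
    private int size;

    FlatRecordStore(int capacity) {
        this.capacity = capacity;
        // One allocation for the whole data set; multiplyExact guards against int overflow.
        this.data = new int[Math.multiplyExact(capacity, FIELDS_PER_RECORD)];
    }

    /** Claims the next record slot and returns its index; no object is allocated. */
    int append() {
        if (size == capacity) {
            throw new IllegalStateException("store is full");
        }
        return size++;
    }

    int size() {
        return size;
    }

    int get(int record, int field) {
        return data[index(record, field)];
    }

    void set(int record, int field, int value) {
        data[index(record, field)] = value;
    }

    /** Sums one field over all records, walking the backing array front to back. */
    long sumField(int field) {
        checkField(field);
        long sum = 0;
        int end = size * FIELDS_PER_RECORD;
        for (int i = field; i < end; i += FIELDS_PER_RECORD) {
            sum += data[i];
        }
        return sum;
    }

    private static void checkField(int field) {
        if (field < 0 || field >= FIELDS_PER_RECORD) {
            throw new IndexOutOfBoundsException("field " + field);
        }
    }

    /** Maps (record, field) to a position in the flat array, with bounds checks. */
    private int index(int record, int field) {
        checkField(field);
        if (record < 0 || record >= size) {
            throw new IndexOutOfBoundsException("record " + record);
        }
        return record * FIELDS_PER_RECORD + field;
    }
}
```

Callers refer to an entity by its integer index rather than by an object
reference, so nothing is allocated per entity after construction.
`sumField` reads the array in order, which is the kind of linear scan
that tends to be cache friendly.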
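For the memory-mapped I/O route, a minimal sketch using
`java.nio.channels.FileChannel.map` (assumes Java 7 or later). It assumes
the data sits in a binary file of fixed-size 60-byte records written in
little-endian order; that file layout and the class `MappedRecordReader`
are hypothetical. A single mapping cannot exceed `Integer.MAX_VALUE`
bytes, so multi-GB files have to be mapped in windows, which is what
`mapWindow` does.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical reader for a file of fixed-size binary records. */
final class MappedRecordReader implements AutoCloseable {
    /** Assumed record size, matching the ~60-byte entities in the question. */
    static final int RECORD_BYTES = 60;

    private final FileChannel channel;
    private final long recordCount;

    MappedRecordReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.recordCount = channel.size() / RECORD_BYTES;
    }

    long recordCount() {
        return recordCount;
    }

    /** Maps records [first, first + count) read-only; one mapping must stay under 2 GB. */
    MappedByteBuffer mapWindow(long first, int count) throws IOException {
        if (first < 0 || count < 0 || first + count > recordCount) {
            throw new IndexOutOfBoundsException("window out of range");
        }
        long bytes = (long) count * RECORD_BYTES;
        if (bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("window too large for a single mapping");
        }
        MappedByteBuffer window =
                channel.map(FileChannel.MapMode.READ_ONLY, first * RECORD_BYTES, bytes);
        window.order(ByteOrder.LITTLE_ENDIAN); // assumption: the file was written little-endian
        return window;
    }

    /** Reads the int at byteOffset inside the given record of a mapped window. */
    static int intAt(MappedByteBuffer window, int recordInWindow, int byteOffset) {
        return window.getInt(recordInWindow * RECORD_BYTES + byteOffset);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The operating system pages the file in on demand, so only the parts of a
window you actually touch are read from disk; whether that beats ordinary
buffered reads depends on the access pattern, as noted above.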

When working with Swing GUIs, be sure to test your code on each platform,
since the widgets look slightly different on each one.
-Aaron