[Bioclusters] Java Vs C++(Qt) for Bioinformatics

Tim Cutts tjrc at sanger.ac.uk
Thu May 24 08:19:57 EDT 2007


On 23 May 2007, at 11:50 pm, Mr. Syed Aijaz wrote:

> Hello All,
>
> Just wondering what the bioinformatics community thinks is best to use:
> 1. Java Swing (1.6+)
> 2. C++ Qt (4.0)
>
> My visualization tool requires accessing data which is in the order of
> a few hundred MBs; we are expecting this to hit GBs soon. I am planning
> not to hold all the data in memory. However, I will have to hold some
> data (a few hundred thousand (O(100,000)) data entities, each costing
> around 60 bytes). As the tool is supposed to be interactive, which will
> be the better choice, Java or C++? I am leaning towards Java,
> reason being:
> 1. Comprehensive GUI

I've never done a comparison of Java and Qt in this regard - you  
might well be right.

> 2. Java not that Slow, as they say!

Java aficionados keep saying that.  Doesn't make it true though.
It's still a lot slower than a natively compiled language.  Allegedly
getting better, but still slow.

> 3. Huge API, DBMS, XML, DRMAA, . . . . .

Can't argue with that, although there are of course vast numbers of C
libraries with the same functionality which you can call from C++
programs, so it isn't a cut-and-dried case of Java having something
C(++) does not.  The C API to MySQL is quite easy to use, for example,
and there's the Expat XML library.

> 4. No deployment pain, although a little application
>   specific deployment may be required example: preference files etc

This is true in theory, and was always Java's great promise, but  
there are of course all those niggling little differences between JVM  
implementations.

> 5. Automated Garbage collection, less trouble in maintaining memory.
>   Although it has a little overhead, it can be reduced by efficient
> handling of data???

And would have to be.  The downside of Java's convenient memory model  
is partly that it uses a lot more memory, and also that because you  
have no control over memory layout, you're likely to have many more  
cache misses than with well-written compiled code.  This will hurt  
performance.  This problem will, in my view, get worse as we move to
more massively multi-core processors, because the cache misses on all
those cores will be competing with each other for access to main
memory.
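
To make the layout point concrete: 100,000 entities at about 60 bytes
each is only around 6 MB of raw data, but if each one is a separate Java
object you also pay a per-object header (typically somewhere in the
region of 8 to 16 bytes on common JVMs) plus a reference, and the
objects may be scattered around the heap.  One way of handling the data
efficiently is to keep the fields in parallel primitive arrays instead.
The following is only a sketch in modern Java; the Feature class and its
start/end/score fields are made up for illustration and are not taken
from the tool being discussed.

```java
import java.util.Random;

final class LayoutSketch {
    // Hypothetical entity: one heap object per feature.
    static final class Feature {
        final int start;
        final int end;
        final double score;

        Feature(int start, int end, double score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    // Object-per-entity layout: the loop follows one reference per element,
    // and the objects are not guaranteed to be adjacent in memory.
    static long totalLengthObjects(Feature[] features) {
        long total = 0;
        for (Feature f : features) {
            total += f.end - f.start;
        }
        return total;
    }

    // Parallel-array layout: each field lives in one contiguous primitive
    // array, so the loop walks memory sequentially.
    static long totalLengthArrays(int[] starts, int[] ends) {
        long total = 0;
        for (int i = 0; i < starts.length; i++) {
            total += ends[i] - starts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 100_000;
        Random rng = new Random(42);
        Feature[] features = new Feature[n];
        int[] starts = new int[n];
        int[] ends = new int[n];
        for (int i = 0; i < n; i++) {
            int start = rng.nextInt(1_000_000);
            int end = start + rng.nextInt(1_000);
            features[i] = new Feature(start, end, rng.nextDouble());
            starts[i] = start;
            ends[i] = end;
        }
        // Both layouts hold the same data, so the two totals must agree.
        System.out.println(totalLengthObjects(features) == totalLengthArrays(starts, ends));
    }
}
```

The price is that you give up the convenience of passing entities
around as objects, which is rather the point of the trade-off.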

> 6. efficient multi threading, not system level fork, etc??????

I'm not sure how efficient it actually is.  My experience of watching
multithreaded Java applications run is that they waste an inordinate
amount of time just deciding which thread to run next, rather than
doing anything useful.  How much of this was down to programmer error
in the implementation, though, I don't know.  Probably quite a lot.  A
brief glance at the code in question seemed to show a surfeit of
synchronisation requests, most of which weren't necessary.
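
As an illustration of the kind of unnecessary synchronisation I mean
(this is not the code I looked at, just a made-up sketch in modern
Java): the first method below takes a shared lock on every single
update, so the threads spend their time queueing for it; the second
lets each thread accumulate privately and combines the results once at
the end.  The class and method names are hypothetical, and the counting
loop is a stand-in for real work.

```java
final class CountingSketch {
    // Wait for every worker; convert the checked exception so callers stay simple.
    private static void joinAll(Thread[] workers) {
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", ex);
            }
        }
    }

    // Contended version: every increment acquires the same lock.
    static long countWithSharedLock(int threads, int perThread) {
        final long[] total = {0};
        final Object lock = new Object();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) {
                        total[0]++;
                    }
                }
            });
            workers[t].start();
        }
        joinAll(workers);
        return total[0];
    }

    // Uncontended version: each thread keeps a local total and writes it
    // once to its own slot; the results are summed after join().
    static long countWithLocalTotals(int threads, int perThread) {
        final long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < perThread; i++) {
                    local++;
                }
                partial[id] = local;
            });
            workers[t].start();
        }
        joinAll(workers);
        long total = 0;
        for (long p : partial) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // Both versions compute the same count; only the locking differs.
        System.out.println(countWithSharedLock(4, 100_000) == countWithLocalTotals(4, 100_000));
    }
}
```

Both give the same answer; the second simply never asks the scheduler
to arbitrate between threads until the work is done.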

Most operating systems have lightweight threads, which are much cheaper
than fork(), and those will usually be what the JVM is using anyway, so
a C++ application using pthreads should be at least as efficient as the
JVM running on the same architecture.

> 7. Java has growing number of Bioinformatics applications

Just because everyone else is doing it doesn't make it right!  It's
just trendy.  Java has its place.  GUIs are one thing it is quite
good at.  But I don't think it's a good choice for handling large
data sets, because of its memory inefficiency, or for very
CPU-intensive code, because of its speed.

Of course, one thing you might consider is writing the actual data  
processing parts of your classes as Java native methods, written in  
C.  They should then be fast, but at the expense of making the  
program harder to deploy.  Keep Java for what it is good at (GUIs,  
database connectivity, etc) and use a lower level language for the  
actual data processing, I say.
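
A minimal sketch of what the Java side of that might look like, in
modern Java.  Everything named here is hypothetical: the library name
"featurestats", the class, and the method are placeholders.  The C
implementation is not shown; for a class in the default package JNI
would look for a function called Java_NativeSketch_sumLengthsNative.
The fallback to plain Java when the library can't be loaded is one way
of softening the deployment problem, not something you have to do.

```java
final class NativeSketch {
    // True only if the (hypothetical) native library was found and loaded.
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("featurestats"); // hypothetical library name
            return true;
        } catch (UnsatisfiedLinkError | SecurityException ex) {
            return false;
        }
    }

    // Implemented in C and linked through JNI; only called when the library loaded.
    private static native long sumLengthsNative(int[] starts, int[] ends);

    // Pure-Java fallback doing the same computation.
    static long sumLengthsJava(int[] starts, int[] ends) {
        long total = 0;
        for (int i = 0; i < starts.length; i++) {
            total += ends[i] - starts[i];
        }
        return total;
    }

    // Callers use this and never need to know which implementation ran.
    static long sumLengths(int[] starts, int[] ends) {
        return NATIVE_AVAILABLE ? sumLengthsNative(starts, ends)
                                : sumLengthsJava(starts, ends);
    }

    public static void main(String[] args) {
        int[] starts = {0, 10};
        int[] ends = {5, 30};
        System.out.println(sumLengths(starts, ends));
    }
}
```

The native library then has to be built and shipped for every platform
you support, which is where the extra deployment effort comes from.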

Personally, my favourite GUI development framework currently is Cocoa,
but of course it isn't cross-platform, so it is a bit of a non-starter
for this discussion.

Anyway, that's my 2¢, from an admittedly somewhat anti-Java biased
standpoint.  I don't actually hate Java as much as it sounds, I just
hate seeing it used as a Swiss Army knife for any job (in much the
same way that Perl was abused 5-10 years ago).

Tim


