[Bioclusters] General question on time consuming problems

George White aa056 at chebucto.ns.ca
Wed Apr 20 14:22:39 EDT 2005


I work in remote sensing.  Typical processing is a pipeline where raw data
from a satellite is processed in chunks to get to some useful product. 
This falls into the embarrassingly parallel category, but new satellites 
have >2x more bits in each channel, >2x more channels, and >2x more
pixels, so the size of the chunks is >8x bigger, which means you are
looking at bigger storage, faster pipes, and more clock time to get
processing done.  Ten years ago a 512x512 image was about all we
could handle.  Now we are looking at 4kx8k images, a couple orders of
magnitude larger.  Our processors have gone from 30 to 3000 MHz, but 
processing time per pixel has only decreased by a factor of 10 as I/O 
becomes the bottleneck.  People are looking at ways to transmit data in
compressed form, but then you need to find ways to quickly get at 
a specific small region (e.g., to look at a time series of observations
where there is ground truth).
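
The usual trick for fast regional access to compressed imagery is to
compress the scene as independent tiles and keep an index, so reading a
small region only decompresses a few tiles rather than the whole image.
A minimal sketch (hypothetical, not any particular product's code; tile
size and one-byte pixels are illustrative assumptions):

```python
import zlib

TILE = 256  # tile edge in pixels (assumption; real formats vary)

def compress_tiles(image, width, height):
    """image: bytes, one byte per pixel, row-major.
    Returns a dict mapping (tile_x, tile_y) -> compressed blob."""
    tiles = {}
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            rows = []
            for y in range(ty, min(ty + TILE, height)):
                rows.append(image[y * width + tx :
                                  y * width + min(tx + TILE, width)])
            tiles[(tx // TILE, ty // TILE)] = zlib.compress(b"".join(rows))
    return tiles

def read_pixel(tiles, width, height, x, y):
    """Fetch one pixel by decompressing only the tile that contains it."""
    tx, ty = x // TILE, y // TILE
    data = zlib.decompress(tiles[(tx, ty)])
    tw = min(TILE, width - tx * TILE)  # tiles at the right edge are clipped
    return data[(y - ty * TILE) * tw + (x - tx * TILE)]
```

For a time series over a ground-truth site you would pull the same tile
coordinates out of each scene's archive, never touching the other 99%
of the data.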

The other problem is that many of the real-world clusters are lucky to get
50% uptime.  The one down the hall was fried when the A/C died.  They
fixed all that, took a couple weeks to get a new A/C installed, and then a
cable to the RAID stopped working, so now they have to get the cable and
hope the files weren't damaged.  You hear the success stories from people
who have been lucky with A/C hardware, etc., but there are also lots of
cluster owners who are swamped by upkeep and/or a poorly maintained 
physical plant (power problems, A/C, etc.).

--
George White <aa056 at chebucto.ns.ca> <gnw3 at acm.org>
189 Parklea Dr., Head of St. Margarets Bay, Nova Scotia  B3Z 2G6
