I ran SAS on an IBM SP2 a few years ago (basically a cluster with a proprietary high-speed interconnect). I ran it against DB2 (AIX) in a "data parallel" mode for data mining apps. There were three phases:

1. data characterization
2. model generation
3. model application (i.e. applying a scoring algorithm to db rows)

Phases 1 and 3 can be done in a "data parallel" fashion, with processors pulling partitions of the overall database and processing them independently. This married well with the shared-nothing DB2 database architecture.

Phase 2 typically required putting together one comprehensive dataset and running on a smallish SMP (4- or 8-way). You can pull datasets and cat them together, try to use log-combining approaches to do this more rapidly, and/or try to find a very fast file system that allows "concurrent write" from multiple clients into a single file (we happen to have one at Panasas; there are some other efforts in this area as well).

Besides the licensing issues (you need to license SAS on every node), the biggest challenges were around data partitioning and subsetting strategies. If you're running against a parallel database engine, do as much processing as you can in SQL before pulling the data out with SAS/CONNECT. You'll also want to exploit the scale-out / data parallel architecture wherever you can. For model generation, accelerating that phase may mean heavier hardware or innovative approaches; there was some research on parallel/distributed model generation emerging when I was looking at this a few years ago.

Bruce

Bruce Moxon
Chief Solutions Architect, Panasas Inc.
Delivering the premier storage system for scalable Linux clusters
www.panasas.com
bmoxon@panasas.com
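
For anyone who wants to see the data-parallel pattern for phases 1 and 3 concretely, here is a minimal Python sketch. It is not SAS or DB2 code. Threads stand in for cluster nodes, the names `partition`, `characterize`, `score`, and `run` are hypothetical, the linear scoring rule is made up for illustration, and hash partitioning on an integer key is only an assumption about how the data might be laid out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Summary:
    """Mergeable per-partition statistics for one numeric column."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def merge(self, other):
        # Counts and sums add, so partial summaries combine in any order.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    @property
    def mean(self):
        return self.total / self.n if self.n else 0.0


def partition(rows, n_parts):
    """Hash-partition (key, value) rows, loosely mimicking a shared-nothing layout."""
    parts = [[] for _ in range(n_parts)]
    for key, value in rows:
        parts[key % n_parts].append((key, value))
    return parts


def characterize(part):
    """Phase 1: summarize one partition without looking at any other."""
    s = Summary()
    for _, value in part:
        s = s.merge(Summary(1, value, value * value))
    return s


def score(part, weight=2.0, bias=1.0):
    """Phase 3: apply a (hypothetical) linear scoring rule row by row."""
    return [(key, weight * value + bias) for key, value in part]


def run(rows, n_parts=4):
    parts = partition(rows, n_parts)
    # Each worker handles exactly one partition; no worker needs another's rows.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        summaries = list(pool.map(characterize, parts))
        scored_parts = list(pool.map(score, parts))
    # Only the small per-partition summaries are combined centrally.
    overall = Summary()
    for s in summaries:
        overall = overall.merge(s)
    scored = sorted(row for part in scored_parts for row in part)
    return overall, scored
```

The point of the sketch is that only small summaries (phase 1) or already-scored rows (phase 3) ever leave a partition. Model generation does not decompose this simply, which is why phase 2 needed the combined dataset.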