Hello,

New to the list; my two cents. One thing I've ended up stitching together is a system for managing the download, conversion, and storage of biology-related data sources. For example, we've all used BioPerl (or its kin) to grab and parse some NCBI, SwissProt, and GFF data (perhaps even from DAS), only to realize: "To make this system effective, I will need to incorporate all the data resources locally more fully." It depends on the end-user experience, but sometimes you need to store GenBank, LocusLink, EnsEMBL, GO, and a slew of other sources locally. (We use Oracle in-house, so a standard MySQL-to-Oracle conversion would be helpful as well!) Here's where a dedicated system--instead of myriad scripts--would enable more effective ends to research means.

As for industry, Lion Bioscience has created a tool called Prisma that attempts to address this problem, I believe, but it's not OSS, nor is it accessible to enough folks. I know we've all stitched up systems--programmers are most often writing the crucial "glue" tying resources together--but perhaps a "supported system" would help us all deal with the data wrangling.

There are many efforts to enable information access, but many are a little too web-service based. When you have everything from simple FASTA to ASN.1 (and all the XML in between) to deal with, a common DRM/DBMS-related tool--sufficiently DRM/DBMS agnostic--would go a long way, IMHO. I've seen requests concerning downloads, parses, and database loads--of course all as cron scripts--on several lists.

Thanks for any input on these issues; apologies if I'm off-topic,

Joe

--
AGY Therapeutics, Inc.
290 Utah Ave, South San Francisco
V: (650) 228-1146  F: (650) 228-1180

-----Original Message-----
From: bioclusters-admin@bioinformatics.org [mailto:bioclusters-admin@bioinformatics.org] On Behalf Of Chris Dwan (CCGB)
Sent: Saturday, March 27, 2004 8:44 AM
To: 'bioclusters@bioinformatics.org'
Subject: RE: Pull-based job scheduling (was: [Bioclusters] Best ways to tackle migration to dedicated cluster/Farm)

> I have also successfully implemented a RDBMS-based pull system that worked
> very well.

Having written one myself, I have respect for all the home-grown workload management tools out there. I've seen them range from combinations of cron and "at", with shared files for job allocation, all the way up to RDBMS / thin-client solutions. Mine was an rsh-based push scheduler implemented mostly in Tcl, using flat files for state. It kept five quad-processor P-IIs busy pretty much full time running BLASTs for about three years.

Here's a question for those who have created homemade workload managers: Would you do it again, today? Why or why not?

Personally, I would try every other avenue before writing another scheduler. Home-grown systems tend to make the developer a critical resource and a single point of failure. It's sort of like implementing your very own database management system: maybe fun for the developer, but bad and wasteful for the organization. You can get all the power you want out of commercial and open source solutions for DRM and DBMS problems.

That said, there are still problems where home-grown is the best way to go. In my opinion, one of them is stitching together compute resources across organizational and administrative boundaries.

For which other common tasks do people think it's still cost / time effective to build homebrew solutions? What homegrown software do you rely on, but dream of replacing with someone else's supported code?
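For concreteness, the RDBMS-based pull pattern discussed above can be sketched in a few lines: workers poll a jobs table and claim the oldest pending row inside a transaction. This is only an illustration, not anyone's actual system -- the SQLite backend, table layout, and function names are all my own assumptions; a shared database server would replace the in-memory connection in real use.

```python
import sqlite3

def make_queue() -> sqlite3.Connection:
    """In-memory queue for illustration; real workers would share a server DB."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " command TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    return db

def submit(db, command):
    """Producers push work by inserting rows; no scheduler process needed."""
    with db:
        db.execute("INSERT INTO jobs (command) VALUES (?)", (command,))

def claim_job(db, worker_id):
    """Pull one pending job, marking it 'running' in the same transaction.

    The `AND status = 'pending'` guard on the UPDATE means a row grabbed
    by another worker between our SELECT and UPDATE is simply not claimed.
    """
    with db:  # one transaction per claim attempt
        row = db.execute(
            "SELECT id, command FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None               # queue drained: sleep and poll again
        job_id, command = row
        cur = db.execute(
            "UPDATE jobs SET status = 'running', worker = ?"
            " WHERE id = ? AND status = 'pending'",
            (worker_id, job_id),
        )
        if cur.rowcount == 0:
            return None               # lost the race; caller just retries
        return job_id, command
```

The appeal of the pull style is that each node only needs the database client and a loop around claim_job; adding capacity is just starting another worker, and crashed nodes leave an auditable 'running' row behind.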
-Chris Dwan

_______________________________________________
Bioclusters maillist  -  Bioclusters@bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters