Splitting of data now also works with text-based keys as well as numerical ones.
MySQLConnection:
- method getAllIds4KeyAndTable now split into two methods, one for numerical ids and another for text ids
- new methods getColumnType and isKeyNumerical
DataDistribution:
- method getIdSetsFromNodes split into two, one for numerical ids and one for text ids
DataDistributer:
- method splitIdsIntoSets now split into two methods, one numerical, one text
- changed methods splitTableToCluster, splitTable, insertIdsToKeyMaster, removePK, addPK, createNewKeyMasterTbl, removeZeros, loadSplitData, dumpSplitData to make them work for both text and numeric keys. Introduced generic type T in some of them
- some bugs corrected:
-- an important one in createNewKeyMasterTbl, which was inserting the record in dbs_keys with srcDb instead of destDb as it should have
-- some bugs in loadSplitData and dumpSplitData, to account for cases in which there are fewer ids than nodes and thus some nodes don't get any data. This case was not handled before.
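
The id splitting described above, including the case of fewer ids than nodes, can be sketched roughly as follows. This is only an illustration and not the code from DataDistributer: the class name SplitIdsSketch, the method signature and the contiguous-chunk strategy are all assumptions. It uses a generic type T so the same logic serves numerical and text ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitIdsSketch {

    // Hypothetical helper (not the real DataDistributer method):
    // splits ids into exactly numNodes contiguous sets of near-equal size.
    // When there are fewer ids than nodes, the trailing sets are empty,
    // so callers must be prepared for nodes that receive no data.
    public static <T> List<List<T>> splitIdsIntoSets(List<T> ids, int numNodes) {
        if (numNodes <= 0) {
            throw new IllegalArgumentException("numNodes must be positive");
        }
        List<List<T>> sets = new ArrayList<>();
        int base = ids.size() / numNodes;      // minimum size of each set
        int remainder = ids.size() % numNodes; // first 'remainder' sets get one extra id
        int start = 0;
        for (int i = 0; i < numNodes; i++) {
            int size = base + (i < remainder ? 1 : 0);
            sets.add(new ArrayList<>(ids.subList(start, start + size)));
            start += size;
        }
        return sets;
    }

    public static void main(String[] args) {
        // numerical ids: 5 ids over 3 nodes
        System.out.println(splitIdsIntoSets(Arrays.asList(1, 2, 3, 4, 5), 3));
        // text ids: 2 ids over 4 nodes, two nodes get nothing
        System.out.println(splitIdsIntoSets(Arrays.asList("a", "b"), 4));
    }
}
```

With 2 ids and 4 nodes the last two sets come back empty, which is the situation the loadSplitData and dumpSplitData fixes had to account for.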
Added method setDumpDir
Considerably improved the splitTableToCluster method:
- got rid of the unnecessary step of creating partial tables before dumping
- now dumping directly with new method dumpSplitData, a modified dumpData that dumps using a WHERE condition
- added variable NUM_CONCURRENT_SAMEHOST_WRITE_QUERIES, used in the dumpSplitData method. It sets the concurrency when dumping locally only from the master
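
As a rough illustration of dumping with a WHERE condition instead of first creating partial tables, the query for one id range could be built along these lines. This is a sketch under assumptions: the class DumpSplitSketch, the method buildDumpQuery, its parameters and the example table, key and path are hypothetical, and the actual dumpSplitData may build its statement differently. It relies only on MySQL's SELECT ... INTO OUTFILE syntax and does no escaping of its inputs.

```java
public class DumpSplitSketch {

    // Hypothetical helper: builds a MySQL statement that dumps only the rows
    // whose key falls in [minId, maxId] straight to a file, so no partial
    // table is needed. Inputs are assumed to be trusted identifiers and paths.
    public static String buildDumpQuery(String table, String key,
                                        long minId, long maxId, String outFile) {
        return "SELECT * INTO OUTFILE '" + outFile + "' FROM " + table
                + " WHERE " + key + " >= " + minId
                + " AND " + key + " <= " + maxId;
    }

    public static void main(String[] args) {
        // table, key and path below are made-up example values
        System.out.println(buildDumpQuery("my_table", "my_id", 1, 1000, "/tmp/my_table_1.txt"));
    }
}
```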
Added PARALLELISM in load/dump of tables using new class QueryThread (extends Thread).
Modified methods loadData, dumpData and loadSplitData to dump/load in parallel, in the cases where that is useful, by using the QueryThread class.
New method initializeDirs(String[]) to do some of the dir initialization that was in dumpData.
Got rid of one of the getConnectionToNode methods, not needed anymore.
Two important new final static variables: NUM_CONCURRENT_READ_QUERIES and NUM_CONCURRENT_WRITE_QUERIES. They define how much concurrency we want in reads/writes to nfs for loads/dumps.
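
One way to picture the bounded parallelism described above is the sketch below. It is not the actual QueryThread class: here each thread wraps a plain Runnable standing in for a query, and a java.util.concurrent.Semaphore caps how many run at once. The Semaphore approach, the runAll helper and the value 2 for the constant are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class QueryThreadSketch {

    // Assumed value, for illustration only
    static final int NUM_CONCURRENT_WRITE_QUERIES = 2;

    // Hypothetical stand-in for QueryThread: runs one "query" (a Runnable here)
    // once a concurrency slot is free, and frees the slot when done.
    static class QueryThread extends Thread {
        private final Runnable query;
        private final Semaphore slots;

        QueryThread(Runnable query, Semaphore slots) {
            this.query = query;
            this.slots = slots;
        }

        @Override
        public void run() {
            try {
                slots.acquire(); // wait for a free slot
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            try {
                query.run();
            } finally {
                slots.release(); // always give the slot back
            }
        }
    }

    // Starts one thread per query, letting at most maxConcurrent run at a time,
    // and waits for all of them to finish.
    public static void runAll(List<Runnable> queries, int maxConcurrent) {
        Semaphore slots = new Semaphore(maxConcurrent);
        List<QueryThread> threads = new ArrayList<>();
        for (Runnable q : queries) {
            QueryThread t = new QueryThread(q, slots);
            threads.add(t);
            t.start();
        }
        for (QueryThread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<Runnable> queries = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            final int node = i;
            queries.add(() -> System.out.println("dumping for node " + node));
        }
        runAll(queries, NUM_CONCURRENT_WRITE_QUERIES);
    }
}
```

Limiting the number of simultaneous queries this way keeps the shared nfs from being hit by every node at once, which is the purpose the log gives for the two concurrency constants.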
MAJOR change. Split DataDistribution into 2 classes: DataDistributer and DataDistribution. I haven't actually changed or added functionality.
DataDistributer deals with the distribution of the data, while DataDistribution deals with things to do when data is already distributed; right now that is only a few data checks.
Note that DataDistributer now has two db fields: srcDb and destDb. This differs from before, when destDb was passed as an argument to the methods.
Methods in DataDistributer have been tidied up a little (especially the load and dump ones).