The CoLe framework for distributed heterogeneous data mining

Our CoLe (Cooperative Learning) framework was developed specifically for distributed data mining. Like TEAMWORK and TECHS, it grew out of efforts to improve on the competition approach, but we realized that data mining poses additional problems that require specialized solutions. These problems are:

the problem instance is rather big, since it essentially contains the content of one or several large databases;
the results we want to mine can be rather complex structures for which not even sequential mining methods exist.

And, naturally, all of the problems of distributed, knowledge-based search are still there.

The size of a problem instance makes the core idea of the competition approach, giving the full instance to every agent, problematic. Therefore, in CoLe each agent receives not the full instance but a selection from it that, most of the time, is nevertheless intended to represent the full instance (meaning that a solution to the selection should also be a solution to the full instance, although we cannot always guarantee this for the full duration of a search). CoLe thus combines the paradigm of improving on the competition approach with another paradigm from the literature: dividing the problem (instance) into subproblems.
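The selection step can be illustrated with a minimal sketch. The text does not say how a selection is drawn, so uniform random sampling is assumed here; the function name select_for_miner, its parameters, and the toy records are hypothetical and not part of CoLe itself.

```python
import random

def select_for_miner(database, size, seed):
    """Return a random selection of records (an assumed selection strategy)."""
    rng = random.Random(seed)            # seeded so a round can be reproduced
    size = min(size, len(database))      # never ask for more records than exist
    return rng.sample(database, size)    # sampling without replacement

# Toy stand-in for a large database: 1000 small records.
database = [{"id": i, "value": i % 7} for i in range(1000)]

# The miner only ever sees this selection, not the full instance.
selection = select_for_miner(database, size=100, seed=1)
```

A random sample is only one way to make a selection that is likely to represent the full instance; as the text notes, there is no guarantee that what holds on the selection also holds on the full instance.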

The following picture shows the structure and workflow of a CoLe-based data mining system:

[Figure: Structure and workflow of CoLe]

Our agents are shown in green: the miner agents m₁ to m_n and the combination agent Ag_CBN. The combination agent controls the system and decides which selection from its database each miner works on in the current round. The databases are shown in yellow; each miner works on one of them, although several miners can work on the same database. When a mining round is finished, each miner selects its best mined knowledge (shown in blue as K_i) and sends it to Ag_CBN. Ag_CBN combines this knowledge, evaluates it, and selects which of it enters the final collection of knowledge K. From this knowledge, Ag_CBN also creates feedback for each of the miners.
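One round of this workflow can be sketched as follows. This is a toy illustration under strong assumptions, not the actual CoLe implementation: knowledge is represented as itemsets, miners rank given candidates by support on their selection, Ag_CBN combines itemsets from different miners by set union and evaluates them against the full data, and feedback is simply the part of each K_i that contributed to K. All function names, the data, and the threshold are hypothetical.

```python
from itertools import combinations

def support(itemset, records):
    """Fraction of records that contain every item of itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)

def miner_round(selection, candidates, top_k=1):
    """Hypothetical miner: rank candidates on its selection, return the best as K_i."""
    ranked = sorted(candidates, key=lambda c: (-support(c, selection), sorted(c)))
    return ranked[:top_k]

def combination_round(knowledge, full_records, min_support):
    """Hypothetical Ag_CBN: combine K_i across miners, evaluate, build feedback."""
    accepted = set()
    for k_i, k_j in combinations(knowledge, 2):    # pairs of different miners
        for a in k_i:
            for b in k_j:
                merged = a | b                     # combined knowledge
                if support(merged, full_records) >= min_support:
                    accepted.add(merged)           # goes into the collection K
    # Feedback (assumed form): the pieces of each K_i that ended up in K.
    feedback = [[a for a in k_i if any(a <= m for m in accepted)]
                for k_i in knowledge]
    return accepted, feedback

# Toy database of item sets; each miner gets one selection from it.
db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}, {"c"}]
selections = [db[:3], db[3:]]
candidates = [
    [frozenset({"a"}), frozenset({"c"})],   # what miner 1 considers
    [frozenset({"b"}), frozenset({"c"})],   # what miner 2 considers
]

knowledge = [miner_round(s, c) for s, c in zip(selections, candidates)]
K, feedback = combination_round(knowledge, db, min_support=0.5)
```

In this toy run, neither miner finds the itemset {a, b} on its own; it only arises when Ag_CBN combines the single-item results of both miners and checks the result against all records.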

By allowing the combination of knowledge from different miners, we can mine complex knowledge structures, although at the cost of having Ag_CBN evaluate this combined knowledge.


Last Change: 5/12/13