Data Mining and Search

Productive programming for data mining and search algorithms (Computer Science)

Data mining and search are techniques with broad applications across computing, and their algorithms expose significant parallelism that can be exploited.

The key challenges are:

  1. How can we maximize the performance of these algorithms for a given machine?
  2. How can we enable the programmer to achieve this performance with a minimum of effort?

To answer these questions, we are developing two complementary techniques for optimizing the implementation of data mining and search algorithms. First, we are developing novel abstractions for data-parallel programming that allow algorithms to be specified at a very high level (to ease programming) and mapped to a broad range of platforms (for portability), including multicore processors, clusters of multicore systems, and accelerators such as NVIDIA GPUs. Second, we are developing autotuning strategies (using analytical models, AI techniques, and hybrid models) that search the space of possible implementations of a given algorithm to find an optimal one for a given machine. We have successfully applied both techniques to numeric codes in the past and are now applying them to non-numeric codes.
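
As a rough illustration of the autotuning idea, the Python sketch below times several variants of a simple data mining kernel (item-frequency counting, written as a chunked map and merge) and keeps the fastest one for the current machine. It is not the project's autotuner: the names count_items and autotune, the chunk_size parameter, and the candidate values are all hypothetical, and plain exhaustive timing stands in for the analytical, AI-based, and hybrid models mentioned above.

```python
import time
from collections import Counter


def count_items(transactions, chunk_size):
    """Count item frequencies chunk by chunk, then merge the partial counts.

    chunk_size is a hypothetical tuning parameter: every value gives the
    same result, but the running time may differ from machine to machine.
    """
    total = Counter()
    for start in range(0, len(transactions), chunk_size):
        partial = Counter()
        for transaction in transactions[start:start + chunk_size]:
            partial.update(transaction)   # "map" step over one chunk
        total.update(partial)             # "reduce" step: merge partial counts
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time kernel(data, c) for each candidate c and return the fastest.

    This is an exhaustive empirical search, the simplest possible strategy;
    it is only an assumption about how a tuner could be structured.
    """
    timings = {}
    for candidate in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, candidate)
            best = min(best, time.perf_counter() - t0)
        timings[candidate] = best         # keep the best of several runs
    return min(timings, key=timings.get), timings


# Synthetic input: 20000 transactions of three items each (illustrative only).
transactions = [[i % 7, i % 11, i % 13] for i in range(20000)]
candidates = [64, 512, 4096]
best_chunk, timings = autotune(count_items, transactions, candidates)
print("fastest chunk size on this machine:", best_chunk)
```

The selected chunk size depends on the machine and may vary between runs, which is the point of tuning per platform. A model-driven tuner would aim to reach a similar choice while timing far fewer candidates.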

Summer students involved in this project would act as users of our very high-level programming models, exploring their expressiveness and evaluating their ease of use, and would also contribute to the design and implementation of the autotuner.