Data-Intensive Computing

Data-intensive computing has become an area of intense interest in high-performance computing because the rate at which data is produced and stored now outstrips our ability to analyze it. These problems are defined by the volume of data rather than the amount of computation, so they cannot be treated with the same means used for traditional modeling and simulation, and a set of tools is emerging to fill this gap.

Many of these tools are not new in themselves, but they are new to the field of high-performance computing. Conversely, many concepts common in high-performance computing are unfamiliar to the statisticians and computer scientists who have traditionally used these data-oriented tools.

Practical Data Analysis
High-Performance Storage
Parallel Computing with R
Topics in Hadoop