Aaron Cordova's Blog

Big Data Reading List

Since there's so much going on in the Big Data space these days, getting up to speed quickly is important for lots of technical decision makers.

This is a list of books and articles that might be helpful for learning about the major concepts and innovations in the Big Data space. It is by no means comprehensive, or even unbiased; it is simply meant to be useful. I've organized the list according to what I see as a few fundamental approaches to tackling the Big Data challenge, namely:

New Distributed Architectures

Distributed architectures address the most basic problem of Big Data: what to do when the data no longer 'fits' on a single machine. At a minimum, one must still be able to store or stream the data and process it somehow.

Machine Learning

Machine learning, modeling, data mining, and related techniques address the problem of understanding the data. Even if I can store and process the data, I ultimately need to gain some level of human understanding of the information it contains.

Machine learning can help solve this problem through modeling. A model can be transparent, so that a human can understand the fundamental processes that generated the data. Alternatively, it can stand in for human understanding and help make decisions. Machine learning can also reduce dimensionality and reveal structure in the data.

Visualization

Visualization takes a different approach to understanding the data: it leverages the considerable power of the human visual cortex to find patterns and structure.


These approaches are sometimes combined. I think they may all be coalescing into a new field that could be termed 'Data Science'.


New Distributed Architecture Concepts


Machine Learning and New Architectures

Machine Learning

Visualization

A Few Blogs / Sites