This lesson covers:

CrossFilter.js

Crossfilter is a library created by Square. If you are still thinking in terms of Backbone Collections, Crossfilter is like a collection built to handle thousands of models at once. A lot goes on behind the scenes, and it is stricter about how you use it, but it lets JavaScript work with more data than it could usually handle.

It’s a good idea to bookmark the API reference page on GitHub so you can quickly refer to it.

If you are following along with our steps, you have already used Crossfilter in the d3 graphs.

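    // Wrap the raw data array in a crossfilter instance.
    var cf = crossfilter(d);

    // The accessor tells crossfilter which value to index for this dimension.
    var calorie = cf.dimension(function(d) { return d.calories; });
    // Group records that share a calorie value (counts them by default).
    var calories = calorie.group();
    var calData = calories.all();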

We create a collection cf, save a dimension of that collection, create a group from that dimension, and then read out the grouped values with .all().

Crossfilter relies heavily on key functions, also called accessor functions. That is what we pass to .dimension(): it tells Crossfilter how to organize our data. Crossfilter takes that value from each object in the data, sorts the values, keeps track of them, and sets up functions to compare them easily. This can be slow and memory intensive up front, during setup, but it saves a lot of time when we want to regroup or sort a dimension. We just have to make sure we reuse as much as we can.
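To make the "pay once, reuse often" idea concrete, here is a minimal sketch in plain JavaScript. It is not Crossfilter's implementation: makeDimension, range, and the sample data are all hypothetical names invented for this illustration, loosely modeled on what a dimension's filterRange and top do. The accessor runs once per record and the sort happens once during setup; later queries only search the sorted list.

```javascript
// Hypothetical helper for illustration only; NOT how Crossfilter is implemented.
function makeDimension(records, accessor) {
  // Setup cost: run the accessor once per record and sort by the result.
  const sorted = records
    .map((record) => ({ key: accessor(record), record }))
    .sort((a, b) => a.key - b.key);

  // Binary search for the first index whose key is >= value.
  function lowerBound(value) {
    let lo = 0;
    let hi = sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (sorted[mid].key < value) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  return {
    // Records with min <= key < max, found without sorting again.
    range(min, max) {
      return sorted
        .slice(lowerBound(min), lowerBound(max))
        .map((entry) => entry.record);
    },
    // The n records with the largest keys, largest first.
    top(n) {
      return sorted
        .slice(Math.max(sorted.length - n, 0))
        .reverse()
        .map((entry) => entry.record);
    },
  };
}

// Made-up sample data; the field names are assumptions.
const sampleFoods = [
  { name: "apple", calories: 95 },
  { name: "banana", calories: 105 },
  { name: "carrot", calories: 25 },
  { name: "donut", calories: 250 },
];

// Build the dimension once...
const byCalories = makeDimension(sampleFoods, (d) => d.calories);

// ...then reuse it for as many queries as we like.
const lightFoods = byCalories.range(0, 100);
const heaviestTwo = byCalories.top(2);
```

Creating a new dimension for every query would repeat the expensive setup each time, which is why it pays to keep a dimension around and reuse it.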

TODO: find a better way to either add a lot of data, or find another data source with more data. There is just not much use for crossfilter with < 20 data points.

Map / Reduce

Map / Reduce is a general pattern for processing flat lists of data.
TODO: did the term map/reduce originate at Google? Verify.

Generally speaking, a map function pulls out all the relevant data elements, and a reduce function boils those elements down, combining or manipulating them in whatever way is needed. You can have multiple map functions that continue to filter out elements, and you can have multiple reduce functions that continue to reduce the data down to pure information. The pattern is designed to be flexible enough to be spread across multiple servers (outside the scope of client-side applications) and should therefore, in theory, work well with web workers. It also makes it easy to reuse map or reduce functions.
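As a small sketch of the pattern, the example below uses JavaScript's built-in array methods filter, map, and reduce. The data and the helper names (isFruit, toCalories, sum) are made up for illustration. The steps are kept as separate named functions so they can be reused in different combinations.

```javascript
// Made-up sample data; the field names are assumptions.
const foods = [
  { name: "apple", type: "fruit", calories: 95 },
  { name: "banana", type: "fruit", calories: 105 },
  { name: "carrot", type: "vegetable", calories: 25 },
  { name: "donut", type: "dessert", calories: 250 },
];

// Filtering step: keep only the relevant records.
const isFruit = (food) => food.type === "fruit";

// Map step: pull out the one value we care about from each record.
const toCalories = (food) => food.calories;

// Reduce step: combine the mapped values into a single result.
const sum = (total, value) => total + value;

// Total calories across the fruit only.
const fruitCalories = foods.filter(isFruit).map(toCalories).reduce(sum, 0);

// The same map and reduce functions, reused over the whole list.
const allCalories = foods.map(toCalories).reduce(sum, 0);

console.log(fruitCalories, allCalories); // prints: 200 475
```

Because each step depends only on its input, the list could be split into chunks, each chunk mapped and reduced separately, and the partial results combined with the same sum function.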