Move over Memcached and Redis, here comes Netflix's Hollow

After two years of internal use, Netflix is offering a new open source project as a powerful option to cache data sets that change constantly.

Hollow is a Java library and toolset aimed at in-memory caching of data sets up to several gigabytes in size. Netflix says Hollow’s purpose is threefold: It’s intended to be more efficient at storing data; it can provide tools to automatically generate APIs for convenient access to the data; and it can automatically analyze data use patterns to more efficiently synchronize with the back end.

Let’s keep this between us

Most of the scenarios for caching data on a system where it isn’t stored—a “consumer” system rather than a “producer” system—involve using a product like Memcached or Redis. Hollow is reminiscent of both products since it uses in-memory storage for fast access, but it isn’t an actual data store like Redis.

Unlike many other data caching systems, Hollow is intended to be coupled to a specific data set: a given schema with certain fields, typically a JSON stream. This requires some prep work, although Hollow provides tools to partly automate the process. The payoff is that Hollow can store the data in memory as fixed-length, strongly typed chunks that aren’t subject to Java’s garbage collection. As a result, the data is faster to access than conventional Java objects.
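To make the fixed-length idea concrete, here is a minimal Java sketch that uses only the standard library. It is not Hollow's API or its actual encoding: the class name, the record layout (an id, a release year, and a rating), and the use of a direct ByteBuffer as the backing store are all assumptions chosen for illustration. The point is only that when every record has the same size, a field can be found by arithmetic on the record's position instead of by following object references.

```java
import java.nio.ByteBuffer;

/** Illustrative only: fixed-length records in one buffer (not Hollow's real encoding). */
final class FixedLengthStore {
    // Hypothetical layout: int id (4 bytes) + int releaseYear (4 bytes) + double rating (8 bytes).
    static final int RECORD_SIZE = 16;

    private final ByteBuffer buffer;

    FixedLengthStore(int capacity) {
        // A direct buffer is allocated outside the Java heap, so there are
        // no per-record objects for the garbage collector to trace.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE);
    }

    /** Writes one record at the given position (ordinal) using absolute offsets. */
    void put(int ordinal, int id, int releaseYear, double rating) {
        int offset = ordinal * RECORD_SIZE;
        buffer.putInt(offset, id);
        buffer.putInt(offset + 4, releaseYear);
        buffer.putDouble(offset + 8, rating);
    }

    // Each getter computes the field's byte offset from the ordinal.
    int id(int ordinal) { return buffer.getInt(ordinal * RECORD_SIZE); }
    int releaseYear(int ordinal) { return buffer.getInt(ordinal * RECORD_SIZE + 4); }
    double rating(int ordinal) { return buffer.getDouble(ordinal * RECORD_SIZE + 8); }

    public static void main(String[] args) {
        FixedLengthStore store = new FixedLengthStore(2);
        store.put(0, 101, 2013, 8.7);
        store.put(1, 102, 2016, 8.6);
        System.out.println(store.id(1) + " " + store.releaseYear(1) + " " + store.rating(1));
    }
}
```

The trade-off in a layout like this is the one the article describes: the schema has to be known up front, because the offsets are fixed when the code is written.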

Another purported benefit of Hollow is its range of tooling for working with the data. Once you’ve defined a schema for the data, Hollow can automatically generate a Java API that supplies autocomplete data to an IDE. The data can also be tracked as it changes, so developers have access to point-in-time snapshots, differences between snapshots, and data rollbacks.
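As a rough picture of what a difference between two snapshots means, the following Java sketch compares two versions of a keyed data set and reports which keys were added, removed, or changed. It is a conceptual illustration only: the SnapshotDiff class and its between method are hypothetical names, snapshots are modeled as plain maps, and none of this reflects how Hollow's own diff tooling is implemented.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: a key-level diff between two snapshots of a data set. */
final class SnapshotDiff {
    final Set<Integer> added = new TreeSet<>();
    final Set<Integer> removed = new TreeSet<>();
    final Set<Integer> changed = new TreeSet<>();

    /** Compares an older snapshot with a newer one; both are modeled as id-to-value maps. */
    static SnapshotDiff between(Map<Integer, String> from, Map<Integer, String> to) {
        SnapshotDiff diff = new SnapshotDiff();
        for (Map.Entry<Integer, String> entry : to.entrySet()) {
            Integer key = entry.getKey();
            if (!from.containsKey(key)) {
                diff.added.add(key);            // present only in the newer snapshot
            } else if (!Objects.equals(from.get(key), entry.getValue())) {
                diff.changed.add(key);          // present in both, value differs
            }
        }
        for (Integer key : from.keySet()) {
            if (!to.containsKey(key)) {
                diff.removed.add(key);          // present only in the older snapshot
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<Integer, String> v1 = Map.of(1, "A", 2, "B", 3, "C");
        Map<Integer, String> v2 = Map.of(1, "A", 2, "B2", 4, "D");
        SnapshotDiff d = between(v1, v2);
        System.out.println("added=" + d.added + " removed=" + d.removed + " changed=" + d.changed);
    }
}
```

In this model, a rollback amounts to making an older snapshot the current one again, which is only possible if earlier versions are kept.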

Faster all around

A lot of the advantages Netflix claims for Hollow involve basic operational efficiency—namely, faster startup time for servers and less memory churn. But Hollow’s data modeling and management tools are also meant to help with development, not simply speed production.

“Imagine being able to quickly shunt your entire production data set—current or from any point in the recent past—down to a local development workstation, load it, then exactly reproduce specific production scenarios,” Netflix says in its introductory blog post.

One caveat is that Hollow isn’t suited for data sets of all sizes: “KB, MB, and GB, but not TB,” is how the company puts it in its documentation. That said, Netflix also implies that Hollow shrinks the memory footprint of a cached data set. “With the right framework, and a little bit of data modeling, that [memory] threshold is likely much higher than you think,” Netflix writes.

Source: InfoWorld Big Data