How MIT's C/C++ extension breaks parallel processing bottlenecks

Earlier this week, MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) announced Milk, a system that speeds up parallel processing of big data sets by as much as three or four times.

If you think this involves learning a whole new programming language, breathe easy. Milk is less a radical departure from existing software development than a refinement of an existing set of C/C++ tools.

All together now

According to the paper authored by the CSAIL team, Milk is a C/C++ language family extension that addresses the memory bottlenecks plaguing big data applications. Apps that run in parallel contend with each other for memory access, so any gains from parallel processing are offset by the time spent waiting for memory.

Milk solves these problems by extending OpenMP, an existing API widely used in C/C++ programming for shared-memory parallelism. Programmers typically use OpenMP by annotating sections of their code with compiler directives (“pragmas”), and Milk works the same way. Its directives are syntactically similar to OpenMP’s, and in some cases they’re minor variants of the existing pragmas, so existing OpenMP apps don’t have to be heavily reworked to be sped up.
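For readers who haven't used OpenMP, here is a minimal sketch of the pragma style described above. It uses only standard OpenMP, not Milk's own directives, which aren't reproduced here. The function name sum_indirect and its parameters are made up for illustration; the indirect lookup is meant to resemble the scattered memory access common in big data code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum data[index[i]] for every i. Each lookup can
 * land anywhere in data[], which is the kind of access pattern that
 * stalls on memory. */
long sum_indirect(const long *data, const size_t *index, size_t n)
{
    assert(n == 0 || (data != NULL && index != NULL));
    long total = 0;

    /* Standard OpenMP directive: split the loop across threads and
     * combine each thread's partial sum into total. If the compiler is
     * not run with OpenMP enabled, the pragma is ignored and the loop
     * simply runs serially. */
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < (long)n; i++) {
        total += data[index[i]];
    }
    return total;
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. The loop body is untouched by the annotation, which is why pragma-based approaches can be adopted incrementally.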

Milk’s big advantage is that it performs what the paper’s authors describe as “DRAM-conscious clustering.” Since data shuttled from memory is cached locally on the CPU, batching together data requests from multiple processes allows the on-CPU cache to be shared more evenly between them.
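To make the batching idea concrete, here is a rough sketch in plain C. It is not Milk's implementation and makes no claim about how the CSAIL compiler does its clustering. It only shows the general principle: instead of applying scattered updates in arrival order, group them by the region of memory they touch and then process one region at a time. The names count_direct, count_clustered and region_size are assumptions invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Baseline: every update may jump to a distant part of counts[]. */
void count_direct(unsigned *counts, const size_t *keys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        counts[keys[i]]++;
}

/* Clustered variant (illustrative only): reorder the keys so that all
 * updates to the same region of counts[] are applied together.
 * Every key must be smaller than n_counts. Returns 0 on success and
 * -1 if memory allocation fails. */
int count_clustered(unsigned *counts, size_t n_counts,
                    const size_t *keys, size_t n, size_t region_size)
{
    assert(region_size > 0);
    size_t n_regions = (n_counts + region_size - 1) / region_size;
    size_t *start = calloc(n_regions + 1, sizeof *start);
    size_t *sorted = malloc((n ? n : 1) * sizeof *sorted);
    if (start == NULL || sorted == NULL) {
        free(start);
        free(sorted);
        return -1;
    }

    /* Pass 1: count how many keys fall into each region. */
    for (size_t i = 0; i < n; i++)
        start[keys[i] / region_size + 1]++;

    /* Prefix sum: start[r] becomes the first slot for region r. */
    for (size_t r = 0; r < n_regions; r++)
        start[r + 1] += start[r];

    /* Pass 2: copy keys into region order, using start[r] as a cursor. */
    for (size_t i = 0; i < n; i++)
        sorted[start[keys[i] / region_size]++] = keys[i];

    /* Pass 3: apply the updates one region at a time. */
    for (size_t i = 0; i < n; i++)
        counts[sorted[i]]++;

    free(start);
    free(sorted);
    return 0;
}
```

Both functions produce the same counts. The clustered version does extra work to reorder the keys, a trade that can only pay off when the target array is far larger than the CPU cache and the accesses are scattered.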

The most advanced use of Milk requires using some functions exposed by the library — in other words, some rewriting — but it’s clearly possible to get some results right away by simply decorating existing code.

Let’s not throw all this out yet

As CPU speeds top out, attention has turned to other methods to ramp up processing power. The most direct option is to scale out: spreading workloads across multiple cores on a single chip, across multiple CPUs, or throughout a cluster of machines. While a plethora of tools exist to spread out workloads in these ways, the languages used for them don’t take parallelism into account as part of their designs. Hence the creation of actor-model languages like Pony to provide a fresh set of metaphors for how to program in such environments.

Another approach has been to work around the memory-to-CPU bottleneck by moving more of the processing to where the data already resides. Example: the MapD database, which uses GPUs and their local memory for both accelerated processing and distributed data caching.

Each of these approaches has its downsides. With new languages, there’s the pain of scrapping existing workflows and toolchains, some of which have decades of work behind them. Using GPUs has some of the same problems: Shifting workloads to a GPU is easy only if the existing work is abstracted away through a toolkit that can be made GPU-aware. Otherwise, you’re back to rewriting everything from scratch.

A project like Milk, on the other hand, adds a substantial improvement to a tool set that’s already widely used and well-understood. It’s always easier to transform existing work than to tear it down and start over, so Milk provides a way to squeeze more out of what we already have.

Source: InfoWorld Big Data