LLVM-powered Pocl puts parallel processing on multiple hardware platforms

LLVM, the open source compiler framework that powers everything from Mozilla’s Rust language to Apple’s Swift, is emerging in yet another powerful role: an enabler of code deployment systems that target multiple classes of hardware for speeding up jobs like machine learning.

To write code that can run on CPUs, GPUs, ASICs, and FPGAs alike (something hugely useful for machine learning apps), it’s best to use something like OpenCL, which allows a program to be written once and then deployed automatically across all those different types of hardware.
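For readers who haven’t seen it, OpenCL kernels are written in a C dialect. The sketch below is a minimal, illustrative kernel (the kernel and argument names are arbitrary); the same source can be handed at run time to whatever OpenCL implementation is installed and compiled for a CPU, a GPU, or any other supported device:

```c
/* vector_add.cl -- a minimal OpenCL C kernel, for illustration only.
 * The same source compiles unchanged for any OpenCL device. */
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *out)
{
    size_t i = get_global_id(0);   /* one work-item per array element */
    out[i] = a[i] + b[i];
}
```

The host program hands this source to the OpenCL runtime, which compiles it for each device it runs on; that run-time compilation step is exactly where an implementation like Pocl plugs in LLVM.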

Pocl (Portable Computing Language), an implementation of OpenCL that was recently updated to version 0.14, uses the LLVM compiler framework to do exactly that kind of targeting. With Pocl, OpenCL code can be deployed automatically to any hardware platform that has an LLVM back end.

Pocl uses LLVM’s own Clang front end to parse OpenCL C, the C-based kernel language defined by the OpenCL standard. Version 0.14 works with both LLVM 3.9 and the recently released LLVM 4.0. It also introduces a new binary format for OpenCL executables, so they can be run on hosts that don’t have a compiler available.
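Standard OpenCL already provides the API hooks such a workflow builds on: a built program’s binary can be retrieved with clGetProgramInfo and later reloaded with clCreateProgramWithBinary, skipping source compilation entirely. Here is a minimal sketch, assuming a single device and omitting error handling; Pocl’s new binary format is produced by its own tooling, so this only shows the portable mechanism the idea rests on:

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Sketch of the standard OpenCL binary round trip (single device,
 * error handling omitted). */

/* Save the compiled binary of an already-built cl_program. */
static void save_binary(cl_program program, const char *path)
{
    size_t size;
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                     sizeof(size), &size, NULL);

    unsigned char *binary = malloc(size);
    clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                     sizeof(binary), &binary, NULL);

    FILE *f = fopen(path, "wb");
    fwrite(binary, 1, size, f);
    fclose(f);
    free(binary);
}

/* Recreate the program on a host with no compiler: load the bytes
 * and link them, with no source compilation step. */
static cl_program load_binary(cl_context ctx, cl_device_id dev,
                              const unsigned char *binary, size_t size)
{
    cl_int status, err;
    cl_program program = clCreateProgramWithBinary(
        ctx, 1, &dev, &size, &binary, &status, &err);
    clBuildProgram(program, 1, &dev, NULL, NULL, NULL);
    return program;
}
```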

The ability to target multiple processor architectures and hardware types automatically isn’t the only reason Pocl uses LLVM. The project also aims to “[improve] performance portability of OpenCL programs with the kernel compiler and the task runtime, reducing the need for target-dependent manual optimizations,” according to the release notes for version 0.14.

There are other projects that automatically generate OpenCL code tailored to multiple hardware targets. The Lift project, written in Java, is one such code generation system. Lift generates a specially tailored IL (intermediate language) that allows OpenCL abstractions to be mapped readily to the behavior of the target hardware. LLVM itself works the same way: it generates an IL from source code, which is then compiled down to machine code for a given hardware platform. Another such project, Futhark, generates GPU-specific code.
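To make the IL idea concrete, here is a trivial C function along with, in the comment, roughly the target-neutral IR that clang emits for it; LLVM back ends then lower that IR to machine code for whichever platform is being targeted:

```c
/* add.c -- compiling this with `clang -S -emit-llvm -O1 add.c`
 * yields target-neutral LLVM IR roughly like:
 *
 *   define i32 @add(i32 %a, i32 %b) {
 *     %sum = add nsw i32 %a, %b
 *     ret i32 %sum
 *   }
 *
 * LLVM back ends then lower this IR to machine code for a given
 * CPU, GPU, or other target. */
int add(int a, int b)
{
    return a + b;
}
```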

LLVM is also being used as a code-generating system for other aspects of machine learning. The Weld project uses LLVM to generate code designed to speed up the various phases of a data analysis framework, so the framework spends less time shuttling data back and forth between components and more time doing actual data processing.

The development of new kinds of hardware targets is likely to keep driving the need for code generation systems that can target multiple hardware types. Google’s Tensor Processing Unit, for instance, is a custom ASIC devoted to speeding up one particular phase of a machine learning job, inference. If hardware types continue to proliferate and become more specialized, having code for them generated automatically will save time and labor.

Source: InfoWorld Big Data