Nvidia's new TensorRT speeds machine learning predictions

Nvidia has released a new version of TensorRT, a runtime system for serving inferences from deep learning models on Nvidia's own GPUs.

Inferences, or predictions made from a trained model, can be served from either CPUs or GPUs. Serving inferences from GPUs is part of Nvidia's strategy to drive greater adoption of its processors, countering AMD's efforts to break Nvidia's stranglehold on the machine learning GPU market.
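To make the CPU-versus-GPU distinction concrete, here is a minimal sketch of serving predictions from the same image-classification model on both kinds of processors. It does not use TensorRT itself, whose API the article does not cover; it uses PyTorch purely as an illustration, and the AlexNet architecture, batch size, and input shape are assumptions made for the example.

```python
import torch
import torchvision.models as models

# Build an AlexNet image classifier (randomly initialized here; a real
# service would load the trained weights first).
model = models.alexnet().eval()

# A batch of dummy 224x224 RGB images standing in for incoming requests.
batch = torch.randn(32, 3, 224, 224)

# Serve the predictions from the CPU.
with torch.no_grad():
    cpu_predictions = model(batch)

# Serve the same predictions from the GPU: move the model and the batch
# onto the device, then run the identical forward pass.
if torch.cuda.is_available():
    gpu_model = model.to("cuda")
    gpu_batch = batch.to("cuda")
    with torch.no_grad():
        gpu_predictions = gpu_model(gpu_batch)
```

A dedicated inference runtime such as TensorRT goes further than this generic sketch, optimizing a trained network for the specific target GPU before serving it.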

Nvidia claims the GPU-based TensorRT beats CPU-only approaches across the board for inferencing. In one of Nvidia's proffered benchmarks, the AlexNet image classification test under the Caffe framework, TensorRT running on Nvidia's Tesla P40 processor is said to be 42 times faster than a CPU-only version of the same test: 16,041 images per second vs. 374. (Always take industry benchmarks with a grain of salt.)
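The headline multiplier follows directly from the two throughput figures quoted above; a quick check using only those numbers:

```python
# Throughput figures quoted in Nvidia's AlexNet/Caffe benchmark.
gpu_images_per_second = 16_041   # Tesla P40 with TensorRT
cpu_images_per_second = 374      # CPU-only baseline

speedup = gpu_images_per_second / cpu_images_per_second
print(f"{speedup:.1f}x")  # roughly 42.9x, consistent with the "42 times faster" claim
```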

Serving predictions from a GPU is also more power-efficient and delivers results with lower latency, Nvidia claims.

TensorRT works only with Nvidia's own GPU lineup and is a proprietary, closed-source offering. AMD, by contrast, has been promising a more open-ended approach to how its GPUs can be used for machine learning applications, by way of ROCm, its open source, hardware-independent library for accelerating machine learning.

Source: InfoWorld Big Data