Deep Learning Inference on PowerEdge R7425


Published March 2020

This paper examines Deep Learning Inference using the NVIDIA T4-16GB GPU and TensorRT™. The NVIDIA T4-16GB GPU is based on NVIDIA's Turing architecture, which significantly boosts performance through a redesigned streaming multiprocessor with improved shader execution efficiency and a new memory system architecture that supports GDDR6 memory technology. Turing's Tensor Cores provide higher throughput and lower latency for AI inference applications.

The Dell EMC PowerEdge R7425 is based on AMD's EPYC™ architecture. Because EPYC™ supports a large number of PCIe Gen3 x16 lanes, the server can operate as a scale-up inference server, making it well suited for large production AI workloads where both throughput and latency are important.

In this paper we tested the inference optimization tool NVIDIA TensorRT™ 5 on the Dell EMC PowerEdge R7425 server to accelerate CNN image-classification applications and to demonstrate its ability to deliver higher throughput and lower latency for neural models such as ResNet50. During the tests, we ran image-classification inference in different precision modes on the R7425 using the NVIDIA T4-16GB GPU, with both TensorRT™ implementations: the native TensorRT™ C++ API and the TensorFlow-TensorRT™ integration library. TensorFlow served as the primary framework for the pre-trained models, and optimized performance was compared in terms of throughput (images/sec) and latency (milliseconds).
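As a minimal illustration of how the two reported metrics relate, the sketch below computes throughput from batch size and per-batch latency, the convention commonly used in TensorRT-style benchmarks. The function name and the sample numbers are hypothetical and are not measured results from this paper.

```python
def throughput_images_per_sec(batch_size, latency_ms):
    """Throughput in images/sec for one batch completing in latency_ms.

    batch_size images are processed per batch; latency_ms is the
    per-batch inference time in milliseconds, so images/sec =
    batch_size * 1000 / latency_ms.
    """
    return batch_size * 1000.0 / latency_ms

# Hypothetical example: a batch of 32 images finishing in 20 ms
# corresponds to 1600 images/sec.
print(throughput_images_per_sec(32, 20.0))  # 1600.0
```

This relationship also shows the throughput/latency trade-off explored in the tests: larger batches usually raise throughput but increase per-batch latency.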