To bring you up to speed, an RNN, or recurrent neural network, is a class of artificial neural network in which connections between units form a directed graph along a sequence.
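For readers who prefer code to definitions, here is a minimal sketch of that recurrence in plain NumPy. The sizes and weights are made up purely for illustration and have nothing to do with the Sockeye model discussed below.

```python
# Toy RNN step: the same weights are applied at every position in the
# sequence, and each hidden state feeds into the next - the "directed
# graph along a sequence" mentioned above. Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16

W_xh = rng.standard_normal((input_dim, hidden_dim)) * 0.1  # input-to-hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One recurrence: new hidden state from current input and previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Unroll over a toy sequence of 5 time steps.
h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):
    h = rnn_step(x_t, h)
print(h.shape)  # (16,)
```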
Intel ran a demo in its Oregon labs using the AWS Sockeye Neural Machine Translation (NMT) model with Apache MXNet and the Intel Math Kernel Library (Intel MKL).
Believe it or not, the library makes all the difference.
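A quick way to see whether your own MXNet build actually links the Intel libraries is the runtime feature API, sketched below. This is an assumption-laden example: the `mxnet.runtime` module and the feature names shown appeared around MXNet 1.2/1.3, so check against your installed version.

```python
# Sketch: query an MXNet build for Intel library support. Around the time
# of this demo, the MKL-enabled build was typically installed with
# `pip install mxnet-mkl`. Feature names below are assumptions from the
# MXNet 1.x feature list; verify with print(Features()) on your install.
import mxnet as mx
from mxnet.runtime import Features

features = Features()
for name in ("MKLDNN", "BLAS_MKL"):
    print(name, "enabled:", features.is_enabled(name))
```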
Intel Xeon beats Nvidia V100 by 4X
For machine translation, which uses RNNs, the Intel Xeon Scalable processor outperforms the Nvidia V100 by 4x on the AWS Sockeye Neural Machine Translation (NMT) model with Apache MXNet when the Intel Math Kernel Library (Intel MKL) is used.
Previously, deep learning training and inference on CPUs took an unnecessarily long time because the software was not written to take full advantage of the hardware features and functionality. That is no longer the case. The Intel Xeon Scalable processor with optimized software has demonstrated enormous performance gains for deep learning compared to the previous generations without optimized software.
The Intel Xeon v3 processor, better known to Fudzilla readers as Haswell, gains up to 198x for inference and 127x for training, measured with GoogLeNet v1 for inference and AlexNet for training using Intel Optimized Caffe.
Video of the live demo
These gains apply to various types of models, including multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and the various types of recurrent neural networks (RNNs). The performance gap between GPUs and CPUs for deep learning training and inference has narrowed, and for some workloads CPUs now even have an advantage over GPUs.
Intel has much more detail in its blog, which includes a video of the demo in which Vivian TJanecek from Intel's data center marketing group and Sowmya Bobba, an Intel machine learning engineer, walk through the results. You can clearly see that the Intel-based system scores 93 sentences per second while the Nvidia V100 machine scores 22 sentences per second.
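For context on what a "sentences per second" figure means, here is a hedged sketch of how such a throughput number can be measured. The `translate` argument is a placeholder for whatever inference call your NMT setup exposes; it is not a Sockeye API.

```python
# Sketch: time a batch of translations and report sentences per second.
# `translate` is a hypothetical stand-in for a real inference function.
import time

def sentences_per_second(translate, sentences):
    """Run every sentence through `translate` and return throughput."""
    start = time.perf_counter()
    for sentence in sentences:
        translate(sentence)
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed

# Usage with a dummy "translator", just to show the shape of the metric:
dummy = lambda s: s[::-1]
print(round(sentences_per_second(dummy, ["ein satz"] * 1000), 1))
```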
The experiments were conducted on servers at Amazon Web Services (AWS) with the publicly available Apache MXNet framework. It is worth mentioning that the framework is neutral, maintained by neither Intel nor Nvidia. The benchmark used is AWS Sockeye, an open source project for NMT.
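Sockeye itself is driven from the command line; a minimal train-then-translate run looks roughly like the sketch below. The corpus file names are placeholders, and the flags are assumptions based on the Sockeye 1.x documentation, so verify them with `python -m sockeye.train --help` on your installed version.

```python
# Sketch of driving Sockeye via its CLI from Python. File names are
# placeholders; flags may differ in newer Sockeye releases.
import subprocess
import sys

# Train a model on a parallel German-English corpus (hypothetical files).
subprocess.run([
    sys.executable, "-m", "sockeye.train",
    "--source", "train.de",
    "--target", "train.en",
    "--validation-source", "dev.de",
    "--validation-target", "dev.en",
    "--output", "nmt_model",        # directory for checkpoints
], check=True)

# Translate a held-out file with the trained model.
subprocess.run([
    sys.executable, "-m", "sockeye.translate",
    "--models", "nmt_model",
    "--input", "test.de",
    "--output", "test.en",
], check=True)
```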
We advise you to check the blog and watch the video - it should definitely interest any data scientist.