To enhance the online gaming user experience, Tencent uses an in-game purchase recommendation system that applies machine learning to help users decide what equipment to buy within their games. The Tencent machine learning engine uses DGEMM extensively to compute the coefficients for its logistic regression algorithm.
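To illustrate why DGEMM dominates this kind of workload, the following is a minimal sketch of one batch gradient-descent step for logistic regression written as two matrix products. It is not Tencent's code: `Matrix`, `gemm`, `transpose`, and `gradient_step` are hypothetical names introduced here, and the naive triple loop only stands in for an optimized DGEMM routine such as `cblas_dgemm`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: dense row-major matrix of doubles.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive C = A * B. In production this is where an optimized DGEMM would be called.
Matrix gemm(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

Matrix transpose(const Matrix& a) {
    Matrix t(a.cols, a.rows);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            t.at(j, i) = a.at(i, j);
    return t;
}

// One gradient-descent step.
// x: n x d features, y: n x 1 labels (0 or 1), w: d x 1 coefficients.
void gradient_step(const Matrix& x, const Matrix& y, Matrix& w, double lr) {
    Matrix r = gemm(x, w);  // n x 1 logits
    for (std::size_t i = 0; i < r.data.size(); ++i) {
        // sigmoid(logit) - label gives the per-sample residual
        r.data[i] = 1.0 / (1.0 + std::exp(-r.data[i])) - y.data[i];
    }
    Matrix g = gemm(transpose(x), r);  // d x 1 gradient (unnormalized)
    for (std::size_t j = 0; j < w.data.size(); ++j) {
        w.data[j] -= lr * g.data[j] / static_cast<double>(x.rows);
    }
}
```

Both the forward pass (`x * w`) and the gradient (`xᵀ * r`) are matrix products, so with many samples or several coefficient columns most of the run time lands in the GEMM call.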
The Kyoto University team recognized that the performance of the open source Theano C++ multi-core code could be significantly improved. They worked with Intel to improve Theano multi-core performance on a dual-socket Intel® Xeon® processor-based system, as the next-generation Intel® Xeon Phi™ processors were not available at that time.
Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and one of the most popular community frameworks for image recognition. Caffe is often used as a benchmark together with AlexNet*, a neural network topology for image recognition, and ImageNet*, a database of labeled images.
Baidu’s recently announced deep learning benchmark, DeepBench, documents performance for the lowest-level compute and communication primitives for deep learning (DL) applications. The goal is to provide a standard benchmark to evaluate different hardware platforms using the vendor’s DL libraries.
Today, scientific and business industries collect large amounts of data, analyze it, and make decisions based on the outcome of the analysis. This paper compares the performance of two Basic Linear Algebra Subprograms (BLAS) libraries: OpenBLAS and the Intel® Math Kernel Library (Intel® MKL).
As Deep Neural Network (DNN) applications grow in importance in various areas including internet search engines and medical imaging, Intel teams are working on software solutions to accelerate these workloads that will become available in future versions of Intel® Math Kernel Library (Intel® MKL) and Intel® Data Analytics Acceleration Library (Intel® DAAL). This technical preview demonstrates...
In this paper, we walk through a 3D animation algorithm example and describe some techniques and methodologies that may benefit your next vectorization effort. We also integrate the algorithm with SIMD Data Layout Templates (SDLT), a feature of the Intel® C++ Compiler, to improve data layout and SIMD efficiency. A code sample is included.
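The data layout idea can be sketched in plain C++ without SDLT. The example below is an assumption-laden illustration, not the paper's code and not the SDLT API: `PointAoS`, `PointsSoA`, `to_soa`, and `translate` are hypothetical names. It converts an array of structures (AoS) into a structure of arrays (SoA) so that each component is stored contiguously.

```cpp
#include <cstddef>
#include <vector>

// Array of Structures: the x, y, z of one point sit next to each other.
struct PointAoS {
    float x, y, z;
};

// Structure of Arrays: each component is stored contiguously.
struct PointsSoA {
    std::vector<float> x, y, z;
    explicit PointsSoA(std::size_t n) : x(n), y(n), z(n) {}
    std::size_t size() const { return x.size(); }
};

// Copy an AoS container into the SoA layout.
PointsSoA to_soa(const std::vector<PointAoS>& in) {
    PointsSoA out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.x[i] = in[i].x;
        out.y[i] = in[i].y;
        out.z[i] = in[i].z;
    }
    return out;
}

// Translate every point. Each array is accessed with unit stride,
// which is the access pattern compilers can usually vectorize.
void translate(PointsSoA& p, float dx, float dy, float dz) {
    const std::size_t n = p.size();
    float* x = p.x.data();
    float* y = p.y.data();
    float* z = p.z.data();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += dx;
        y[i] += dy;
        z[i] += dz;
    }
}
```

With the AoS layout the same loop would read `x` values that are three floats apart in memory; the SoA layout lets a SIMD load fetch consecutive `x` values directly. Whether a given compiler actually vectorizes the loop depends on its flags and target.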
In this tutorial, we demonstrate some possible ways to optimize an application to run on the Intel® Xeon Phi™ processor.