In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
In the previous article, we discussed the performance and accuracy of Binarized Neural Networks (BNN). We also introduced a BNN coded from scratch in the Wolfram Language. The key component of this neural network is Matrix Multiplication.
Review of Architecture and Optimization on Intel® Xeon® Scalable Processors in context of Intel® Optimization for TensorFlow* on Intel® AI DevCloudPresent the architecture and optimization on Intel® Xeon® Scalable Processors (CPU) using Intel® Optimization for TensorFlow* on the Intel® AI DevCloud
The computer learning code Caffe* has been optimized for Intel® Xeon Phi™ processors. This article provides detailed instructions on how to compile and run this Caffe* optimized for Intel® architecture to obtain the best performance on Intel Xeon Phi processors.
See how binarized neural networks can show significantly faster operations on Intel® Xeon® Scalable processors.
In continued efforts to optimize Deep Learning workloads on Intel® architecture, our engineers explore various paths leading to the maximum performance.