Article

Intel® Math Kernel Library for Deep Neural Networks: Part 2 – Code Build and Walkthrough

Learn how to configure the Eclipse* IDE to build the C++ code sample, along with a code walkthrough based on the AlexNet deep learning topology for AI applications.
Author: Bryan B. (Intel) Last updated: 2018/05/23 - 11:00
Blog

Big Datasets from Small Experiments

Author: Andrey Vladimirov Last updated: 2019/07/04 - 18:46
Article

Enabling Intel® MKL in PETSc applications

 
Author: Gennady F. (Blackbelt) Last updated: 2018/05/24 - 15:48
Article

Free access to Intel® Compilers, Performance libraries, Analysis tools and more...

Intel® Parallel Studio XE is a very popular product from Intel that includes the Intel® Compilers, Intel® Performance Libraries, tools for analysis, debugging and tuning, tools for MPI and the Intel® MPI Library. Did you know that some of these are available for free? Here is a guide to “what is available free” from the Intel Parallel Studio XE suites.
Author: Admin Last updated: 2019/09/30 - 17:28
Article

Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Author: (not given) Last updated: 2019/10/15 - 15:30
Article

Thread Parallelism in Cython*

Cython* is a superset of Python* that additionally supports C functions and C types on variables and class attributes. Cython generates C extension modules, which the main Python program can load with the import statement.
Author: Nguyen, Loc Q (Intel) Last updated: 2019/10/15 - 16:40
Article

Caffe* Training on Multi-node Distributed-memory Systems Based on Intel® Xeon® Processor E5 Family

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and one of the most popular community frameworks for image recognition. Caffe is often used as a benchmark together with AlexNet*, a neural network topology for image recognition, and ImageNet*, a database of labeled images.
Author: Gennady F. (Blackbelt) Last updated: 2019/10/15 - 16:50
Blog

The JITter Conundrum - Just in Time for Your Traffic Jam

In interpreted languages, it just takes longer to get stuff done. I earlier gave the example where the Python source code a = b + c results in a BINARY_ADD bytecode that takes 78 machine instructions to do the add, whereas it is a single native ADD instruction when run in a compiled language like C or C++. How can we speed this up? Or as the performance expert would say, how do I decrease...
Author: David S. (Blackbelt) Last updated: 2019/10/15 - 19:42
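
The bytecode mentioned in the teaser above can be inspected with Python's standard `dis` module. The following is a minimal sketch, assuming a CPython interpreter; the helper names `add` and `binary_opcodes` are illustrative and do not come from the original post. The opcode is named `BINARY_ADD` on CPython 3.10 and earlier and `BINARY_OP` on 3.11 and later. The sketch only shows the bytecode and does not attempt to reproduce the count of 78 machine instructions.

```python
import dis


def add(b, c):
    # The statement discussed in the post: a single Python-level addition.
    a = b + c
    return a


def binary_opcodes(func):
    """Return the names of the BINARY_* instructions in func's bytecode."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("BINARY_")
    ]


# Hypothetical helper usage: list the opcode(s) that implement `b + c`.
ops = binary_opcodes(add)
print(ops)
```

Calling `dis.dis(add)` instead prints the full instruction listing, including the loads and stores that surround the addition.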