A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers
In the previous article, we discussed deep learning frameworks and selected TensorFlow* because it includes Keras, has a flourishing developer community, enjoys strategic support from Google, offers a version optimized for Intel® processors, and is simple to deploy.
In this article, we continue our discussion of the infrastructure aspects of the project and focus on the computing resources from Intel that can be used to train and execute deep learning models. Depending on your goals and resources, such as budget, time, and talent, different options might be appropriate. For example, if you have a large data set or a tight timeline, a multiple CPU cluster might be the right choice. However, if you are an independent developer experimenting with various deep learning frameworks or techniques, a single workstation with a multicore CPU might be sufficient. We also provide a comparative overview of existing computing resources for deep learning from Intel.
There are three primary circuit design paradigms or architectures:

- Central processing units (CPUs)
- Field-programmable gate arrays (FPGAs)
- Graphics processing units (GPUs)
We’ll discuss the first two items in detail. GPUs are outside the scope of this series of articles and won’t be covered.
The following are the characteristic features of CPU and FPGA circuit designs:
The computational effectiveness and efficiency of different circuit designs vary from task to task. CPUs, the oldest of these architectures, were not originally designed for the huge volumes of vector and matrix multiplication that deep learning requires. However, recent innovations from Intel, such as the Intel® Many Integrated Core Architecture (Intel® MIC Architecture) and the vector processing unit module, move CPUs to the forefront of parallel technologies.
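To see why this matters, it helps to remember what the core deep learning workload actually is: a dense neural network layer is simply a matrix–vector multiplication plus a bias, repeated millions of times. The following pure-Python sketch (illustrative only; real frameworks hand this operation to vectorized, parallel hardware) makes the operation concrete:

```python
# A dense (fully connected) neural network layer is, at its core, a
# matrix-vector multiplication followed by a bias addition -- exactly the
# operation that vector units and many-core CPUs parallelize.

def dense_layer(weights, bias, x):
    """Compute y = W @ x + b using plain Python lists."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]

W = [[1.0, 2.0],
     [3.0, 4.0]]
b = [0.5, -0.5]
x = [1.0, 1.0]

y = dense_layer(W, b, x)  # [1*1 + 2*1 + 0.5, 3*1 + 4*1 - 0.5] = [3.5, 6.5]
```

Every layer of a network repeats this pattern with far larger matrices, which is why hardware built for parallel multiply–accumulate operations dominates deep learning.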
At the same time, the most recent benchmarks from Intel3 and Google4 demonstrate that custom FPGAs achieve state-of-the-art performance for some deep learning tasks by taking into account the specifics of the computational task (matrix sparsity, low precision, and so on).
FPGAs allow you to achieve power efficiency and speed when a computational pattern is fixed, which is ideal for the inference stage in a deep learning project. Many Integrated Core CPUs based on Intel MIC Architecture help achieve efficiency and speed for an arbitrary computational pattern matching vector or matrix multiplication due to massive parallelism, which is perfect for neural network training.
Figure 1. Two different steps of data analysis workflow. The hardware option you choose depends on the workflow step.
Following is a review of computing infrastructures and the one chosen for the sample project.
From a practical point of view, when selecting a computing infrastructure, the options available will come from one of the following two models: cloud computing or in-house (on-premises) hardware.
Each element of this model has a set of secondary options:
The computing infrastructure selection criteria are based on a decision tree that generates a set of questions at each split point. Starting from the root, the questions are:
The final choice is a path from the root to a leaf for the selection criteria. To find this path, use the following evaluation criteria and questions (partially coming from the previous article about the deep learning framework selection):
Additionally, for a research institution or academic team, the following question might be relevant: How do we justify the cost and budget for cloud computing resources in a grant proposal?
In-house hardware is a long-term investment and has the following parameters:
To explain which options are reasonable, and when, we built a simple financial model spreadsheet.
Figure 2. On-premises versus cloud CPU.
According to the graph in Figure 2, from a financial point of view (not accounting for data set size, scalability, and so on) cloud is a viable option if you plan to use it 24x7 for less than four or five months. Otherwise, it makes sense to invest in your own compute infrastructure.
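The logic behind that spreadsheet can be sketched in a few lines of Python. All dollar figures below are placeholder assumptions for illustration, not actual hardware or cloud prices:

```python
# Sketch of the on-premises vs. cloud break-even calculation behind Figure 2.
# All dollar figures are illustrative placeholders, not actual prices.

def months_to_break_even(hardware_cost, monthly_upkeep, cloud_hourly_rate,
                         hours_per_month=24 * 30):
    """Return the first month at which cumulative on-premises cost
    (purchase price plus upkeep) drops below cumulative 24x7 cloud rental."""
    month = 0
    while True:
        month += 1
        on_prem = hardware_cost + monthly_upkeep * month
        cloud = cloud_hourly_rate * hours_per_month * month
        if on_prem < cloud:
            return month

# Example with assumed numbers: a $6,000 workstation with $100/month for
# power and administration vs. a $2.00/hour cloud instance running 24x7.
print(months_to_break_even(6000, 100, 2.0))  # 5
```

With these assumed inputs the crossover lands at month five, consistent with the four-to-five-month break-even in Figure 2; plug in your own quotes to see where your crossover falls.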
If you plan to use pretrained models, as we did in our sample project, your hardware requirements are modest. In our sample project, we decided to stick with one dedicated workstation with an Intel Xeon Phi processor. However, you can even try to retrain and fine-tune a model on your laptop.
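The reason pretrained models keep hardware requirements modest is that fine-tuning updates only a small trainable "head" while the pretrained layers stay frozen. The sketch below illustrates that idea in plain Python; the frozen feature extractor is a stand-in random projection, where a real project would load a pretrained Keras/TensorFlow model instead:

```python
# Conceptual sketch of fine-tuning: reuse a fixed ("frozen") feature
# extractor and train only the small final layer on new data. The frozen
# weights here are a stand-in for a real pretrained network.

def extract_features(x, frozen_weights):
    # Frozen stage: these weights are never updated during fine-tuning.
    return [sum(w * xi for w, xi in zip(row, x)) for row in frozen_weights]

def train_head(data, frozen_weights, lr=0.5, epochs=500):
    # Trainable stage: a single linear output fitted by gradient descent.
    head = [0.0] * len(frozen_weights)
    for _ in range(epochs):
        for x, target in data:
            feats = extract_features(x, frozen_weights)
            pred = sum(h * f for h, f in zip(head, feats))
            err = pred - target
            head = [h - lr * err * f for h, f in zip(head, feats)]
    return head

frozen = [[0.5, -0.2], [0.1, 0.4]]            # stands in for pretrained layers
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
head = train_head(data, frozen)

feats = extract_features([1.0, 0.0], frozen)
prediction = sum(h * f for h, f in zip(head, feats))
```

Because only the tiny head is optimized, this kind of retraining is cheap enough to run even on a laptop, which is exactly why pretrained models relax the hardware requirements.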
If the size of your data set is greater than 100 GB, you will probably need to use a cloud with multiple CPUs. Typically, data scientists rent a powerful machine in a cloud, train a deep learning model, export the model, and then stop the machine. This approach is cost-effective.
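The key step in that train-then-stop workflow is persisting the trained parameters before shutting the machine down. A minimal sketch of the pattern, using JSON as a stand-in for TensorFlow's own SavedModel/checkpoint mechanism, which a real project would use instead:

```python
import json
import os
import tempfile

# Sketch of the "train in the cloud, export, shut down" pattern: persist the
# trained parameters so the expensive machine can be stopped and the model
# reloaded later on cheaper hardware. Real TensorFlow projects would use the
# framework's SavedModel or checkpoint APIs rather than raw JSON.

def export_model(weights, path):
    with open(path, "w") as f:
        json.dump({"weights": weights, "format_version": 1}, f)

def load_model(path):
    with open(path) as f:
        return json.load(f)["weights"]

trained = [[0.12, -0.7], [1.3, 0.05]]        # stand-in for trained parameters
path = os.path.join(tempfile.mkdtemp(), "model.json")
export_model(trained, path)
restored = load_model(path)
```

Once the exported file is copied out of the cloud machine (for example, to object storage), the machine can be stopped and billing ends while the model remains usable.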
The easiest way to add AI to your project is to upload data to the cloud, train the provided default model on your data in a few clicks, and deploy it as an API, also in a few clicks. Choosing the cloud option is good for projects in the early stages or for people with minimal coding skills. Financial aspects aside, cloud is a better option as it enables fast experimentation, scalability, straightforward deployment, and minimizes administration efforts.
More likely, you will want to tweak your model or experiment with neural network architecture, hyperparameters, and so on. You’ll also want to run multiple experiments in parallel. Cloud is a good option for this too since you can start multiple machines with the same configuration and hence experiment faster. If you don’t own your own cluster, which is costly, you will have to wait for one experiment to finish before you can start a new one on a single machine.
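Fanning experiments out across identical cloud machines starts with enumerating every hyperparameter combination once, so each machine (or container) can be handed one self-contained configuration. The parameter names and values below are illustrative, not a recommendation:

```python
from itertools import product

# Sketch of fanning out parallel experiments: enumerate every hyperparameter
# combination so each cloud machine can be handed one configuration.
# The search space below is purely illustrative.

search_space = {
    "learning_rate": [0.001, 0.01],
    "batch_size": [32, 64],
    "optimizer": ["adam", "sgd"],
}

def experiment_configs(space):
    """Expand a dict of option lists into one dict per experiment."""
    keys = sorted(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]

configs = experiment_configs(search_space)
print(len(configs))  # 2 * 2 * 2 = 8 runs, one per machine
```

With eight identical cloud machines, all eight runs finish in the time a single workstation would need for one.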
However, fast experimentation comes at a cost: you have to administer a new environment for each new machine and parallel experiment. Luckily, you can abstract your experimentation environment for deep learning and minimize the administration effort with Docker*, a container technology, by packaging all the dependencies into a layered, portable, executable image. We recommend you always work using Docker, since you can then easily switch to a new compute infrastructure or share your work with others.
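A minimal sketch of such a Dockerfile follows. The base image is the official `tensorflow/tensorflow` image from Docker Hub; the package list and the `train.py` entry point are assumptions to adapt to your own project:

```dockerfile
# Minimal sketch of a portable deep learning environment.
# The package list and train.py entry point are placeholders.
FROM tensorflow/tensorflow:latest

# Extra Python dependencies your experiments need (illustrative).
RUN pip install pandas scikit-learn

# Copy the project code into the image so every machine runs the same code.
COPY . /workspace
WORKDIR /workspace

CMD ["python", "train.py"]
```

Building this image once (`docker build -t my-experiment .`) lets you start identical containers on any machine, local or cloud, with no per-machine setup.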
If you plan to deploy your model and expose it as an API, you might prefer a cloud, and certain cloud providers in particular. For example, Google announced TPUs optimized for TensorFlow and native support of deep learning models trained with TensorFlow (TPUs aren't available yet in Google Cloud). Microsoft supports fast deployment of models trained with the Microsoft Cognitive Toolkit*. Cloud deployments are especially appropriate for teams without data engineers or DevOps, since data scientists can execute the project from start to finish on their own. For middle-size or large-size deployments, standard cloud options might not be a suitable solution. You would need DevOps engineers to build a distributed, high-load system optimized for access by many users.
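To make "expose it as an API" concrete, here is a self-contained sketch of a JSON-over-HTTP prediction endpoint using only the Python standard library. The model is a trivial placeholder; a production deployment would sit behind TensorFlow Serving or a managed cloud endpoint rather than `http.server`:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch of exposing a model as a JSON-over-HTTP API. The "model" is a
# placeholder linear score; production systems would use TensorFlow Serving
# or a managed cloud endpoint instead of http.server.

def predict(features):
    return sum(0.5 * f for f in features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"features": [2.0, 4.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
print(result["prediction"])  # 0.5*2 + 0.5*4 = 3.0
```

Even this toy version shows why deployment at scale needs engineering attention: a single-threaded server like this one handles one request at a time, which is exactly the limitation a distributed, high-load system is built to remove.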
To decide what to use for our sample project, we focused on the following elements:
Based on these requirements and the evaluation described above, cloud is the right choice for us. We would need to compare prices to select the best cloud provider (a comparison we did not perform for this sample project). Because our timeline is not tight (we can work on a single machine, run experiments sequentially, and wait while a model is training), we requested a single powerful workstation with the Intel® Xeon Phi™ processor. For deployment, we plan to use TensorFlow Serving5 in Google Cloud, as it provides a fully managed service.
When selecting the right computing infrastructure, first decide whether you want to work in a cloud or use your own hardware. In most cases, cloud is a better option. Among all existing clouds, we think that Google Cloud is the best one if you are working with TensorFlow. It makes sense to invest in your own computing infrastructure if you plan to work on deep learning projects for more than six months and can accurately plan the required computational needs.
Prev: Select a Deep Learning Framework | Next: Augment AI with Human Intelligence Using Amazon Mechanical Turk*
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804