Getting started with the Intel® DevCloud

Hello all! This article is all about the Intel® DevCloud! If you don't have access yet, sign up for it now.

If you're not familiar with it, it's a server cluster built on Intel® Xeon® Scalable processors, primed for all your machine learning and deep learning needs. It also comes pre-loaded with these frameworks and libraries:

  • Intel® Optimization for PyTorch*
  • Intel® Optimization for PaddlePaddle*
  • Intel® Optimization for TensorFlow*
  • Intel® Optimization for Caffe*
  • Intel® Distribution for Python* (including NumPy, SciPy, and scikit-learn*)
  • Keras* library

Great, now that's out of the way, let's get to it.

If you have requested access, you should receive email instructions on how to connect from the Intel DevCloud Team at Colfax. You might be asking: who is Colfax? You thought this was Intel. Colfax Research is a renowned authority on scaling workloads across Intel® Xeon® processors, and they offer countless hours of training on this content.

Back to connecting. If you signed up above and read this far, you should have an email from Colfax explaining how to connect (the email usually takes about 10 minutes to arrive, so you may still need to wait if you're a fast reader). You connect either via a terminal or through a GUI client, like PuTTY*, if you're on Windows*. For specific instructions, follow the links in your welcome email, "Welcome to the Intel® DevCloud".

Now, once you've connected, it's just an ordinary Linux* terminal environment. You're not going to have the privileges to install new software via apt-get or yum, but you can do everything else, including running Python code and training models, which is what you're here to do anyway.

Note: Do not train your models by running Python code straight from the terminal like it's a Linux box in your basement.

This is a scaled resource in batch mode, meant to be shared and to run jobs in the order it receives them. I would say it's democratic, but it's more socialist.

A quick overview of how this works: you log in and get access to a terminal. This is the login node. From here, you can manage your file system, make sure everything seems hunky-dory, and then queue up jobs. And that is queue with a "q": you get jobs into the queue by submitting them with the "qsub" command. Get it? You can find all the information you need on submitting jobs in the "Compute" section on Colfax's site.

To walk you through the process, let's run an example job. We're going to do an easy one: you'll use the "echo" command and a pipe to clone a GitHub* repository. All from the comfort of the q.

echo git clone https://github.com/mspandit/neon-tf-mnist.git | qsub

Great, now that's queued. You should also get back a job name, something like "520.c001". Make note of the job number, which is the first section of the name; in this case, "520". To check on queued jobs, run:

qstat
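The output is a table of your jobs and their states. On a PBS-style queue like this one, it looks roughly like the sketch below (exact columns may vary; the job ID and user are the examples from this article):

```
Job ID          Name    User     Time Use S Queue
--------------- ------- -------- -------- - -----
520.c001        STDIN   u6770    00:00:00 R batch
```

The "S" column is the job state; "R" means running, "Q" means still queued.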

You can see your queued and running jobs. Once a job completes, you get two files, one for standard output and one for errors. These are named after the job, e.g. "STDIN.o520" and "STDIN.e520". In our case, we had no errors, so the STDIN.e file was empty. The output file should read as below:
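That naming convention is easy to script around. A minimal sketch in shell, using the example job ID from above:

```shell
job_id="520.c001"            # the job name printed by qsub
job_num="${job_id%%.*}"      # strip everything after the first dot: "520"
outfile="STDIN.o${job_num}"  # standard output lands here
errfile="STDIN.e${job_num}"  # errors land here
echo "$outfile $errfile"
```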

########################################################################
# Colfax Cluster - https://colfaxresearch.com/
#      Date:           Thu Nov  9 13:37:00 PST 2018
#    Job ID:           25006.c009
#      User:           u6770
# Resources:           neednodes=1,nodes=1,walltime=06:00:00
########################################################################

Cloning into 'neon-tf-mnist'...

########################################################################
# Colfax Cluster
# End of output for job 25006.c009
# Date: Thu Nov  9 13:37:02 PST 2018
########################################################################

While this example could have totally been done on the login node with a standard git clone command, we're just using it to get you familiar with queuing jobs. The queue is a great way to unarchive large data sets, for example, as well as to run more complex jobs, like training a deep learning model.
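For instance, unpacking an archive is just another one-liner piped to qsub. The sketch below builds a tiny placeholder archive so it can run anywhere; on the DevCloud, you'd substitute your real data set:

```shell
# Build a small placeholder archive (a stand-in for a real data set).
mkdir -p dataset && echo "sample record" > dataset/part1.txt
tar -czf dataset.tar.gz dataset && rm -rf dataset

# On the DevCloud login node, you would queue the extraction:
#   echo "tar -xzf dataset.tar.gz" | qsub
# Here we run it directly so the example is self-contained:
tar -xzf dataset.tar.gz
cat dataset/part1.txt
```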

What about bigger jobs, with more of a process to them? For those, you're going to want a command file. This is effectively a shell script for batch jobs that runs via the qsub process. For an example job, we're going to run the two training scripts from the repository we cloned above and compare the results. To do this easily, run:

nano launch

and copy and paste the following to create your command file:

#PBS -N mnist_tf_neon
echo Starting Calculation
#Activate the TF Virtual Environment
source activate root
echo Starting TF training
#Run the TF MNIST training, making sure to use Intel MKL
python ~/neon-tf-mnist/tf_mnist_mlp.py -b mkl
source deactivate
echo Starting Neon Training
#Activate the Neon Virtual Environment
source /opt/neon/intel_neon/bin/activate
#Run the Neon Training
python ~/neon-tf-mnist/neon_mnist_mlp.py
source deactivate
echo End of Calculation

To then run this command file, run:

qsub launch
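A command file can also carry options for the scheduler itself. Any line starting with #PBS is read as a directive by qsub, just like the -N name line in the file above. The resource values below are illustrative, so check Colfax's documentation for the actual limits on your account:

```shell
#PBS -N mnist_tf_neon         # job name, shown by qstat
#PBS -l nodes=1               # request a single node
#PBS -l walltime=02:00:00     # ask the scheduler for a two-hour limit
# Everything else is ordinary shell:
msg="Job running on $(hostname)"
echo "$msg"
```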

You can also transfer files to and from the cluster. Open another terminal on your *nix system and copy the trained neon model back to your machine:

scp colfax:/path/to/remote/file /path/to/local/directory/
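scp works in both directions, and the -r flag copies whole directories. The file and directory names below are placeholders:

```shell
# Upload a local script to your DevCloud home directory (placeholder name):
scp my_script.py colfax:~/
# Pull an entire results directory back in one go:
scp -r colfax:~/results ./
```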

Again, if you're on Windows, use whatever GUI client you find most comfortable.

In this case, feel free to transfer your trained models back to your local machine for further experimentation.

That's it. If you have more questions, it's best to go straight to the help section on the Colfax page.

All of this information is also in the README.txt file in your home folder.
