Intel® Parallel Computing Center at SURFsara BV


Principal Investigators:

Valeriu Codreanu studied Electrical Engineering and received his MSc from the Polytechnic University of Bucharest, where he also completed a PhD in Computer Architecture. He then continued as a postdoctoral researcher at both Eindhoven and Groningen universities, working on GPU computing, computer vision, and embedded systems within several EU-funded projects. In 2014 he joined SURFsara as an HPC consultant, focusing on machine learning. At the end of 2016, he became the PI of the Intel Parallel Computing Center at SURFsara, focusing on optimizing deep learning techniques on Intel architecture, as well as on extending their use to other application domains.


SURFsara is the national supercomputing center in the Netherlands, operating, among other systems, the Dutch national supercomputer. SURFsara offers its HPC services to researchers in the Dutch academic sector and closely follows the rapid development and impact of machine learning in HPC. In 2017, SURFsara became an Intel PCC, focusing on speeding up deep learning workloads on Intel-based supercomputers.

The original focus for 2017 was minimizing the time-to-train of several deep convolutional neural networks on state-of-the-art computer vision datasets such as ImageNet and beyond. Highlights of 2017 include a training time of under 30 minutes on the popular ImageNet-1K dataset, as well as state-of-the-art accuracy on other datasets such as the full ImageNet and Places-365 datasets. These results were obtained on large-scale state-of-the-art systems such as TACC’s Stampede2 and BSC’s MareNostrum4.

Our main research had two objectives: (1) making multi-node scaling as efficient as possible, and (2) developing new learning rate schedules that converge to state-of-the-art accuracy for very large-batch training on up to 1536 Intel® Xeon Phi™ nodes. Additionally, we evaluated several network architectures, particularly wider residual models, on larger computer vision datasets and obtained record accuracies. We are currently working on a methodology to optimally trade off time-to-train against the desired accuracy on these popular datasets. All 2017 experiments and publications were based on the Intel Caffe* framework, used in combination with the Intel® Machine Learning Scaling Library (MLSL).
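Learning rate schedules for very large-batch training are commonly built around a linear scaling rule with a gradual warmup: the base learning rate is scaled with the batch size, ramped up over the first few epochs, and then decayed stepwise. A minimal sketch of such a schedule — with hypothetical default values for illustration only, not the exact schedules developed at SURFsara — could look like:

```python
def large_batch_lr(step, steps_per_epoch, base_lr=0.1, base_batch=256,
                   batch=8192, warmup_epochs=5, decay_epochs=(30, 60, 80)):
    """Illustrative large-batch SGD schedule: linear scaling + warmup.

    All defaults here are hypothetical; real recipes tune them per
    model and dataset.
    """
    # Linear scaling rule: grow the peak LR proportionally to batch size.
    peak_lr = base_lr * batch / base_batch
    epoch = step / steps_per_epoch
    if epoch < warmup_epochs:
        # Gradual warmup: ramp linearly from base_lr to the scaled peak.
        frac = epoch / warmup_epochs
        return base_lr + frac * (peak_lr - base_lr)
    # After warmup: decay by 10x at each listed epoch boundary.
    lr = peak_lr
    for boundary in decay_epochs:
        if epoch >= boundary:
            lr *= 0.1
    return lr
```

With these example values, the schedule starts at 0.1, reaches a peak of 3.2 (= 0.1 × 8192/256) after five epochs of warmup, and decays tenfold at epochs 30, 60, and 80.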

SURFsara will continue this work in 2018, but will extend its focus to porting the large-batch SGD training techniques to the popular TensorFlow* framework, as well as to extending the application domain beyond computer vision, toward replacing or augmenting traditional HPC applications from natural sciences such as climatology, particle physics, and astronomy with novel deep learning techniques. Particular focus will also be on the rapidly developing field of medical imaging, which requires both large-scale compute and high memory bandwidth and capacity due to its high data dimensionality. Since TensorFlow allows more flexibility in the types of architectures and usage scenarios, we will experiment with generative models, as well as with fine-tuning from pre-trained models, when tackling these problems.

Furthermore, SURFsara is actively involved in several other deep learning activities. An important one is EDL (Efficient Deep Learning), a large Dutch-funded project that brings deep learning to industrial applications and involves many academic and industrial partners. Additionally, SURF’s innovation lab (SOIL) started an internal project that supports, both financially and with consultancy, 3-4 projects from HPC-focused simulation sciences that propose to use deep learning to augment or extend their applications. These techniques are already showing promising results, and we believe that making scalable tools and methodologies based on Caffe and TensorFlow available to the research sector is highly important and will further the development of several HPC-related fields.


  • Initial evaluation of Intel Caffe, presentation at IXPUG2017.
  • Follow-up description of Intel Caffe scaling, presentation at the Intel Booth at ISC 2017.
  • Brief description of work on scaling residual networks, with details on other (larger) datasets such as ImageNet-22K and Places-365.
  • State-of-the-art large batch training, arXiv paper.
  • Under review: Large Minibatch Training on Supercomputers with Improved Accuracy and Reduced Time to Train.
  • In preparation: Efficient wide network training for state-of-the-art computer vision.


For more complete information about compiler optimizations, see our Optimization Notice.