Baidu Deep Neural Network Click-Through Rate on Intel® Xeon® Processors E5 v4

By Khang T Nguyen, Published: 06/08/2016, Last Updated: 06/08/2016


Every day, people search the Web for information related to their work, for videos to watch, or for something to buy. According to Hubspot [8], “…75% of users never scroll past the first page of search results.” If ads for the products they are looking for appear at the top of the search results, those ads are very likely to be visited. For an ad to appear in the first couple of search pages, it must be very popular, which normally takes years. An ad can also appear at the top because it covers widely known information or products, or because it contains popular keywords.

How do new web sites selling products or services appear at the top of the search list? The key is to use the right keywords that people might use to search for their products or services.

Baidu [1] is the most popular search engine in China. Advertisers can pay Baidu so that their ads appear at the top of the search list. The payment model can be pay per click (PPC), in which the advertiser pays Baidu each time a user clicks its ad, or a fixed fee for a certain period of time regardless of how many times users click the ad.

To choose the right keywords for the search, Baidu employs machine learning [2] in the core service of its ad platform and in its other applications. The heart of that engine is the click-through rate (CTR) [3] module.

The next sections discuss Baidu CTR and how the Intel® Xeon® processor E5 v4 family helps improve the performance of the Baidu CTR module.

What is a Click-Through Rate?

CTR, according to Wikipedia, is the ratio of users who click a specific link to the number of total users who view a page, email, or advertisement.

CTR = (number of users who click the link) / (number of users who view the page, email, or advertisement)
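The ratio defined above can be sketched in a few lines. This is only an illustration, assuming the click and view counts are already available; the function name `click_through_rate` and the sample counts are hypothetical and do not come from Baidu's system.

```python
def click_through_rate(clicks: int, views: int) -> float:
    """Return the fraction of views that led to a click."""
    if views <= 0:
        raise ValueError("views must be positive")
    if clicks < 0:
        raise ValueError("clicks must not be negative")
    return clicks / views


# Hypothetical example: 25 clicks out of 1,000 views.
ctr = click_through_rate(25, 1000)
print(f"CTR = {ctr:.1%}")  # prints "CTR = 2.5%"
```

A CTR module such as the one discussed in this article predicts this ratio for an ad before it is shown, rather than measuring it afterward.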

Predicting the CTR helps identify which ads are relevant enough for users to click as often as possible, since advertisers are charged each time their ads are clicked.

To do that, Baidu employs deep neural networks (DNNs) [3]. DNNs are used not only to predict the relevance of an ad but also to improve the user experience on the Web by increasing the accuracy of speech and image recognition.

Deep Neural Networks

A definition of DNN [4], from Wikipedia: A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers of units between the input and output layers.

A neural network (NN) is a system that approximates the operation of the human brain by modeling the neuronal structure of the cerebral cortex on a much smaller scale. A basic NN consists of three layers: input, hidden, and output. To train an NN, patterns from a training data set are fed to the input layer, and the NN's learning rule modifies the weights of the connections according to those input patterns. A DNN is an NN with multiple hidden layers. Each time the information passes from one hidden layer to the next, it is combined into something more complex. This way, DNNs can model the input patterns much more accurately than NNs with a single hidden layer.
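To make the layered structure concrete, here is a minimal sketch of a forward pass through a network with several hidden layers. It is an illustration only, not Baidu's model: the layer sizes, the sigmoid activation, the random weights, and all function names are assumptions, and the weight updates performed during training are omitted.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(w . x + b) for each output unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def make_layer(n_in, n_out, rng):
    """Create random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Feed the input through every layer in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


rng = random.Random(0)          # fixed seed so the run is repeatable
sizes = [3, 4, 4, 4, 4, 1]      # input, four hidden layers, one output (arbitrary)
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
prediction = forward([0.5, -1.2, 3.0], layers)
print(prediction)               # a single value between 0 and 1
```

Each call to `dense` takes the outputs of the previous layer as its inputs, which is how later layers come to represent combinations of the simpler features computed by earlier ones.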


Figure 1: Diagram of deep neural networks.

Figure 1 shows a diagram of a typical DNN with four hidden layers. In general, more hidden layers and larger training data sets result in more accurate predictions. Baidu uses very large data sets for its DNNs. For example, it uses more than 10 billion training samples for voice recognition and about 100 million images to train its image recognition engine.

Baidu CTR and Intel® Xeon® Processor E5 v4 Family – A Good Match

Baidu uses large data sets to train its DNN module, and the training process sometimes takes weeks to complete. Substantial computing power is needed to reduce the training time. Baidu’s DNN makes heavy use of DGEMM [5], the double-precision general matrix multiplication routine. It is the core operation for computing the similarity of two samples in the fully connected layer of Baidu’s DNN, which computes the CTR.
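In the BLAS standard, DGEMM computes C ← alpha·A·B + beta·C on double-precision matrices (optional transposes are ignored here). The sketch below is a plain-Python reference for that operation, written only to show what is being computed; it is not Intel MKL's implementation or Baidu's code, and the name `dgemm_reference` and the sample matrices are hypothetical.

```python
def dgemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of floats."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


# A fully connected layer applied to a batch is one matrix product:
# each row of x is a sample, each column of w holds one output unit's weights.
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.5, -1.0, 0.0], [0.25, 0.0, 2.0]]
zeros = [[0.0] * 3 for _ in range(2)]
out = dgemm_reference(1.0, x, w, 0.0, zeros)
print(out)  # [[1.0, -1.0, 4.0], [2.5, -3.0, 8.0]]
```

The triple loop performs m·n·k multiply-add steps, which is why an optimized, vectorized DGEMM matters so much when the matrices come from very large training sets.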

Baidu’s DNN replaces its own DGEMM code with the DGEMM function in the Intel® Math Kernel Library (Intel® MKL) [6]. The Intel Xeon processor E5 v4 family supports Intel® Advanced Vector Extensions 2 (Intel® AVX2) [7], and Intel MKL is highly optimized for performance using Intel AVX2. This improves the performance of Baidu’s DNN. The other benefit of using Intel MKL is that the code does not need to change to take advantage of new features in future Intel® Xeon® processors, since Intel MKL auto-detects new features and makes use of them, if applicable. Just make sure to link the application to the latest version of Intel MKL.

Performance Test Procedure

To prove that the Baidu DNN gains performance due to Intel AVX2 and better hardware, we performed tests on two platforms. One system was equipped with the Intel® Xeon® processor E5-2699 v3 and the other with the Intel® Xeon® processor E5-2699 v4.

Test Configuration

System equipped with the Intel Xeon processor E5-2699 v4

  • System: Preproduction
  • Processors: Intel Xeon processor E5-2699 v4 @ 2.2 GHz
  • Cache: 55 MB
  • Memory: 128 GB DDR4-2133 MT/s

System equipped with Intel Xeon processor E5-2699 v3

  • System: Preproduction
  • Processors: Intel Xeon processor E5-2699 v3 @ 2.3 GHz
  • Cache: 45 MB
  • Memory: 128 GB DDR4-2133 MT/s

Operating System: CentOS*


  • GNU* Compiler Collection (GCC) 4.8
  • Intel MKL 2016 Update 1

Application: Baidu Machine Learning Deep Neural Network CTR

Test Results

The computing capability of the Intel Xeon processor E5-2699 v4 helps Baidu reduce the training time of its DNN CTR module and helps provide a better user experience.

Figure 2: Comparison between the application using Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Vector Extensions 2 (Intel® AVX2).

Figure 2 compares the application using Intel® Advanced Vector Extensions (Intel® AVX) with the application using Intel AVX2, both on a system equipped with the Intel Xeon processor E5-2699 v4. Intel AVX2 increases the performance of the Baidu CTR search engine by 67 percent.

Figure 3: Comparison between the application using Intel® Xeon® processor E5-2699 v3 and the Intel® Xeon® processor E5-2699 v4.

Figure 3 shows the results on a system equipped with the Intel Xeon processor E5-2699 v3 and on a system equipped with the Intel Xeon processor E5-2699 v4. The performance improved by 23 percent due to Intel AVX2 and the additional cores on the Intel Xeon processor E5-2699 v4.
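To relate these percentages to training time, the following sketch converts a performance gain into a relative run time. It assumes that "performance" means throughput (work completed per unit of time), which the article does not state explicitly; the function name `relative_time` is hypothetical.

```python
def relative_time(improvement_percent: float) -> float:
    """Run time after the gain, as a fraction of the original run time."""
    if improvement_percent <= -100.0:
        raise ValueError("improvement must be greater than -100 percent")
    return 1.0 / (1.0 + improvement_percent / 100.0)


results = [
    ("Intel AVX to Intel AVX2 (Figure 2)", 67.0),
    ("E5-2699 v3 to E5-2699 v4 (Figure 3)", 23.0),
]
for label, gain in results:
    print(f"{label}: {relative_time(gain):.2f}x the original run time")
```

Under that assumption, a 67 percent gain corresponds to roughly 0.60 of the original run time, and a 23 percent gain to roughly 0.81.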

Note: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to


Optimizing the CTR core module helps speed up the Baidu machine learning module. Since the Baidu machine learning module is used in the ad platform and other Baidu applications, optimizing it also improves the performance of those applications. By using the DGEMM function that comes with Intel MKL and running on systems equipped with the Intel Xeon processor E5 v4 family, the performance of the CTR module was significantly improved.


  1. Baidu company information
  2. Machine Learning
  3. Click-through Rate
  4. Deep Neural Networks
  5. DGEMM
  6. Intel® Math Kernel Library
  7. Intel® AVX2
