A simple example to measure the performance of an Intel® MKL function
The time required by the first MKL call should be excluded from performance measurements. The first MKL call carries overhead from buffer allocation and thread initialization, so ignoring it gives more consistent times for small problems. Type: Technical Article,Code |
MKL intel mkl GEMM BLAS matrix multiplication small matrix Intel MKL Performance |
04/27/2012
|
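The warm-up idea in the entry above can be sketched as follows. This is a minimal illustration, not the article's code: `time_call` and `matmul` are hypothetical names, and the naive `matmul` only stands in for a real GEMM call so that the sketch runs without MKL installed.

```python
import time


def time_call(func, *args, warmup=1, repeats=5):
    """Return average wall-clock seconds per call, excluding warm-up calls."""
    for _ in range(warmup):
        func(*args)  # untimed: absorbs one-time setup cost (buffers, threads)
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


def matmul(a, b):
    """Naive matrix multiply; a stand-in for a library GEMM call (assumption)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    avg = time_call(matmul, a, b, warmup=1, repeats=100)
    print(f"average time per call: {avg:.3e} s")
```

Averaging over several timed repetitions, in addition to discarding the first call, tends to matter most for small problems, where a single call may be shorter than the timer's useful resolution.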
Intel® Performance Counter Monitor - A better way to measure CPU utilization
The Intel® Performance Counter Monitor provides sample C++ routines and utilities to estimate the internal resource utilization of the latest Intel® Xeon® and Core™ processors and gain a significant performance boost. Type: Technical Article,Download,News,Tutorials |
monitoring Intel Performance Counter Monitor simultaneous multithreading out-of-order execution Intel® Performance Counter Monitor Intel® Xeon® Core™ processors multi-level caches pipelining |
04/13/2012
|
HowTo – HPL Over Intel MPI
This is a step-by-step procedure for running the High Performance Linpack (HPL) benchmark on a Linux cluster using Intel MPI. It was done on a 128-node Linux cluster with Intel’s Nehalem processors at 2.93 GHz and 12 GB of RAM on each node. Type: Technical Article,Download |
|
08/08/2011
|
Compile and run MPIBLAST 1.6.0 in the Intel(R) Cluster Ready Reference Design S5520UR-ICR1.1-ROCKS5.3-CENTOS5.4-C2 v1.0
Prerequisites
You need to have deployed the latest Intel(R) Cluster Ready Reference Design S5520UR-ICR1.1-ROCKS5.3-CENTOS5.4-C2 v1.0. This reference design targets the following components:
Intel® Xeon® ... Type: Technical Article,Tutorials |
|
07/07/2011
|
Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSE3_ATOM, SSSE3, SSE4.1, SSE4.2, AVX, AVX2) and processor-specific optimizations
Explains which Intel® Compiler switches to use to target and optimize for a specific platform, microarchitecture, CPU or processor. Type: Technical Article,Code |
dual-core xeon pentium SSE2 SSE3 SSE Core 2 Duo SSE4.2 SSSE3 SSE4.1 MMX Core 2 Quad atom Core i7 compiler AVX vcsource_domain_media vcsource_os_windows vcsource_platform_desktoplaptop vcsource_domain_graphics vcsource_product_icc vcsource_index |
09/02/2010
|
Superscalar programming 101 (Matrix Multiply) Part 4 of 5
In the last installment (Part 3) we saw the effects of the QuickThread Parallel Tag Team method of Matrix Multiplication performed on two single processor systems:
Where the Intel Q6600 (4 co ... Type: Technical Article |
|
08/27/2010
|
Superscalar programming 101 (Matrix Multiply) Part 5 of 5
In Part 4 we saw the effects of the QuickThread Parallel Tag Team Transpose method of Matrix Multiplication performed on a dual Xeon 5570 system with 2 sockets and two L3 caches, each shared by four c ... Type: Technical Article |
|
08/25/2010
|
Improving Medical Imaging Performance on the Intel® Xeon® Processor 5500 series
In Medical Imaging, it is important to maximize healthcare quality by providing the best images in the shortest time to assure accurate diagnosis & patient treatment. This article describes the 50x speedup of an image reconstruction algorithm. Type: Technical Article |
Vtune Xeon 5500 visual computing medical imaging image reconstruction MRI SPECT vcsource_type_techarticle vcsource_domain_media vcsource_os_windows vcsource_platform_desktoplaptop vcsource_product_vtunexe vcsource_product_icc vcsource_os_linux vcsource_index |
07/12/2010
|
The Cost Benefit Case for Database Migration to Intel Servers
Although server consolidation has become pervasive, to date it has been applied more commonly to application and infrastructure servers than to database servers. This report examines the cost savings from upgrading and consolidating IBM DB2 databases. Type: Technical Article |
xeon Xeon 5500 xeon 5500 series DB2 database |
11/11/2009
|
An evaluation of the impact of memory configuration on the performance of applications running on Intel® Xeon® processor 5500-series based servers
Choosing the right memory configuration for servers based on Intel® Xeon® 5500-series processors is important for getting the most bandwidth from the three-channel memory controller. This paper reports how various memory configurations affect 16 HPC applications. Type: Technical Article |
|
10/28/2009
|
Java Application Server Optimization for Multi-core Systems
This paper examines the performance characteristics of Java application servers running on 32-bit and 64-bit Java Virtual Machines (JVM) and operating systems on the latest architectures and platforms available today. Type: Technical Article |
|
10/07/2009
|
"Vectorization: Writing C/C++ code in VECTOR Format"
by Mukkaysh Srivastav, Computational Research Laboratories (CRL) - Pune, India
1.0 Introduction: Vectorization has been a key optimization principle over ... Type: Technical Article |
|
10/06/2009
|
Running The HPL Benchmark Over Intel MPI
This is a step-by-step procedure for running the High Performance Linpack (HPL) benchmark on a Linux cluster using Intel MPI. It was done on a 128-node Linux cluster with Intel’s Nehalem processors at 2.93 GHz and 12 GB of RAM on each node. Type: Technical Article |
High Performance Linpack HPL Nehalem MKL Intel MPI Compiler GFLOPS Mohamad Sindi |
09/28/2009
|
IBM DB2 9.7 on Intel Xeon Processor 5500 Series
Are you spending too much on your database?
PROGRAMMING: Automated tuning features in DB2 9.7 outperformed expert IBM engineers.
PERFORMANCE: Intel Xeon processor 5500 series delivers 9x the perfor ... |
database xeon performance |
09/17/2009
|
DB2* 9 pureXML* Scalability on Intel® Xeon® MP Platforms Using IBM N Series* Storage
Introduction
With the recent launch of the next-generation Intel® Xeon® processor MP, IBM DB2* 9, and IBM N Series storage, businesses can now enjoy the rich processing power and performance benefits ... |
|
08/17/2009
|
Prana Studios leverages Intel® Xeon® Processor 5500 Series to get better 3D animation rendering
Introduction: Prana Studios is a leading Animation house based out of Mumbai and Los Angeles. Prana's core business is focused on four main areas: Long-form CG content, location based entertainment, ga ... Type: Technical Article |
xeon India Case Study Prana Studios Xeon case study vcsource_os_windows vcsource_type_news vcsource_domain_graphics vcsource_index |
07/08/2009
|
High Clocks Per Instruction Retired when vectorizing the loop.
Sometimes when we vectorize a loop, we get a high Clocks Per Instruction Retired (CPI) value. This happens when there is high bus utilization and the bus gets saturated. Type: Technical Article,Code |
simd SSE2 SSE3 SSE4 SSE High CPI Vectorizer hardware prefetcher SSE1 Memory latency BUS Saturation Vtune |
11/18/2008
|
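For reference, the CPI metric named in the entry above is the ratio of unhalted clock cycles to instructions retired. The sketch below only illustrates the arithmetic; the function name `cpi` and the counter values are hypothetical and are not measurements from the article.

```python
def cpi(clock_cycles, instructions_retired):
    """Clocks Per Instruction Retired: cycles divided by retired instructions."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clock_cycles / instructions_retired


if __name__ == "__main__":
    # Hypothetical counter readings for a scalar and a vectorized loop.
    scalar = cpi(clock_cycles=6_000_000, instructions_retired=4_000_000)
    vector = cpi(clock_cycles=4_000_000, instructions_retired=1_000_000)
    print(f"scalar loop CPI: {scalar:.2f}")
    print(f"vectorized loop CPI: {vector:.2f}")
```

In this made-up case the vectorized loop has the higher CPI yet finishes in fewer cycles, because it retires far fewer instructions. CPI is therefore best compared alongside total cycles or elapsed time.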
Porting Code to Intel® EM64T-Based Platforms
by Robert Y. Geva, Principal Engineer, Intel Software and Solutions Group
Introduction
Porting code from IA-32 architecture to EM64T to take advantage of 64-bit involves tradeoffs in performance consid ... |
EM64T SSE3 |
10/24/2008
|
Porting Chains: 64-bit Intel® Xeon® Processor to Intel® Itanium® Processor
by Alan Zeichick
Introduction
Read the tale of two migrations: vertically to the Intel® Xeon® processor with Intel® Extended Memory 64 Technology, then horizontally to Intel® Itanium® microarchi ... |
EM64T Porting |
10/23/2008
|
Demystifying the Enterprise Choices and Scaling Hardware for Oracle*-Based Apps
by Matt Gillespie
Introduction
The 64-bit Intel® Xeon® processor extends the choices for organizations developing and deploying enterprise applications based on Oracle Database* 10g. This archit ... |
|
10/23/2008
|
Developing for Speed: A Four-Step Approach
by George Walsh
Introduction
There's really no denying that application optimization yields performance benefits. The question in each case is whether time spent optimizing and resulting perfor ... Type: Technical Article |
|
10/20/2008
|
Microsoft .NET* on Intel® Xeon™ Processor
by Thomas E. Martinez, Intel Corporation
Introduction
This article identifies the features (many of which are new) in the Intel® Xeon™ processor that make it an effective and efficient platform f ... |
.net |
10/17/2008
|
Cross Intel® Architecture Development Tools
Introduction
Recent years have yielded an amazing number of new operating systems, new processors, and new platform capabilities that provide exciting opportunities for application developers. Con ... |
|
10/17/2008
|
Intel® Xeon® Processors and Itanium® 2 Processors - Picking the Right
by Roger Smith
Introduction
Software developers have two powerful allies in their quest for optimal applications performance: Explicitly Parallel Instruction Computing (EPIC) and Hyper-Threading ... |
|
10/17/2008
|
Introducing Intel® NetBurst® MicroArchitecture Optimization
Introduction
A Deeper Pipeline and New Cache Structure
The Intel NetBurst® microarchitecture is a new feature from Intel that was introduced in the Pentium® 4 and Intel® Xeon™ processors. Althoug ... |
netburst |
10/17/2008
|