Intel® Developer Zone:
Server

Learn...

Find out how Intel processor technology can improve your software.

Intel® Microservers
Intel® Xeon®
Intel® Xeon Phi™ Coprocessor
Intel® Cache Acceleration Software

Read...

Blogs
Technical Articles

Have something to share? Write a blog or article of your own.
Contribute today!

Grow...

Gain skills to work with our tools and guides.

Videos
Webinars

Find...

Get answers and solutions for your development questions.

Many Integrated Core Forum

Performance Forum

Videos

Deliver top application performance while minimizing development, tuning and testing time and effort.

A Concise Guide to Parallel Programming Tools for Intel® Xeon® Processors
Pick the right programming models and tools to boost your application’s performance.

Download Intel® OpenCL SDK
The first open, royalty-free standard for general-purpose parallel programming.

Parallel Studio XE 2013 is here
Powerful tools to make the most of clusters and supercomputers.

Intel® Compiler Options for Intel® SSE and Intel® AVX
Learn about the three main types of processor-specific optimizations.

Intel offers you a wide variety of capabilities to optimize your applications for performance, power, security and availability. Click on each of the buttons below to see what resources are available to you!

Intel offers a range of servers, microservers, and coprocessors to handle cloud, technical computing, and enterprise needs. Here are some resources to compare their features from a hardware perspective.

On this page you will find information on the latest products launched by Intel, presented from a more software-centric perspective: the architecture and features, key software enabling insights, and how products are being used or configured for best performance.

Other useful resources:

Intel® Xeon® Processor E7 V2 Family Technical Overview
By Sreelekshmy Syamalakumari (Intel). Posted 02/18/2014
Contents: 1. Executive Summary 2. Introduction 3. Intel® Xeon® processor E7 V2 family enhancements 3.1 Intel® C104/102 Scalable Memory Buffer 3.2 Intel® Secure Key (DRNG) 3.3 Intel® OS Guard (SMEP) 3.4 Intel® Advanced Vector Extensions (Intel® AVX) 3.5 Advanced Pro...
Intel® Cluster Studio 2013 SP1 Update 1 Readme
By Gergana Slavova (Intel). Posted 02/12/2014
The Intel® Cluster Studio 2013 SP1 Update 1 for Linux* and Windows* combines all Intel® Composer XE and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools to efficiently develop, optimize, run, and distribute parallel applicatio...
Optimizing the High Frequency Trading GatiRT* Application on the latest Intel® Architecture Server
By Aditi Rathi (Intel). Posted 02/07/2014
By: Aditi Rathi (Intel) and Shailender Sharma (Gati). Abstract: High frequency trading (HFT) is a form of algorithmic trading where trades are carried out in microseconds and low latencies are achieved using high-end servers and very efficient computer algorithms. In the trading world, fast is n...
Introduction to the Intel® Numeric String Conversion Library
By Zhang Z (Intel). Posted 02/04/2014
Intel® Numeric String Conversion Library (libistrconv) is a new component introduced in Intel® C++ compiler version 14.0 Update 1. This library provides a collection of routines for converting between ASCII strings of decimal numbers and C numeric data types. These routines provide similar functi...

The Benefits of Solid-State Storage Technologies in the Cloud
By Thai Le (Intel). Posted 12/16/2013
Summary: Solid-state drives (SSDs) have rapidly evolved over the last few years, resulting in devices with more space and greater reliability. SSDs are used for caching in data centers and in larger system applications including computing massive data sets (big data: volume, variety, and velocity)...
Power Configuration Part 0: Introduction: Yikes, there is a lot that is not documented
By Taylor Kidd (Intel). Posted 12/13/2013
I was hoping to write a brief two part overview of how to configure the various power settings for the Intel® Xeon Phi™ coprocessor. It was going to be concise and brief, allowing me to get on to the next topic. Unfortunately, as I dug into the topic further, I discovered that much of it is not v...
Measure Ceph RBD performance in a quantitative way (part II)
By Jiangang (Intel). Posted 11/20/2013
This is the 2nd post about Ceph RBD performance. In part 1, we talked about random IO performance on Ceph. This time we share the sequential read/write testing data. In case you forgot our hardware configuration, we use 40x 1TB SATA disks as data disks plus 12 SSDs as journals. And 4x 10Gb links a...
Logging and analyzing Intel® PCM output with the CSV option
By Thomas Willhalm (Intel). Posted 11/11/2013
Have you ever wanted to write the output of Intel® Performance Counter Monitor (Intel® PCM) to a file? Did you ever want to generate a graph that you can add to your report? In this blog, I walk you through how I usually do this when I use Intel® PCM. Intel® Performance Counter Monitor offers th...

Are my Parallel Studio packages updating or not?
By dnesteruk
I've fired up the Intel Software Manager, pressed the download buttons and it all looks like this: So instead of pause buttons I get resume buttons. I've tried pressing them, they briefly turn into pause buttons. So my question: is anything being downloaded or is this thing broken? Thanks. P.S.: registration on this forum is atrocious. Finding this forum was next to impossible. The media upload thing is so far below I didn't notice it and uploaded elsewhere. Usability hint-hint!
mem address directly from SSE/AVX register
By Luchezar B.
Hello, I would like to make a suggestion. Very often, [otherwise well vectorizable] algorithms require reading from or writing to memory addresses which are calculated per channel (reading from a table, sampling a texture, etc.). When you get to this, you are forced to make that part of the algorithm scalar by extracting each channel in turn to a GP register, performing the memory operation, and then inserting the result back into a vector register. I don't think a single instruction that interprets each channel as an address and reads/writes to different memory locations at once is hardware feasible (though it would be extremely good), but at least we could have something that would ease the situation. My suggestion is instructions for memory access that get the address directly from the SSE/AVX register: loadd $(i + (j<<4)), %xmm0, %xmm1 - read 32-bit word from address specified in the i-th dword of xmm0 and store it in j-th quarter of xmm1; stored $(i + (j<<4)), %xmm0, %xmm1 - read 32-...
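The per-channel access pattern described in this post can be modeled in a few lines of Python. This is only a behavioral sketch of the proposed `loadd` instruction as the excerpt describes it, not a real instruction or intrinsic. It rests on several assumptions: memory is modeled as a `bytes` buffer, addresses are byte offsets into it, words are little-endian, and the `gather` helper name is hypothetical.

```python
import struct

LANES = 4  # four 32-bit lanes in a 128-bit register


def loadd(imm, src, dst, memory):
    """Model of the proposed loadd: the immediate encodes i + (j << 4).

    Reads the 32-bit word at the address held in lane i of src and
    returns a copy of dst with that word placed in lane j.
    Addresses are byte offsets into `memory` (an assumption of this model).
    """
    i = imm & 0xF
    j = imm >> 4
    (word,) = struct.unpack_from("<I", memory, src[i])
    result = list(dst)
    result[j] = word
    return result


def gather(memory, addresses):
    """Scalar fallback: one lane at a time, as the post describes."""
    lanes = [0] * LANES
    for lane in range(LANES):
        lanes = loadd(lane + (lane << 4), addresses, lanes, memory)
    return lanes


if __name__ == "__main__":
    table = struct.pack("<8I", *range(100, 108))  # eight 32-bit words
    print(gather(table, [0, 12, 4, 28]))
```

Each call to `loadd` stands for one per-lane memory access, so a full four-lane lookup still costs four accesses. The post treats that as acceptable and only asks to drop the extract-to-GP-register and insert-back steps.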
Studying Intel TSX Performance: strange results
By Alexander K.
Dear all, I've studied Intel TSX performance: its abort cases and a comparison with a spin lock. The study, with references to source code, is available at http://natsys-lab.blogspot.ru/2013/11/studying-intel-tsx-performance.html . I see some performance gain for TSX in comparison with the spin lock. However, I still have a few questions: 1. I see a huge jump in transactional aborts when the transaction working set reaches 256 cache lines (Figure 1). This is 16KB, which is only a quarter of the L1d cache. The workload is single threaded, running strictly on one CPU core. Also, I didn't use HyperThreading. I know that the cache has 8-way associativity; however, I used static memory allocations (which are contiguous in virtual memory and likely contiguous physically), and it's unlikely that I have so many collisions in cache lines that only 16KB of cache memory is available for the transaction. So how can the spike be explained? Also, Figure 3 (dependency of execution time on transaction size) shows s...
Capacity planning
By aminer10
Hello, I have come to an interesting subject. If we have a distributed database, a web server, and HTML files, and you want to do capacity planning for your web server, things get complicated, because the database server must be modeled with a hyperexponential distribution, that is, as an M/G/1 queuing system. As you have noticed, the database server in our network comes before the internet connection, which will be modeled as an M/M/1 queuing system, so you have to use a queuing network simulation to solve this problem. But in capacity planning we also have to calculate the worst-case response time, and this eases the job for us: in the worst-case scenario, since the M/G/1 queuing system of the database server has three exponential distributions for the read, write, and delete transactions, we have to choose the worst service time, which is exponentially distributed, so I think we have to choose only the...
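The worst-case simplification outlined in this post can be illustrated with the standard M/M/1 mean response time, T = 1 / (mu - lambda). The sketch below is a minimal illustration, not the poster's model. The three service times and the arrival rate are made-up values, and treating the slowest transaction type as the single exponential service time is an assumed reading of the truncated post.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: T = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def worst_case_service_rate(mean_service_times):
    """Take the slowest transaction type as the single exponential server."""
    return 1.0 / max(mean_service_times)


if __name__ == "__main__":
    # Hypothetical mean service times in seconds (not from the post).
    service_times = {"read": 0.002, "write": 0.005, "delete": 0.004}
    mu = worst_case_service_rate(service_times.values())
    arrival_rate = 150.0  # hypothetical requests per second
    print(f"worst-case mean response time: {mm1_response_time(arrival_rate, mu):.4f} s")
```

Using the slowest class for every request gives a pessimistic bound, which is the point of a worst-case estimate, and it avoids the general M/G/1 analysis the post mentions.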
Debug mic in Windows7
By Victor Z.
When I typed "micnativeloadex MyTest -d 0" in Windows 7 to debug MyTest.exe (mic + OpenMP), I got the error message below: Unable to create remote process. ssh to the coprocessor and run ps to verify the coi_daemon is executing. It may be necessary to restart the mpss service. But there is no problem running the mic+openmp application. I followed the Intel debugger extension for Intel MIC - VC2012. But it did work. How to debug it? Thank you in advance.
Parallel archiver and scalability
By aminer10
Hello, I think I am happy now, please read again... I benchmarked parallel archiver using parallel LZMA with 5 threads on a quad core, and this gave false timing results... So I started parallel archiver with a single thread, and this gave more accurate results. Here is my correction, please read again... I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: http://pages.videotron.com/aminer/) with Parallel LZMA, and I think it's worse than I thought. There are four things in my Parallel LZMA algorithm: first, we have to copy serially a stream from the hard disk to memory, and this takes on average 0.2 second; in the compression method we have to copy a stream to memory, and this takes on average 0.05 second; and in the compression method you have to compress a stream ...
I have come to an interesting subject
By aminer10
Hello, I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: http://pages.videotron.com/aminer/) with Parallel LZMA, and I think it's worse than I thought. There are four things in my Parallel LZMA algorithm: first, we have to copy serially a stream from the hard disk to memory, and this takes on average 0.9 second; in the compression method we have to copy a stream to memory, and this takes on average 0.01 second; in the compression method you have to compress a stream to another stream in memory, and this takes on average 3.1 seconds; and in the compression method you have to copy a compressed stream to a hard disk file, and this takes on average 0.01 second. So we have the serial part, which is 0.9 second + 0.01 second + 0.01 second, and the parallel part, which is 3.1 seconds. So th...
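The serial and parallel timings quoted in this post fit the usual Amdahl's law estimate. The sketch below applies that law to the post's figures (0.92 s serial, 3.1 s parallel). The function name and the list of core counts are illustrative assumptions, and the excerpt is truncated before the poster's own conclusion, so this is not necessarily the calculation the poster performed.

```python
def amdahl_speedup(serial_s, parallel_s, cores):
    """Speedup over one core when only the parallel part scales with cores."""
    total = serial_s + parallel_s
    return total / (serial_s + parallel_s / cores)


# Timings taken from the post: three serial copy steps and one parallel step.
SERIAL_S = 0.9 + 0.01 + 0.01
PARALLEL_S = 3.1

if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(f"{cores} cores: {amdahl_speedup(SERIAL_S, PARALLEL_S, cores):.2f}x")
    # With unlimited cores the parallel term vanishes, leaving total / serial.
    print(f"upper bound: {(SERIAL_S + PARALLEL_S) / SERIAL_S:.2f}x")
```

With these numbers the estimate is roughly 2.37x on four cores and cannot exceed about 4.37x however many cores are added, because the 0.92 s of disk and memory copies stays serial.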
Many-cores hit the memory wall
By aminer10
Many-cores hit the memory wall http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/ Thank you, Amine Moulay Ramdane.


Get More Information

Software Development Products