Intel® Developer Zone: Server

Learn...

Find out how Intel processor technology can improve your software.

Intel® Microservers
Intel® Xeon
Intel® Xeon Phi™ Coprocessor
Intel® Cache Acceleration Software

Read...

Blogs
Technical Articles

Have something to share? Write a blog or article of your own.
Contribute today!

Grow...

Gain skills to work with our tools and guides.

Videos
Webinars

Find...

Get answers and solutions for your development questions.

Many Integrated Core Forum

Performance Forum

Videos

Deliver top application performance while minimizing development, tuning and testing time and effort.

A Concise Guide to Parallel Programming Tools for Intel® Xeon® Processors
Pick the right programming models and tools to boost your application’s performance.

Download Intel® OpenCL SDK
The first open, royalty-free standard for general-purpose parallel programming.

Parallel Studio XE 2013 is here
Powerful tools to make the most of clusters and supercomputers.

Intel® Compiler Options for Intel® SSE and Intel® AVX
Learn about the three main types of processor-specific optimizations.
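As a rough sketch of what such processor-specific options look like in use (assuming the article covers the familiar -m, -x, and -ax option families of the Intel C/C++ compiler of that era; the compile lines below are illustrative, not taken from the article), the same trivial loop can be targeted at SSE or AVX in three ways:

    // saxpy.cpp -- a loop the compiler can vectorize with SSE or AVX instructions.
    // Assumed example compile lines:
    //   icpc -msse4.2 saxpy.cpp   // portable code path, up to SSE4.2
    //   icpc -xAVX    saxpy.cpp   // Intel-processor-specific AVX code path
    //   icpc -axAVX   saxpy.cpp   // baseline path plus an auto-dispatched AVX path
    void saxpy(float a, const float* x, float* y, int n) {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }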

Intel offers a wide variety of capabilities to optimize your applications for performance, power, security, and availability. Explore the resources below to see what is available to you.

Intel offers a range of servers, microservers, and coprocessors to handle cloud, technical computing, and enterprise needs. The resources here let you compare their features from a hardware perspective.

On this page you will find information on the latest products launched by Intel, presented from a software-centric perspective: the architecture and features, key software-enabling insights, and how the products are being used or configured for best performance.

Other useful resources:

Optimizing Hadoop Deployments
By DANIEL F. (Intel), posted 12/24/2013
This paper provides guidance, based on extensive lab testing conducted at Intel, to help IT organizations plan an optimized infrastructure for deploying Apache Hadoop*. It includes best practices for establishing server hardware specifications and high-level software guidance regarding the operating s...
Intel® Fortran Vectorization Diagnostics
By Ronald W Green (Intel), posted 12/24/2013
Intel® Fortran Compiler Vectorization Diagnostics. We have a similar catalog of vectorization diagnostics for the Intel® C++ Compiler here. The following are diagnostic messages from the vectorization report produced by the Intel® Fortran Compiler. To obtain a vectorization report, use the -ve...
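For C and C++ readers, the sketch below shows the classic kind of loop the vectorization report comments on (the exact Fortran option is truncated in the teaser above, so no flag is repeated here). Because each iteration reads the value written by the previous one, the report normally cites a loop-carried dependence and the loop stays scalar:

    // Running (prefix) sum: a[i] depends on a[i-1], a true loop-carried
    // dependence, so the compiler's vectorization report typically flags it
    // and the loop is not vectorized.
    void prefix_sum(float* a, int n) {
        for (int i = 1; i < n; ++i)
            a[i] += a[i - 1];
    }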
Optimizing Infrastructure for Workloads in OpenStack-Based Public Cloud Services
By DANIEL F. (Intel), posted 12/20/2013
This paper examines how business needs translate to infrastructure considerations for infrastructure-as-a-service (IaaS) when building out or enhancing an OpenStack* cloud environment. The paper looks at these requirements and the foundational platform technologies that can support a wide range o...
Accelerating Performance for Server-Side Java* Applications
By DANIEL F. (Intel), posted 12/20/2013
This paper describes the key architectural advancements of the latest Intel Xeon processors and Intel Atom C2000 processors that are beneficial to Java applications. It also describes some of the techniques and strategies used to optimize JVM software and the benefits those optimizations bring ...

Intel® Xeon Phi™ Coprocessor Training Material Updated
By kathy-farrel (Intel), posted 11/06/2013
When the Intel® Xeon Phi™ Coprocessor was released, training videos were published. These videos are now complemented by revised training presentations as follows: Intel® Xeon Phi™ Coprocessor Introduction, Intel® Xeon Phi™ Coprocessor Architecture Overview, Intel® Xeon Phi™ Coprocessor Softwar...
What’s New in Intel® Composer XE 2013 SP1
By loc-nguyen (Intel), posted 11/05/2013
Intel® Composer XE 2013 SP1 includes Intel® Compiler 14.0 among other components. The list below summarizes the feature and enhancement highlights in Intel® Compiler 14.0 that are pertinent to those programming for Intel® Xeon Phi™ coprocessors: support for the new Intel® Xeon Phi™ Copro...
Applying Intel® Threading Building Blocks observers for thread affinity on Intel® Xeon Phi™ coprocessors.
By Alexei Katranov (Intel), posted 10/31/2013
Although the Intel® Threading Building Blocks (Intel® TBB) library [1] [2] provides high-level, task-based parallelism intended to hide software thread management, thread-related problems sometimes arise. One of these problems is thread affinity [3]. Since thread affinity may help...
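A minimal sketch of the technique the post describes, assuming Linux and the classic tbb::task_scheduler_observer interface (the blog's actual observer and its mapping to Xeon Phi hardware threads may be more elaborate):

    #include <tbb/task_scheduler_observer.h>
    #include <sched.h>       // CPU_ZERO, CPU_SET, sched_setaffinity (Linux)
    #include <unistd.h>      // sysconf
    #include <atomic>

    // Pins every thread that enters the TBB scheduler to its own logical CPU,
    // round-robin over the CPUs that are online.
    class pinning_observer : public tbb::task_scheduler_observer {
        std::atomic<int> next_cpu{0};
    public:
        pinning_observer() { observe(true); }              // start receiving callbacks
        void on_scheduler_entry(bool /*is_worker*/) override {
            int ncpus = static_cast<int>(sysconf(_SC_NPROCESSORS_ONLN));
            int cpu   = next_cpu.fetch_add(1) % ncpus;     // round-robin CPU id
            cpu_set_t mask;
            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            sched_setaffinity(0, sizeof(mask), &mask);     // pin the calling thread
        }
    };
    // Usage: construct one pinning_observer before the first parallel algorithm runs;
    // TBB then invokes on_scheduler_entry on each worker as it joins.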
Intel® Xeon Phi™ coprocessor Power Management Turbo Part 2: Hot and Cold Running Silicon
By Taylor Kidd (Intel), posted 10/22/2013
The previous blog in this series, “Intel® Xeon Phi™ coprocessor Power Management Turbo Part 1: What is turbo? And how will it affect my horsepower?” can be found at http://software.intel.com/en-us/blogs/2013/09/26/intel-xeon-phi-coprocessor-power-management-turbo-part-1-what-is-turbo-and-how-will...

mem address directly from SSE/AVX register
By Luchezar B. (3 replies)
Hello, I would like to make a suggestion. Very often, [otherwise well-vectorizable] algorithms require reading/writing from/to memory addresses which are calculated per channel (reading from a table, sampling a texture, etc.). When you get to this, you are forced to make that part of the algorithm scalar by extracting each channel in turn to a GP register, performing the memory operation, and then inserting the result back into a vector register. I don't think a single instruction that interprets each channel as an address and reads/writes different memory locations at once is hardware-feasible (though it would be extremely good), but at least we could have something that would ease the situation. My suggestion is instructions for memory access that get the address directly from the SSE/AVX register: loadd $(i + (j<<4)), %xmm0, %xmm1 - read a 32-bit word from the address specified in the i-th dword of xmm0 and store it in the j-th quarter of xmm1; stored $(i + (j<<4)), %xmm0, %xmm1 - read 32-...
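The scalar detour the post describes looks roughly like the sketch below with SSE4.1 intrinsics (illustrative only, not code from the thread): each lane is extracted to a general-purpose register, used as a table index, and the loaded value is inserted back.

    #include <smmintrin.h>   // SSE4.1: _mm_extract_epi32 / _mm_insert_epi32

    // Gather four 32-bit table entries whose indices sit in the lanes of an
    // SSE register -- the extract / scalar load / insert dance the post laments.
    static inline __m128i gather_epi32(const int* table, __m128i indices) {
        __m128i r = _mm_setzero_si128();
        r = _mm_insert_epi32(r, table[_mm_extract_epi32(indices, 0)], 0);
        r = _mm_insert_epi32(r, table[_mm_extract_epi32(indices, 1)], 1);
        r = _mm_insert_epi32(r, table[_mm_extract_epi32(indices, 2)], 2);
        r = _mm_insert_epi32(r, table[_mm_extract_epi32(indices, 3)], 3);
        return r;
    }

The read half of what the post asks for is addressed by the AVX2 gather instructions (the _mm_i32gather_epi32 family); a scatter counterpart took longer to reach mainstream hardware.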
Studying Intel TSX Performance: strange results
By Alexander K. (9 replies)
Dear all, I've been studying Intel TSX performance - its abort cases and a comparison with a spin lock. The study, with references to source code, is available at http://natsys-lab.blogspot.ru/2013/11/studying-intel-tsx-performance.html . I see some performance gain for TSX in comparison with the spin lock. However, I still have a few questions: 1. I see a huge jump in transactional aborts when the transaction working set reaches 256 cache lines (Figure 1). This is 16KB, which is only a quarter of the L1d cache. The workload is single-threaded, running strictly on one CPU core. Also, I didn't use Hyper-Threading. I know that the cache has 8-way associativity; however, I used static memory allocations (which are contiguous in virtual memory and likely contiguous physically), and it's unlikely that I have so many collisions in cache lines that only 16KB of cache memory is available to the transaction. So how can the spike be explained? Also, Figure 3 (dependency of execution time on transaction size) shows s...
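For readers who have not used the RTM side of TSX that the study exercises, a minimal transactional section with a spin-lock fallback looks roughly like this (an illustrative sketch, not the benchmark code linked above):

    #include <immintrin.h>   // RTM intrinsics: _xbegin, _xend (build with RTM enabled)
    #include <atomic>

    std::atomic_flag fallback = ATOMIC_FLAG_INIT;  // spin lock for the abort path
    long counter = 0;

    void increment() {
        unsigned status = _xbegin();            // try to start a hardware transaction
        if (status == _XBEGIN_STARTED) {
            ++counter;                          // transactional read-modify-write
            _xend();                            // commit
        } else {
            // Aborted (conflict, capacity, ...): take the spin lock instead.
            while (fallback.test_and_set(std::memory_order_acquire)) { /* spin */ }
            ++counter;
            fallback.clear(std::memory_order_release);
        }
    }

A production lock-elision scheme would also read the fallback lock inside the transaction so that a lock holder aborts concurrent transactions; the sketch omits that to stay short.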
Capacity planning
By aminer10 (3 replies)
Hello, I have come to an interesting subject. If we have a distributed database, a web server, and HTML files, and you want to do capacity planning for your web server, this complicates things, because the database server must be modeled as a hyperexponential distribution, that is, an M/G/1 queuing system. But, as you have noticed, since the database server system in our network comes before the internet connection, which will be modeled as an M/M/1 queuing system, you have to use a queuing network simulation to solve this problem. But if you have noticed, in capacity planning we also have to calculate the response time for worst-case performance, so this will ease the job for us, because in the worst-case scenario, since the M/G/1 queuing system of the database server has three exponential distributions for the read, write, and delete transactions, we have to choose the worst service time that is exponentially distributed, so I think we have to choose only the...
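For reference, the two textbook results the post is combining are the M/M/1 mean response time and the Pollaczek-Khinchine mean response time for M/G/1 (standard queueing theory, not quoted from the post):

    R_{M/M/1} = \frac{1}{\mu - \lambda}, \qquad
    R_{M/G/1} = E[S] + \frac{\lambda\, E[S^2]}{2\,(1 - \rho)}, \qquad \rho = \lambda\, E[S]

Here \lambda is the arrival rate, \mu the service rate, and S the service time; the hyperexponential service distribution mentioned in the post enters only through E[S] and E[S^2].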
Debug MIC in Windows 7
By Victor Z. (1 reply)
When I typed "micnativeloadex MyTest -d 0" in Windows 7 to debug MyTest.exe (MIC + OpenMP), I got the error message below: Unable to create remote process. ssh to the coprocessor and run ps to verify the coi_daemon is executing. It may be necessary to restart the mpss service. But there is no problem running the MIC + OpenMP application. I followed the Intel debugger extension for Intel MIC - VC2012, but it did not work. How do I debug it? Thank you in advance.
Parallel archiver and scalability
By aminer10 (1 reply)
Hello, I think I am happy now, please read again... I benchmarked the parallel archiver using parallel LZMA with 5 threads on a quad core, so this gave false results on the timing... So I started the parallel archiver with a single thread and this gave more accurate results; here is my correction, please read again... I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: http://pages.videotron.com/aminer/) with parallel LZMA, and I think it's worse than what I thought... There are four things in my parallel LZMA algorithm: first we have to serially copy a stream from the hard disk to memory, and this takes on average 0.2 second; in the compression method we have to copy a stream to memory, and this takes on average 0.05 second; and in the compression method you have to compress a stream ...
I have come to an interesting subject
By aminer10 (0 replies)
Hello, I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: http://pages.videotron.com/aminer/) with parallel LZMA, and I think it's worse than what I thought... There are four things in my parallel LZMA algorithm: first we have to serially copy a stream from the hard disk to memory, and this takes on average 0.9 second; in the compression method we have to copy a stream to memory, and this takes on average 0.01 second; in the compression method you have to compress a stream to another stream in memory, and this takes on average 3.1 seconds; and in the compression method you have to copy a compressed stream to a hard disk file, and this takes on average 0.01 second. So we have the serial part, which is 0.9 second + 0.01 second + 0.01 second, and the parallel part, which is 3.1 seconds. So th...
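Taking the post's own numbers, Amdahl's law gives the ceiling the poster is reasoning about: the serial part is 0.9 + 0.01 + 0.01 = 0.92 s against a 3.1 s parallelizable part, so

    s = \frac{0.92}{0.92 + 3.1} \approx 0.23, \qquad
    \text{speedup}(n) = \frac{1}{s + (1 - s)/n}, \qquad
    \lim_{n \to \infty} \text{speedup}(n) = \frac{1}{s} \approx 4.4

that is, even with an unlimited number of threads the archiver cannot run much more than about 4.4x faster than the single-threaded case under these timings.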
Many-cores hit the memory wall
By aminer10 (2 replies)
Many-cores hit the memory wall http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/ Thank you, Amine Moulay Ramdane.
I have a question for you
By aminer10 (1 reply)
Hello, I have a question for you: what are the techniques actually used in hardware to scale and speed up the memory system and the hard disk system? And can those techniques make memory- and/or disk-bound applications scalable on today's and future multicore systems? Thank you, Amine Moulay Ramdane

