Intel® Developer Zone:
Server

Learn...

Find out how Intel processor technology can improve your software.

Intel® Microservers
Intel® Xeon
Intel® Xeon Phi™ Coprocessor
Intel® Cache Acceleration Software

Read...

Blogs
Technical Articles

Have something to share? Write a blog or article of your own.
Contribute today!

Grow...

Gain skills to work with our tools and guides.

Videos
Webinars

Find...

Get answers and solutions for your development questions.

Many Integrated Core Forum

Performance Forum

Videos

Deliver top application performance while minimizing development, tuning and testing time and effort.

A Concise Guide to Parallel Programming Tools for Intel® Xeon® Processors
Pick the right programming models and tools to boost your application’s performance.

Download Intel® OpenCL SDK
The first open, royalty-free standard for general-purpose parallel programming.

Parallel Studio XE 2013 is here
Powerful tools to make the most of clusters and supercomputers.

Intel® Compiler Options for Intel® SSE and Intel® AVX
Learn about the three main types of processor-specific optimizations.

Intel offers you a wide variety of capabilities to optimize your applications for performance, power, security and availability. Click on each of the buttons below to see what resources are available to you!

Intel offers a range of servers, microservers, and coprocessors to meet cloud, technical computing, and enterprise needs. Here are some resources to compare their features from a hardware perspective.

On this page you will find information on the latest products launched by Intel, presented from a more software-centric perspective: the architecture and features, key software enabling insights, and how products are being used or configured for best performance.

Other useful resources:

Predicting and Measuring Parallel Performance
By admin | Posted 02/01/2012 | 3
The success of parallelization is typically quantified by measuring the speedup of the parallel version relative to the serial version. It is also useful to compare that speedup relative to the upper limit of the potential speedup.
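The comparison described in this teaser can be sketched as follows. This is a minimal illustration, not code from the article: it assumes the upper limit is taken from Amdahl's law, and the function names and sample timings are hypothetical.

```python
def measured_speedup(t_serial, t_parallel):
    """Speedup of the parallel version relative to the serial version."""
    return t_serial / t_parallel


def amdahl_limit(parallel_fraction, workers):
    """Upper limit on speedup from Amdahl's law (an assumed model here).

    parallel_fraction: share of the serial run time that can be parallelized.
    workers: number of cores or threads used.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def fraction_of_limit(t_serial, t_parallel, parallel_fraction, workers):
    """How close the measured speedup comes to the Amdahl upper limit."""
    limit = amdahl_limit(parallel_fraction, workers)
    return measured_speedup(t_serial, t_parallel) / limit


if __name__ == "__main__":
    # Hypothetical timings: 10.0 s serial, 4.0 s on 4 workers, 90% parallelizable.
    print("measured speedup:", measured_speedup(10.0, 4.0))
    print("upper limit     :", round(amdahl_limit(0.9, 4), 3))
    print("share of limit  :", round(fraction_of_limit(10.0, 4.0, 0.9, 4), 3))
```

A ratio well below 1 suggests overheads (synchronization, load imbalance, memory bandwidth) beyond the serial fraction itself.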
Scaling and Self-repair of Linux Based Services Using a Novel Distributed Computing Model Exploiting Parallelism
By Rao Mikkilineni | Posted 02/01/2012 | 2
Giovanni Morana, DIEEI, University of Catania, Catania, Italy, giovanni.morana@dieei.unict.it
Rao Mikkilineni, IEEE Member, Kawa Objects Inc., Los Altos, California, USA, rao@kawaobjects.com
Download this white paper: /sites/default/files/m/d/4/1/d/8/DIME-Network.pdf
Abstract: This paper describes a prototype i...
Planning for Parallel Optimization
By Diana Byrne (Intel) | Posted 01/27/2012 | 0
Download this whitepaper
Optimizing your application for multi-core technology is fast becoming a requirement: multi-core computers have become mainstream, making up 83% of PC shipments in 2010. And the number of cores is increasing, with 60% of shipments projected to have 4 or more cores in 2012...
Intel® 64 Architecture Processor Topology Enumeration
By Shih Kuo (Intel) | Posted 01/27/2012 | 48
Processor topology information is important for a number of processor-resource management practices, including task/thread scheduling, licensing policy enforcement, and affinity control/migration. Topology information of the cache hierarchy can be important to optimizing software performan...

Subscribe to Intel Developer Zone Blogs
Are my Parallel Studio packages updating or not?
By dnesteruk | 2
I've fired up the Intel Software Manager, pressed the download buttons and it all looks like this: So instead of pause buttons I get resume buttons. I've tried pressing them, they briefly turn into pause buttons. So my question: is anything being downloaded or is this thing broken? Thanks. P.S.: registration on this forum is atrocious. Finding this forum was next to impossible. The media upload thing is so far below I didn't notice it and uploaded elsewhere. Usability hint-hint!
mem address directly from SSE/AVX register
By Luchezar B. | 3
Hello, I would like to make a suggestion. Very often, [otherwise well vectorizable] algorithms require reading from or writing to memory addresses that are calculated per channel (reading from a table, sampling a texture, etc.). When you get to this point, you are forced to make that part of the algorithm scalar by extracting each channel in turn to a GP register, performing the memory operation, and then inserting the result back into a vector register. I don't think a single instruction that interprets each channel as an address and reads/writes different memory locations at once is hardware-feasible (though it would be extremely good), but at least we could have something that would ease the situation. My suggestion is instructions for memory access that get the address directly from the SSE/AVX register:
loadd $(i + (j<<4)), %xmm0, %xmm1 - read a 32-bit word from the address specified in the i-th dword of xmm0 and store it in the j-th quarter of xmm1
stored $(i + (j<<4)), %xmm0, %xmm1 - read 32-...
Studying Intel TSX Performance: strange results
By Alexander K. | 9
Dear all, I've studied Intel TSX performance: its abort cases and a comparison with a spin lock. The study, with a reference to the source code, is available at http://natsys-lab.blogspot.ru/2013/11/studying-intel-tsx-performance.html . I see some performance gain for TSX in comparison with the spin lock. However, I still have a few questions: 1. I see a huge jump in transactional aborts when the transaction work set reaches 256 cache lines (Figure 1). This is 16KB, which is only a quarter of the L1d cache. The workload is single-threaded, running strictly on one CPU core. Also, I didn't use HyperThreading. I know that the cache has 8-way associativity; however, I used static memory allocations (which are contiguous in virtual memory and likely contiguous physically), and it's unlikely that I have so many collisions in cache lines that only 16KB of cache memory is available for the transaction. So how can the spike be explained? Also Figure 3 (dependency of execution time on transaction size) shows s...
Capacity planning
By aminer103
Hello, I have come to an interesting subject. If we have a distributed database, a webserver, and HTML files, and you want to do capacity planning for your webserver, this complicates things, because the database server must be modeled with a hyperexponential service distribution, that is, as an M/G/1 queuing system. But as you have noticed, the database server in our network comes before the internet connection, which will be modeled as an M/M/1 queuing system, so you have to use a queuing network simulation to solve this problem. But in capacity planning we also have to calculate the response time for the worst-case performance, and this will ease the job for us: in the worst-case scenario, since the M/G/1 queuing system of the database server has three exponential distributions, for the read, write, and delete transactions, we have to choose the worst service time, which is exponentially distributed, so I think we have to choose only the...
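The two queue models named in this post can be compared numerically. The sketch below is an illustration under stated assumptions, not the poster's method (the post is truncated): it uses the standard M/M/1 mean response time and the Pollaczek-Khinchine formula for an M/G/1 queue with hyperexponential service. The function names, arrival rate, transaction mix, and service rates are all hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def mg1_hyperexp_response_time(arrival_rate, probs, rates):
    """Mean time in system for an M/G/1 queue with hyperexponential service.

    probs[i] is the probability that a request is of class i, and rates[i]
    is the exponential service rate of that class.
    """
    mean_s = sum(p / mu for p, mu in zip(probs, rates))                # E[S]
    second_moment = sum(2.0 * p / mu ** 2 for p, mu in zip(probs, rates))  # E[S^2]
    rho = arrival_rate * mean_s                                        # utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    # Pollaczek-Khinchine mean waiting time in the queue.
    waiting = arrival_rate * second_moment / (2.0 * (1.0 - rho))
    return mean_s + waiting


def worst_case_response_time(arrival_rate, rates):
    """Treat every request as the slowest class, which reduces to M/M/1."""
    return mm1_response_time(arrival_rate, min(rates))


if __name__ == "__main__":
    # Hypothetical database workload: reads, writes, deletes (requests/second).
    arrival = 8.0
    mix = [0.7, 0.2, 0.1]
    service = [50.0, 20.0, 10.0]
    print("M/G/1 mixed    :", mg1_hyperexp_response_time(arrival, mix, service))
    print("worst-case M/M/1:", worst_case_response_time(arrival, service))
```

With a single service class the M/G/1 result collapses to the M/M/1 value, which is a quick sanity check on the formulas.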
Debug mic in Windows7
By Victor Z. | 1
When I typed "micnativeloadex MyTest -d 0" in Windows 7 to debug MyTest.exe (mic + OpenMP), I got the error message below: Unable to create remote process. ssh to the coprocessor and run ps to verify the coi_daemon is executing. It may be necessary to restart the mpss service. But there is no problem running the mic+openmp application. I followed the Intel debugger extension for Intel MIC (VC2012). But it did work. How do I debug it? Thank you in advance.
Parallel archiver and scalability
By aminer101
Hello, I think I am happy now, please read again... I benchmarked my parallel archiver using Parallel LZMA with 5 threads on a quad core, and this gave false timing results... So I started the parallel archiver with a single thread, and this gave more accurate results. Here is my correction, please read again... I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: http://pages.videotron.com/aminer/) with Parallel LZMA, and I think it's worse than what I thought... There are four things in my Parallel LZMA algorithm: first, we have to copy a stream serially from the hard disk to memory, and this takes on average 0.2 second; in the compression method we have to copy a stream to memory, and this takes on average 0.05 second; and in the compression method you have to compress a stream ...
I have come to an interesting subject
By aminer100
Hello, I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: http://pages.videotron.com/aminer/) with Parallel LZMA, and I think it's worse than what I thought... There are four things in my Parallel LZMA algorithm: first, we have to copy a stream serially from the hard disk to memory, and this takes on average 0.9 second; in the compression method we have to copy a stream to memory, and this takes on average 0.01 second; in the compression method you have to compress a stream to another stream in memory, and this takes on average 3.1 seconds; and in the compression method you have to copy a compressed stream to a hard disk file, and this takes on average 0.01 second. So we have the serial part, which is: 0.9 second + 0.01 second + 0.01 second, and the parallel part, which is: 3.1 seconds. So th...
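The serial and parallel timings quoted in this post can be plugged into Amdahl's law to estimate how the archiver would scale. The post is cut off before its conclusion, so this is only a sketch under two assumptions that are not from the post: the 3.1-second compression stage divides evenly across threads, and the three I/O and copy stages stay fully serial. The stage labels and function names are hypothetical.

```python
# Stage timings in seconds, as quoted in the post.
SERIAL_STAGES = {
    "read stream from hard disk": 0.9,
    "copy stream to memory": 0.01,
    "write compressed stream to hard disk": 0.01,
}
PARALLEL_STAGE = 3.1  # compress a stream to another stream in memory

SERIAL_TIME = sum(SERIAL_STAGES.values())
TOTAL_TIME = SERIAL_TIME + PARALLEL_STAGE


def predicted_time(threads):
    """Run time if only the compression stage is split across threads."""
    return SERIAL_TIME + PARALLEL_STAGE / threads


def predicted_speedup(threads):
    """Speedup relative to the single-threaded run."""
    return TOTAL_TIME / predicted_time(threads)


def max_speedup():
    """Limit as the thread count grows: only the serial part remains."""
    return TOTAL_TIME / SERIAL_TIME


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: {predicted_time(n):.3f} s, "
              f"speedup {predicted_speedup(n):.2f}x")
    print(f"upper limit: {max_speedup():.2f}x")
```

Under these assumptions the disk read dominates the serial part, so the predicted speedup levels off quickly no matter how many threads are added.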
Many-cores hit the memory wall
By aminer102
Many-cores hit the memory wall http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/ Thank you, Amine Moulay Ramdane.


Subscribe to Forums

Get More Information

Software Development Products