Intel® Developer Zone:


Learn how Intel processor technology can improve your software.

Intel® Microservers
Intel® Xeon
Intel® Xeon Phi™ Coprocessor
Intel® Cache Acceleration Software


Technical Articles

Want to share something? Write your own blog post or article.
Get involved!


Learn how to use our tools and guides.



Get answers to your questions and solutions for your development work.

Many Integrated Core Forum

Performance Forum


Develop applications with top performance while reducing the effort spent on development, tuning, and testing.

A comprehensive guide to parallel programming tools for Intel® Xeon® processors
Choosing the right programming models and tools for maximum application performance.

Download the Intel® OpenCL SDK
The first open, royalty-free standard for general-purpose parallel programming.

Parallel Studio XE 2013 is here
Power tools for getting the most out of clusters and supercomputers.

Intel® compiler options for Intel® SSE and Intel® AVX
Learn about the three most important processor-specific optimizations.

Intel can help you improve your applications in many ways with respect to performance, power behavior, security, and availability. Click the buttons below to learn about the various resources.

Intel offers a range of servers, microservers, and coprocessors suited to a wide variety of cloud computing, technical computing, and enterprise requirements. Here are some resources for comparing their features from a hardware perspective.

On this page you will find information about Intel's latest product launches from a software perspective: architecture and features, key insights on software enabling, and how to use and configure the products for optimal performance.

Other useful resources:

  • Assess the performance of Intel products using common industry benchmarks. See the “Server Performance” section under
  • Intel Cluster Ready: Developed in collaboration with hardware and software vendors, this program lets you tune your HPC applications for today's leading platforms and components.
  • Data center design for cloud computing: Find out how Intel envisions the future of cloud computing, and which technologies can make that future a reality.
  • High-performance computing: Learn how new HPC solutions from Intel deliver intelligent performance for today's most complex HPC challenges.
Video: Big Data Technologies Part 2
By DANIEL F. (Intel) | Posted 03/28/2014 | 0 comments
Intel software engineering experts discuss big data technologies and how Intel improves the performance and capabilities of Java, Hadoop, and NoSQL data stores.  Download Part 2 of the 4-part series.
Intel® Trusted Execution Technology (Intel® TXT) Enabling Guide
By David Mulnix (Intel) | Posted 03/28/2014 | 0 comments
Download as PDF. Contents: 1 Overview of Benefits from Intel® Trusted Execution Technology (Intel® TXT); 2 Hardware and Software Prerequisites; 2.1 Hardware-Layer Requirements; 2.1.1 Processor; 2.1.2 Chipset; 2.1.3 BIOS; 2.2 Software-Layer Requirements; 2.2.1 Operating System and Hyper…
Resource Guide for Intel® Xeon Phi™ Coprocessor Administrators
By Taylor Kidd (Intel) | Posted 03/25/2014 | 0 comments
This article makes recommendations for how an administrator can get up to speed quickly on the Intel® Many Integrated Core (Intel® MIC) Architecture. This article is 1 of 3: For the Administrator, for the Developer, and for the Investigator. Someone who will administer and support a set of machine…
Resource Guide for Intel® Xeon Phi™ Coprocessor Developers
By Taylor Kidd (Intel) | Posted 03/25/2014 | 3 comments
This article makes recommendations for how a developer can get up to speed quickly on the Intel® Many Integrated Core (Intel® MIC) Architecture. This is one of three articles: For the Administrator, for the Developer, and for the Investigator. Who is a Developer? Someone who will be programming on…


Efficiently parallelizing code in Fortran
By prodigyaj@gmail.com | 13 replies
Hi, I am trying to parallelize a certain section of my code, which is written in Fortran. The snippet looks like this:

    do i=1,array(K)
      j = K
      ... if conditions on K ...
      ... writes and reads on j ...
      ... do lots of things ...
      K = K+1

So I tried to parallelize it with the code below, which was obviously not as it should have been:

    !$OMP PARALLEL DO PRIVATE(j)
    do i=1,50
      j = K
      ... if conditions on K ...
      ... writes and reads on j ...
      ... do lots of things ...
      K = K+1
    !$OMP END PARALLEL DO

The obvious mistake: all the threads race on the same K. What would be the best way to ensure that every thread gets assigned an incremental K while the threads run in parallel?

Thanks, Ajay
Different ways to turn an AoS into an SoA
By Diego Caballero | 6 replies
Hi, I'm trying to implement a permutation that turns an AoS (where the struct has 4 floats) into an SoA, using SSE, AVX, AVX2, and KNC, without using gather operations, to find out whether it is worth it. For example, using KNC, I would like to use 4 zmm registers:

    {A0, A1, ... A15}
    {B0, B1, ... B15}
    {C0, C1, ... C15}
    {D0, D1, ... D15}

and end up with something like:

    {A0, A4, A8, A12, B0, B4, B8, B12, C0, C4, C8, C12, D0, D4, D8, D12}
    {A1, A5, A9, ...}
    {A2, A6, A10, ...}
    {A3, A7, A11, ...}

Since the permutation instructions change significantly between architectures and I don't want to reinvent the wheel, I would be glad if someone could point me to information about this, or share their knowledge. Thank you in advance.
How to clear the upper 128 bits of __m256 value?
By Vladimir Sedach | 8 replies
How can I clear the upper 128 bits of m2?

    __m256i m2 = _mm256_set1_epi32(2);
    __m128i m1 = _mm_set1_epi32(1);

Neither of these works:

    m2 = _mm256_castsi128_si256(_mm256_castsi256_si128(m2));
    m2 = _mm256_castsi128_si256(m1);

Intel's documentation for the _mm256_castsi128_si256 intrinsic says that "the upper bits of the resulting vector are undefined". At the same time I can easily do it in assembly:

    VMOVDQA xmm2, xmm2
    VMOVDQA xmm2, xmm1

Of course I'd rather not use _mm256_insertf128_si256().
Get _mm_alignr_epi8 functionality on 256-bit vector registers (AVX2)
By Diego Caballero | 15 replies
Hello, I'm porting an application from SSE to AVX2 and KNC. I have some _mm_alignr_epi8 intrinsics. While I just had to replace this intrinsic with _mm512_alignr_epi32 for KNC (by the way, I had missed this intrinsic for KNC), it seems that the 256-bit version, _mm256_alignr_epi8, does something unexpected. It is not an extension of the 128-bit instruction to 256 bits: it performs a 2x128-bit alignr on 256-bit vectors, which is not the expected behaviour given its counterparts in AVX-512 and KNC. Does someone know the most efficient way to implement the extension of _mm_alignr_epi8 to 256-bit vectors using AVX2 intrinsics? I.e., with V1={7, 6, 5, 4, 3, 2, 1, 0} and V2={15, 14, 13, 12, 11, 10, 9, 8}, the output of this operation should be V3={8, 7, 6, 5, 4, 3, 2, 1} and not V3={12, 7, 6, 5, 8, 3, 2, 1}, which is what I get using _mm256_alignr_epi8. Thank you in advance.
FMA manipulation of register contents for the XMM, YMM, and ZMM register sets
By Mile M. | 1 reply
Hello, there wasn't a typical introduction thread, so since it's my first post I thought I'd introduce myself. My name is Mile (yes, like the unit of measurement) and I'm a student, new to this area. I'm writing a paper for school, and before posting my question(s) here I thoroughly searched for an answer online to the best of my abilities, but didn't manage to find one. After browsing the forum I decided to post a new topic instead of going off topic in another one. During my research some things became clearer to me, but I still couldn't find clearly defined answers. I'll probably have some trivial questions and silly assumptions, so please correct me if I'm wrong or missing something. I've spent a lot of time drawing diagrams and writing, so if someone can help I would appreciate it greatly! I'm in a mentored study program where we don't have lectures; instead, the professor gives us a topic which we study on our own. Unfortunately I can't bother my professor with this type of…
Shared memory on Xeon
By Madhav A. | 1 reply
Hi, here is an observation I have; can you help me explain it?

Setup 1: A process updates shared memory allocated on the local node (0), writing to it constantly from a core (3) on package (0) attached to that node. Another process constantly reads it from a core (1) on the same package (0) and attached node (0). The read latency I measure is around 70 clock cycles.

Setup 2: A process running on a core (2) on package (1) updates shared memory allocated on the remote node (0), writing to it constantly. Another process reads it from a core (1) on package (0), local to the shared-memory node (0). In this case the reader reads it in about 3 cycles (within statistical error).

What is the explanation for the reader incurring less penalty when the writer runs on the remote package than when it runs on another core of the local package?

Thanks, Madhav.
Gather instructions and the size of indexes for a given base GPR size
By perfwise | 5 replies
Hi, I have a simple question. When performing address computations, the BASE and the INDEX are normally required to be the same size. I presumed this was the case for the GATHER instructions, but I no longer believe it is. Can someone confirm? Specifically, I'm asking whether you can use a 64-bit GPR BASE register with 32-bit indexes in an instruction like VGATHERDPS or VPGATHERDD. In these two instructions the indexes are 32-bit values, which I presume are sign-extended to 64 bits when a 64-bit GPR BASE is specified. I didn't find it clearly stated that this is possible, nor did I find it prohibited, so I just wanted to clarify. Thanks for any helpful and concise feedback. perfwise
ICPC 13.0.2 generates scalar load instead of packed load
By Paul S. | 0 replies
Hi all, I'm a little puzzled about the assembly generated for this little piece of Cilk code:

    void gemv(const float* restrict A[4], const float *restrict x, float *restrict y) {
        __assume_aligned(y, 32);
        __assume_aligned(x, 32);
        __assume_aligned(A, 32);
        y[0:4]  = A[0:4][0] * x[0];
        y[0:4] += A[0:4][1] * x[1];
        y[0:4] += A[0:4][2] * x[2];
        y[0:4] += A[0:4][3] * x[3];
    }

Looking at the generated assembly:

  • The compiler changes the algorithm so that it uses the vdpps instruction (most likely due to the bad access pattern of A).
  • Loads for A are okay (only four packed loads). However, the loads and stores for x and y are quite bad: the compiler issues four scalar loads/stores for both x and y. More precisely, here is the sequence of scalar loads generated for x:

    vmovss    xmm0, DWORD PTR [rsi]
    vmovss    xmm1, DWORD PTR [4+rsi]
    vmovss    xmm2, DWORD PTR [8+rsi]
    vmov…

