Intel® Developer Zone:
Performance

Highlights

Hot off the press! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the fundamentals of programming for this new architecture and these new products. New!
Intel® System Studio
Intel® System Studio is a comprehensive, integrated tool suite for software development. It helps you shorten lead times and build systems that are more reliable, more energy-efficient, and higher-performing.
New!
In case you missed it: a recording of the two-day live webinar
An introduction to developing high-performance applications for Intel® Xeon and Intel® Xeon Phi™ coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use structured patterns to make the topic accessible to every software developer.

Offer your customers the best possible applications through parallel programming, built with Intel's innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio

Intel® Parallel Studio brings simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, with sophisticated tools for optimizing client applications for multicore and manycore.

Intel® Software Development Products

Explore all the tools that can help you optimize for Intel architecture. Selected tools are available as free 45-day trials.

Tools Knowledge Base

How-to guides and support information for Intel tools.

Intel Cluster Ready FAQ: Hardware vendors, system integrators, platform suppliers
By Werner Krotz-Vogel (Intel), published 03/23/2015
Q: Why should we join the Intel® Cluster Ready program? A: By offering certified Intel Cluster Ready systems and certified components, you can give customers greater confidence in deploying and running HPC systems. Participating in the program will help you drive HPC adoption, expand your customer…
Intel Cluster Ready FAQ: Customer benefits
By Werner Krotz-Vogel (Intel), published 03/23/2015
Q: Why should we select a certified Intel Cluster Ready system and registered Intel Cluster Ready applications? A: Choosing certified systems and registered applications gives you the confidence that your cluster will work as it should, right away, so you can boost productivity and start solving new…
Dynamic allocator replacement on OS X* with Intel® TBB
By Kirill Rogozhin (Intel), published 03/23/2015
The Intel® Threading Building Blocks (Intel® TBB) library provides an alternative way to dynamically allocate memory - Intel TBB scalable allocator (tbbmalloc). Its purpose is to provide better performance and scalability for memory allocation/deallocation operations in multithreaded applications, …
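The excerpt above is truncated; as a quick orientation, here is a minimal sketch of using tbbmalloc through its explicit C++ and C interfaces. It assumes Intel TBB is installed and the program is linked against the tbbmalloc library; the dynamic replacement the article title refers to instead loads the tbbmalloc proxy library so that existing malloc/free calls are redirected without source changes.

    // Minimal sketch: using the Intel TBB scalable allocator explicitly.
    // Assumes Intel TBB is installed and the binary links against tbbmalloc.
    #include <tbb/scalable_allocator.h>
    #include <vector>
    #include <cstdio>

    int main() {
        // STL container whose memory comes from tbbmalloc instead of the
        // default allocator; useful in allocation-heavy multithreaded code.
        std::vector<double, tbb::scalable_allocator<double>> samples;
        for (int i = 0; i < 1000; ++i)
            samples.push_back(i * 0.5);

        // C-style interface to the same allocator.
        void* raw = scalable_malloc(4096);
        scalable_free(raw);

        std::printf("allocated %zu doubles\n", samples.size());
        return 0;
    }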
Courseware Algorithmic Strategies
By admin, published 02/27/2015
Brute-force algorithms; greedy algorithms; divide-and-conquer; backtracking; branch-and-bound; heuristics; pattern matching and string/text algorithms; numerical approximation algorithms. Parallel Solution to Cat-and-Mouse strategy game problem (Vyukov). Material Type: Coding…
Subscribe to Intel Developer Zone articles
No content found
Subscribe to Intel Developer Zone blogs
A new algorithm of a scalable distributed sequential lock
By aminer10
Scalable Distributed Sequential Lock, version 1.11. Author: Amine Moulay Ramdane. Description: This scalable distributed sequential lock was invented by Amine Moulay Ramdane. It combines the characteristics of a distributed reader-writer lock with those of a sequential lock, making it a hybrid reader-writer lock that is more powerful than Dmitry Vyukov's distributed reader-writer mutex: Vyukov's lock becomes slower and slower on the writer side as the number of cores grows, because it transfers too many cache lines between cores on the writer side. My scalable distributed sequential lock eliminates this weakness of Vyukov's distributed reader-writer mutex, so writer throughput becomes much faster, and my scalable distribu…
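The post does not include code; as background for the terminology only, here is a minimal sketch of a plain sequential lock (seqlock) in C++, the textbook building block the author says he combines with a distributed reader-writer lock. It is not the author's hybrid algorithm; it assumes a single writer, and all accesses use default sequentially consistent atomics to keep the sketch simple.

    // Textbook seqlock protecting a pair of counters (NOT the author's algorithm).
    // Readers never block; they retry if a writer was active during the read.
    #include <atomic>
    #include <cstdint>

    struct SeqLockedPair {
        std::atomic<uint64_t> seq{0};   // even = stable, odd = write in progress
        std::atomic<uint64_t> a{0};
        std::atomic<uint64_t> b{0};

        void write(uint64_t na, uint64_t nb) {     // writer side (single writer assumed)
            seq.fetch_add(1);                      // odd: write in progress
            a.store(na);
            b.store(nb);
            seq.fetch_add(1);                      // even: stable again
        }

        void read(uint64_t& ra, uint64_t& rb) const {   // reader side, retry loop
            for (;;) {
                uint64_t before = seq.load();
                if (before & 1) continue;          // writer active, retry
                ra = a.load();
                rb = b.load();
                if (seq.load() == before) return;  // snapshot was consistent
            }
        }
    };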
Let's talk computer science...
By aminer10
Hello, let's talk computer science... Yesterday I thought about parallel hashtables and their scalability, and I made a scalability prediction for my parallel hashlist and my parallel varfiler. A parallel hashtable uses an array, which reduces the access time to a complexity of O(1) in the best case, but that array is also a scalability bottleneck: after the modulo operation gives an index into the array, accessing that index is expensive at run time because it causes a cache miss costing around 400 CPU cycles on x86. Since I use a binary tree for the buckets, the height of the tree is on average the binary logarithm of the number of elements it holds, and since every element of the binary tree is allocated on a different NUMA node, the memory transfers from memory to the CPU are parallelized when accessing the tree, but since the height of …
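As background only (the author's hashlist/varfiler code is not shown), here is an illustrative C++ sketch of the structure the post describes: a fixed array of buckets indexed by hash modulo the array size, an ordered tree per bucket, and per-bucket locking so threads that hash to different buckets do not contend. The class name and the use of std::map as the per-bucket tree are placeholders.

    // Illustrative bucketed map: O(1) bucket index, O(log n) search inside a bucket.
    #include <map>
    #include <mutex>
    #include <vector>
    #include <string>
    #include <functional>

    class BucketedMap {
        struct Bucket {
            std::mutex m;                        // one lock per bucket
            std::map<std::string, int> tree;     // stand-in for the per-bucket binary tree
        };
        std::vector<Bucket> buckets_;

    public:
        explicit BucketedMap(size_t nbuckets) : buckets_(nbuckets) {}

        void put(const std::string& key, int value) {
            Bucket& b = bucket_for(key);         // O(1) index, but typically a cache miss
            std::lock_guard<std::mutex> g(b.m);
            b.tree[key] = value;                 // O(log n) within the bucket
        }

        bool get(const std::string& key, int& out) {
            Bucket& b = bucket_for(key);
            std::lock_guard<std::mutex> g(b.m);
            auto it = b.tree.find(key);
            if (it == b.tree.end()) return false;
            out = it->second;
            return true;
        }

    private:
        Bucket& bucket_for(const std::string& key) {
            return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
        }
    };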
Scalable Parallel implementation of Conjugate Gradient Linear System solver library that is NUMA-aware and cache-aware
By aminer10
Hello, my scalable parallel implementation of a Conjugate Gradient linear system solver library that is NUMA-aware and cache-aware is here. You no longer need to allocate your arrays on different NUMA nodes, because I have implemented all the NUMA functions for you. This new algorithm is NUMA-aware and cache-aware and it scales well on NUMA architectures and on multicores; if you have a NUMA architecture, just run the "test.pas" example included in the zip file and you will see that the new algorithm really is scalable on NUMA architectures. Frankly, I think I would have to write something like a PhD paper to explain the algorithm further, but for the moment I will leave it as it is... perhaps I will do so in the near future. This scalable parallel library is especially designed for the large-scale industrial engineering problems found in industrial finite element work and the like; it has been ported to both FreePascal and all the Delphi XE vers…
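The library itself is not shown in the excerpt. As a reminder of the algorithm being parallelized, here is a plain, serial textbook conjugate gradient iteration in C++ on a small dense symmetric positive-definite system; it is not the author's NUMA-aware implementation.

    // Textbook conjugate gradient for Ax = b with A symmetric positive-definite.
    #include <vector>
    #include <cmath>
    #include <cstdio>

    using Vec = std::vector<double>;
    using Mat = std::vector<Vec>;

    static Vec matvec(const Mat& A, const Vec& x) {
        Vec y(x.size(), 0.0);
        for (size_t i = 0; i < A.size(); ++i)
            for (size_t j = 0; j < x.size(); ++j)
                y[i] += A[i][j] * x[j];
        return y;
    }

    static double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    Vec conjugate_gradient(const Mat& A, const Vec& b, double tol = 1e-10) {
        Vec x(b.size(), 0.0), r = b, p = r;        // x0 = 0, r0 = b - A*x0 = b
        double rr = dot(r, r);
        for (size_t k = 0; k < b.size() && std::sqrt(rr) > tol; ++k) {
            Vec Ap = matvec(A, p);
            double alpha = rr / dot(p, Ap);        // step length along p
            for (size_t i = 0; i < x.size(); ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rr_new = dot(r, r);
            double beta = rr_new / rr;             // make next direction A-conjugate
            for (size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
            rr = rr_new;
        }
        return x;
    }

    int main() {
        Mat A = {{4, 1}, {1, 3}};
        Vec b = {1, 2};
        Vec x = conjugate_gradient(A, b);
        std::printf("x = (%f, %f)\n", x[0], x[1]);  // expected roughly (0.0909, 0.6364)
    }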
Could Intel confirm if Haswell can write 16/32 byte atomically?
By Fabio F.
The manual says that memory writes up to 8 bytes are atomic if aligned. I ran some multi-threaded tests on a Haswell that seem to indicate that 16/32-byte writes are also atomic when using SSE/AVX intrinsics properly. So, assuming the memory locations are 16/32 byte aligned, and you are using a single SSE/AVX store instruction, in what cases would the write not be atomic? 
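For readers following the thread, this is roughly the kind of store being asked about: a single 32-byte AVX store to a 32-byte-aligned location (illustrative only; compile with AVX enabled, e.g. -mavx). As the question itself notes, the manual only guarantees atomicity for aligned accesses up to 8 bytes, so a test that happens to observe wider atomicity is not an architectural guarantee.

    // Sketch of the access pattern under discussion, not an answer to the question.
    #include <immintrin.h>
    #include <cstdint>

    alignas(32) static int64_t shared[4];   // 32-byte-aligned shared buffer

    void publish(int64_t a, int64_t b, int64_t c, int64_t d) {
        __m256i v = _mm256_set_epi64x(d, c, b, a);                   // pack four 64-bit lanes
        _mm256_store_si256(reinterpret_cast<__m256i*>(shared), v);   // one aligned 32-byte store
    }

    __m256i snapshot() {
        // One aligned 32-byte load; whether this can observe a torn value from a
        // concurrent publish() is exactly what the thread is asking.
        return _mm256_load_si256(reinterpret_cast<const __m256i*>(shared));
    }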
Multi-Threading
By Mayur B.
Hello everyone, I want to solve a sparse matrix (for solving linear equations) in minimum time. Currently I am using the "pardiso" function from the Intel MKL library (version 10.3), but this function takes too long. Is there another function available in the latest version that meets the minimum-time requirement? Could you please help me. Thanks in advance. Mayur
My new invention: Scalable distributed sequential lock
By aminer10
Hello, Scalable Distributed Sequential Lock, version 1.01. Author: Amine Moulay Ramdane. Email: aminer@videotron.ca. Description: This scalable distributed sequential lock was invented by Amine Moulay Ramdane. It combines the characteristics of a distributed reader-writer lock with those of a sequential lock, making it a hybrid reader-writer lock that is more powerful than Dmitry Vyukov's distributed reader-writer mutex: Vyukov's lock becomes slower and slower on the writer side as the number of cores grows, because it transfers too many cache lines between cores on the writer side. My scalable distributed sequential lock eliminates this weakness of Vyukov's distributed reader-writer mutex, so writer throughput becomes much faster, and my scalable distributed sequential lock eliminates the weaknesses of the seqlock (sequential lock), namely "livelo…
interlocked or not interlocked?
By Rudolf M.
I'm using an InterlockedCompareExchange to set a variable to my id (something like "while(0 != InterlockedCompareExchange(&var, myId, 0)) ::Sleep(100);"). Now no other thread will change this variable until it becomes 0 again. After using it, I could do an "InterlockedExchange(&var, 0);" or simply "var = 0;". I'm not sure, but I think this doesn't change much... Which one is the better solution? Which one is faster? Or is one of them even wrong? I thought the second could be faster when I don't expect many threads trying to "take" this variable at the same time. Is that correct?
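A sketch of the pattern the question describes, using the Windows Interlocked API; the variable and id are assumed to be LONGs, with 0 meaning "free". The trade-off the poster asks about is exactly the release step: InterlockedExchange is a locked instruction that also acts as a full memory barrier, while a plain "var = 0;" avoids the locked instruction but relies on the compiler and on x86 store ordering rather than an explicit barrier.

    // Sketch of the acquire/release pattern from the question (Windows API).
    #include <windows.h>

    static volatile LONG var = 0;   // 0 = unowned, otherwise holds the owner's id

    void acquire(LONG myId) {
        // Spin until the variable is 0 and we manage to swap our id in.
        while (InterlockedCompareExchange(&var, myId, 0) != 0)
            Sleep(100);
    }

    void release() {
        // Full barrier: writes made while the "lock" was held become visible
        // before other threads can observe var == 0 again.
        InterlockedExchange(&var, 0);
    }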
OpenMP Block gives false results
By Jack S.
Hi all, I would appreciate your thoughts on where I might have gone wrong using OpenMP. I parallelized this code pretty straightforwardly, yet even with a single thread (i.e., calling omp_set_num_threads(1)) I get wrong results. I have checked with Intel Inspector and I do not have a race condition, although the tool did warn that one thread might access another thread's stack (I get the same warning in other code of mine, and that code runs fine with OpenMP), so I'm pretty sure this is not related to the problem. Thanks, Jack.
SUBROUTINE GR(NUMBER_D, RAD_D, RAD_CC, SPECT)
  use TERM,  only: DENSITY, TEMPERATURE, VISCOSITY, WATER_DENSITY, &
                   PRESSURE, D_HOR, D_VER, D_TEMP, QQQ, UMU
  use SATUR, only: FF, A1, A2, AAA, BBB, SAT
  use DELTA, only: DDM, DT
  use CONST, only: PI, G
  IMPLICIT NONE
  INTEGER, INTENT(IN)             :: NUMBER_D
  DOUBLE PRECISION, INTENT(IN)    :: RAD_CC(NUMBER_D), SPECT(NUMBER_D)
  DOUBLE PRECISION, INTENT(INOUT) :: RAD_D(NUMBER_D)
  DOUBLE PRECISION :: R3, DR…
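The excerpt is truncated, so the actual routine cannot be diagnosed here. As a generic reminder of the most common cause of wrong results under OpenMP, here is a small C++ example (not taken from the post) of a scratch variable that is shared by default, together with the private clause that fixes it; compile with OpenMP enabled, e.g. -fopenmp.

    // Generic illustration of the default-shared pitfall, not a diagnosis of the post.
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1000;
        std::vector<double> out(n);
        double tmp;   // declared outside the loop: shared by default

        // BUGGY: every thread writes the same tmp, so out[i] can be corrupted.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            tmp = i * 0.5;        // data race on the shared scratch variable
            out[i] = tmp + 1.0;
        }

        // FIXED: make the scratch variable private (or declare it inside the loop body).
        #pragma omp parallel for private(tmp)
        for (int i = 0; i < n; ++i) {
            tmp = i * 0.5;
            out[i] = tmp + 1.0;
        }

        std::printf("out[10] = %f\n", out[10]);   // 6.0
        return 0;
    }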
Subscribe to forums
