Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming
Learn the fundamentals of programming for this new architecture and the new products.
New! Intel® System Studio
Intel® System Studio is a comprehensive suite of integrated software development tools that can speed time to market, strengthen system reliability, and improve power efficiency and performance.
New! In case you missed it: replay of the two-day live webinar
Introduction to high-performance application development for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Optimize the performance of your applications with parallel programming and the help of Intel's innovative resources.

Development resources


Development tools

 

Intel® Parallel Studio

Intel® Parallel Studio brings simplified end-to-end parallelism to Microsoft Visual Studio* C/C++ developers and provides advanced tools for optimizing client applications for multicore and many-core processing.

Intel® Software Development Products

Explore all the tools that will help you optimize your applications for Intel architecture. Some tools are available for a free 45-day evaluation period.

Tools knowledge base

Find guides and support information for Intel tools.

Lock
Published on 05/19/2008 (0)
A mechanism to organize the interaction of multiple units of execution (UEs), usually through the control of a shared resource. When one or several UEs own a lock, it or they may regulate access to resources associated with the lock. Locks can be implemented in hardware (see atomic) or software...
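As a rough illustration of the definition above, the following sketch uses a software lock (`std::mutex` from the C++ standard library) to regulate access to a shared counter. The function name `count_with_lock` and its parameters are hypothetical, chosen only for this example; here each thread plays the role of a UE.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each thread adds `increments_per_thread` to a shared
// counter, acquiring the lock around every single update.
long count_with_lock(int num_threads, int increments_per_thread) {
    long counter = 0;         // the shared resource
    std::mutex counter_lock;  // the lock associated with `counter`

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Only the thread that currently owns the lock may touch `counter`.
                std::lock_guard<std::mutex> guard(counter_lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every thread to finish
    return counter;
}
```

Calling `count_with_lock(4, 1000)` should return 4000 on every run, because no increment can interleave with another. Without the lock, the unsynchronized updates would be a data race.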
Load balancing
Published on 05/19/2008 (0)
The process of distributing work to units of execution (UEs), such that each UE involved in a parallel computation takes approximately the same amount of time. There are two major forms of load balancing. In static balancing the distribution of work is determined before the computation starts. ...
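The static form described above can be sketched as follows: the work is split into near-equal contiguous ranges before any thread starts, and each thread then processes only its own range. The names `static_partition` and `parallel_sum` are hypothetical, and the sketch assumes every item costs about the same amount of time.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Split n items into `workers` contiguous [begin, end) ranges whose sizes
// differ by at most one. Assumes workers > 0.
std::vector<std::pair<std::size_t, std::size_t>>
static_partition(std::size_t n, std::size_t workers) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = n / workers;
    const std::size_t extra = n % workers;  // the first `extra` ranges get one more item
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}

// Sum `data` with one thread per range; the distribution of work is fixed
// before the computation starts.
long long parallel_sum(const std::vector<int>& data, std::size_t workers) {
    const auto ranges = static_partition(data.size(), workers);
    std::vector<long long> partial(workers, 0);  // one result slot per thread
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            partial[w] = std::accumulate(data.begin() + ranges[w].first,
                                         data.begin() + ranges[w].second, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

If some items take much longer than others, a fixed split like this can leave some threads idle while others are still working.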
Load balance
Published on 05/19/2008 (0)
In a parallel computation, tasks are assigned to units of execution (UEs), which are then mapped onto processing elements (PEs) for execution. The net work carried out by the collection of PEs is the “load” associated with the computation. “Load balance” refers to how that load is distributed a...
Linda
Published on 05/19/2008 (0)
A coordination language for parallel programming. See tuple space.
Poor threading performance on Intel Xeon E5-2680 v2
By Pascal10
Hello, I am running a visualization program (visualizing a large dataset) where I can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g., 32) than using MPI, which is normal (I guess). But when I run the same code on one node (part of a cluster) which has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), the performance I get using pthreads is worse than with MPI: about 70s using MPI compared to 180s using pthreads. Even worse, the performance on the Intel Xeon E5-2680 v2 is lower than on the Intel i7-2600K: around 100s on the 2600K but 180s on the E5-2680 (same number of threads on both). I checked using the top command, and all the cores are active when I run the program. So my question is: why is that happening? Is there some other way I should be compiling the code on the E5-2680? Are there some variables I should set like KMP_AFFIN...
HTM/STM and Scheduling
By Simone A. (1)
Hi, I have a question about Hardware and Software Transactional Memory. Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), let's say that 2 or more threads are performing transactions that write/read the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be better to always catch the conflicts? Hope my question is clear. Thanks. Best Regards, Simone
Locking CPU cache lines for a thread (L1)
By Younis A. (14)
Hi, I'm working on securing access to the L1 cache by locking it line by line. Is there any way to do it? For example, two threads access the L1, and the L1 lines are locked for a certain time to the thread that accessed them. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K. (1)
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where some performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations into this issue. What I have tried so far ... 1. setenv OMP_WAIT_POLICY active ## seems to make sense 2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
By Fabio G. (3)
What is the best way to optimize the loop cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something like that? Thanks, Fabio
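For reference, the body of the loop in the question is a clamp of negative values to zero. The sketch below writes the same operation in plain C++ (not Cilk) with `std::max`. The element type `double` and the function name `clamp_negatives_to_zero` are assumptions, since the post does not give the type of `x`. Whether a compiler vectorizes this form, and whether it beats the `cilk_for` version, would need to be measured.

```cpp
#include <algorithm>
#include <vector>

// Replace every negative element with zero, in place.
// std::max(v, 0.0) evaluates to (v < 0.0) ? 0.0 : v, which is the same
// expression as the ternary in the original loop.
void clamp_negatives_to_zero(std::vector<double>& x) {
    for (double& v : x) {
        v = std::max(v, 0.0);
    }
}
```

Each iteration reads and writes only its own element, so the iterations are independent and the loop can be split across threads or vector lanes without changing the result.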
Optimizing reduce_by_key implementation using TBB
By Shruti R. (0)
Hello Everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, serial STL code is always outperforming the TBB code! It would be helpful if I were given an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.
reading a shared variable
By VIKRANT G. (4)
Hello everyone, I am relatively new to parallel programming and have the following doubt: is reading a shared variable (that is not modified by any thread) without using locks a good practice? Thanks for the help in advance.
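A minimal sketch of the situation described in the question, assuming C++11 threads: the shared value is written once before any thread is created, and afterwards the threads only read it. Under the C++ memory model, concurrent reads of an object that no thread modifies are not a data race, so no lock is needed in this case. The function name `read_shared_without_lock` is hypothetical.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: `readers` threads each read the same shared value
// without any lock and record what they saw in their own slot.
std::vector<int> read_shared_without_lock(int shared_value, std::size_t readers) {
    const int config = shared_value;    // written once, before any thread starts
    std::vector<int> seen(readers, 0);  // one private output slot per thread
    std::vector<std::thread> threads;
    for (std::size_t r = 0; r < readers; ++r) {
        // Each thread only reads `config` and writes a distinct element of `seen`.
        threads.emplace_back([&config, &seen, r] { seen[r] = config; });
    }
    for (auto& t : threads) t.join();
    return seen;
}
```

This holds only while no thread writes to the variable. As soon as one thread may modify it while others read, the accesses need synchronization, for example a lock or `std::atomic`.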