Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming
Learn the basics of programming for this new architecture and these new products. New!
Intel® System Studio
Intel® System Studio is a comprehensive suite of integrated software development tools that can shorten time to market, increase system reliability, and improve power efficiency and performance. New!
In case you missed it: replay of the two-day webinar
Introduction to high-performance application development for Intel® Xeon processors and Intel® Xeon Phi™ coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that can make the subject accessible to any software developer.

Give your customers the best application performance with parallel programming and the help of Intel's innovative resources.

Development resources

Development tools

 

Intel® Parallel Studio

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio provides advanced tools for optimizing client applications for multicore and manycore processors.

Intel® software development products

Explore all the tools that help you optimize for Intel architecture. Selected tools are available for a free 45-day evaluation period.

Tools knowledge base

Guides and support information for Intel tools.

Message Passing Interface
Published on 05/19/2008 (0)
A standard message-passing interface (MPI) adopted by most MPP vendors, as well as by the cluster-computing community. The existence of a widely-supported standard enhances program portability; an MPI-based program developed for one platform should also run on any other platform for which an im...
Lock
Published on 05/19/2008 (0)
A mechanism to organize the interaction of multiple units of execution (UEs), usually through the control of a shared resource. When one or several UEs own a lock, it or they may regulate access to resources associated with the lock. Locks can be implemented in hardware (see atomic) or software...
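As a rough illustration of the software case, the sketch below uses a standard C++ `std::mutex` to regulate access to a shared counter. The helper name `locked_count` and its parameters are hypothetical, chosen only for this example; it is one possible sketch, not the glossary's reference implementation.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each of `num_threads` threads adds 1 to a shared
// counter `increments_per_thread` times, taking a lock around every update.
long locked_count(int num_threads, int increments_per_thread) {
    assert(num_threads >= 0 && increments_per_thread >= 0);
    long counter = 0;         // the shared resource
    std::mutex counter_lock;  // the lock associated with that resource

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // lock_guard acquires the mutex here and releases it
                // automatically at the end of this loop iteration.
                std::lock_guard<std::mutex> guard(counter_lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every unit of execution
    return counter;
}
```

Because only the thread that currently owns the mutex can update the counter, no increment is lost, so the result equals `num_threads * increments_per_thread`.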
Load balancing
Published on 05/19/2008 (0)
The process of distributing work to units of execution (UEs), such that each UE involved in a parallel computation takes approximately the same amount of time. There are two major forms of load balancing. In static balancing the distribution of work is determined before the computation starts. ...
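Static balancing can be sketched as a partition computed before any work starts. The example below assumes every work item costs about the same, so equal-sized chunks give roughly equal run time; the name `static_partition` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split the index range [0, n) into `workers` contiguous
// chunks whose sizes differ by at most one. Returns {begin, end} pairs, where
// `end` is one past the last index of the chunk.
std::vector<std::pair<std::size_t, std::size_t>>
static_partition(std::size_t n, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t base = n / workers;   // minimum items per worker
    const std::size_t extra = n % workers;  // the first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

For 10 items and 3 workers this yields the ranges [0, 4), [4, 7), and [7, 10). When item costs vary widely, a fixed split like this can leave some workers idle while others are still busy.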
Load balance
Published on 05/19/2008 (0)
In a parallel computation, tasks are assigned to units of execution (UEs), which are then mapped onto processing elements (PEs) for execution. The net work carried out by the collection of PEs is the “load” associated with the computation. “Load balance” refers to how that load is distributed a...
Locking CPU cache lines for a thread (L1)
By Younis A. (14)
Hi, I'm working on securing access to the L1 cache by locking it line by line. Is there any way to do this? For example, when two threads access the L1, the lines each thread accessed would stay locked to that thread for a certain time. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K. (1)
I have a Fortran code that uses both MPI and OpenMP. I have profiled the code on an 8-core Windows laptop, varying the number of MPI tasks versus OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OpenMP threads + 4 MPI tasks was running very slowly, more so than I could attribute solely to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far:
1. setenv OMP_WAIT_POLICY active      ## seems to make sense
2. setenv KMP_BLOCKTIME 1          ## this is counter to what I have read but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
By Fabio G. (3)
What is the best way to optimize the loop cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something like that? Thanks, Fabio
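For reference, the loop body in the question is a clamp of negative values to zero, which can also be written with `std::max`. The sketch below is a plain serial C++ version, not Cilk code. It assumes the elements are `double` (the post does not give the type), and the name `clamp_negatives_to_zero` is hypothetical. Whether this form runs faster under `cilk_for` is not something this sketch measures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: replace every negative element of x[0..n) with 0.
void clamp_negatives_to_zero(double* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        // std::max(a, b) returns b only when a < b, so this matches
        // x[i] < 0 ? 0 : x[i] from the original loop.
        x[i] = std::max(x[i], 0.0);
    }
}
```

Each iteration touches only its own element, so the iterations are independent of one another. That independence is what makes a loop like this a candidate for parallel or vectorized execution.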
Optimizing reduce_by_key implementation using TBB
By Shruti R. (0)
Hello everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.
Reading a shared variable
By VIKRANT G. (4)
Hello everyone, I am relatively new to parallel programming and have the following question: is reading a shared variable (one that is not modified by any thread) without using locks good practice? Thanks in advance for the help.
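The scenario in the question can be sketched in standard C++ as follows. It assumes the shared data is fully initialized before the threads start and that no thread modifies it afterwards; under those assumptions the C++ memory model has no data race on the reads, so no lock is needed. The name `concurrent_read_sums` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: `num_threads` threads each sum the same read-only
// vector without taking any lock. Returns one sum per thread.
std::vector<long> concurrent_read_sums(const std::vector<int>& shared,
                                       std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long> sums(num_threads);  // one private result slot per thread
    std::vector<std::thread> readers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        readers.emplace_back([&shared, &sums, t] {
            long local = 0;
            for (int v : shared) local += v;  // reads only, never writes `shared`
            sums[t] = local;                  // each thread writes only its own slot
        });
    }
    for (auto& r : readers) r.join();
    return sums;
}
```

Every thread computes the same sum. The guarantee disappears as soon as any thread writes to the shared data while others read it; at that point a lock or an atomic is required.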
Weird OpenMP bug
By Cheng C. (1)
Dear all, I want to combine OpenMP with the RSA_public_encrypt and RSA_private_decrypt routines. However, I have been confused by a weird bug for a few days. In the attached program, if I create 2 threads for parallel encryption and decryption, everything works well. If I create 3 or more threads, the RSA_public_encrypt routine works fine: all strings are successfully encrypted (encrypt_len=256). However, the RSA_private_decrypt routine goes wrong: only one thread works properly, and all the other threads fail to decrypt some of the strings (decrypt_len=-1, rsa_eay_private_decrypt padding check failed). With 1000 strings and 4 threads, the total number of strings that fail to decrypt is around 710 (sometimes as low as around 200). As expected, if I use 4 threads for parallel RSA_public_encrypt and one thread for RSA_private_decrypt, nothing goes wrong. It would be great if you could give some ideas. Thanks very much. #include <openssl/rsa.h> #include <...
Performance loss
By Bo W. (8)
Hi, I observed an interesting performance loss in my measurements. I have a system with two sockets, each holding an E5-2680 processor. Each processor has 8 cores with hyper-threading; hyper-threading was ignored. On this system, I started a program 16 times simultaneously, pinning each instance to a different core. At first, I set all cores to 2.7GHz and saw: Program 0 Runtime 7.7s, Program 8 Runtime 7.63s. Then I set the cores on the second socket to 1.2GHz and saw: Program 0 Runtime 12.18s, Program 8 Runtime 15.73s. Program 8 ran slower, which makes sense because core 8 had a lower frequency. But why was program 0 also slower? Its frequency wasn't touched. Regards, Bo