Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming
Learn the programming basics for this new architecture and these new products. New!
Intel® System Studio
Intel® System Studio is a complete, integrated suite of software development tools that can accelerate time to market, strengthen system reliability, and improve energy efficiency and performance. New!
In case you missed it: 2-day replay of the live webinar
Introduction to High Performance Application Development for Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessors
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Deliver maximum-performance applications to your customers by using parallel programming with Intel's innovative resources.

Development resources


Development tools

 

Intel® Parallel Studio

Intel® Parallel Studio gives Microsoft Visual Studio* C/C++ developers advanced tools to optimize client applications for multi-core and many-core systems.

Intel® Software Development Products ›

Explore all the tools that help you optimize for Intel architecture. Some tools are available for a free 45-day evaluation period.

Tools knowledge base

Find guides and support information for Intel tools.

Intel® Xeon® Processor E7 v2 Family
By BELINDA L. (Intel), published 02/18/2014
Based on Intel® Core™ microarchitecture (formerly codenamed Ivy Bridge) and manufactured on 22-nanometer process technology, the Intel® Xeon® Processor E7 V2 Family provides significant gains in performance, memory and cache bandwidth, and memory capacity over the previous-generation Intel®...
Intel® Xeon® Processor E7 V2 Family Technical Overview
By Sreelekshmy Syamalakumari (Intel), published 02/18/2014
Download PDF. Contents: 1. Executive Summary; 2. Introduction; 3. Intel® Xeon® processor E7 V2 family enhancements; 3.1 Intel® C104/102 Scalable Memory Buffer; 3.2 Intel® Secure Key (DRNG); 3.3 Intel® OS Guard (SMEP); 3.4 Intel® Advanced Vector Extensions (Intel® AVX); 3.5 Advanced Pro...
Trusted Tools in the New Android* World: Optimization Techniques - from Intel® SSE Intrinsics to Intel® Cilk™ Plus
By Zvi Danovich (Intel), published 01/27/2014
Author: Zvi Danovich, Senior SW Application Engineer, Intel. Introduction: Most Android applications, even those based only on scripting and managed languages (Java*, HTML5, …), eventually use middleware features that would benefit from optimization. This paper will discuss optimization needs and...
Parallel Computation of Sparse Rulers
By Arch D. Robison (Intel), published 01/14/2014
This article explains the sparse ruler problem, two parallel codes for computing sparse rulers, and some new results that reveal a surprising "gap" behavior for solutions to the sparse ruler problem. The code and results are included in the attached zip file. Background: A complete sparse ruler ...
Locking CPU cache lines for a thread (L1)
By Younis A.
Hi, I'm working on securing access to the L1 cache by locking it line by line. Is there any way to do it? For example, two threads access the L1, and L1 lines are locked for a certain time to whichever thread accessed them. Regards, Younis
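There is no user-level mechanism on mainstream Intel cores for pinning or "locking" individual L1 lines to a thread. What is often wanted in practice is the related technique of giving each thread its own cache-line-aligned data so that no line is shared (or falsely shared) between cores. A minimal C++ sketch of that idea, assuming a 64-byte line size and illustrative names such as PerThreadCounter:

    #include <functional>
    #include <thread>
    #include <vector>

    // Each thread gets its own 64-byte-aligned slot, so no two threads ever
    // touch the same cache line (64 bytes is the usual x86 line size).
    struct alignas(64) PerThreadCounter {
        long value = 0;
    };

    void count_events(std::vector<PerThreadCounter>& counters, int tid, long n) {
        for (long i = 0; i < n; ++i)
            counters[tid].value++;   // stays within this thread's own line
    }

    int main() {
        const int kThreads = 2;
        std::vector<PerThreadCounter> counters(kThreads);
        std::vector<std::thread> pool;
        for (int t = 0; t < kThreads; ++t)
            pool.emplace_back(count_events, std::ref(counters), t, 1000000L);
        for (auto& th : pool) th.join();
    }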
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K.
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OpenMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far ... 1. setenv OMP_WAIT_POLICY active ## seems to make sense 2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read, but when I set this to a large number (2500...
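One thing worth ruling out on the Linux cluster is the MPI launcher binding each rank, and therefore all of its OpenMP threads, to a single core; that would produce exactly this kind of collapse in OpenMP scaling. A small diagnostic sketch (Linux-specific, uses glibc's sched_getcpu()) that prints where each OpenMP thread actually runs:

    #include <omp.h>
    #include <sched.h>   // sched_getcpu(), Linux/glibc
    #include <cstdio>

    // If every thread reports the same CPU id, the rank has been confined to
    // one core and the OpenMP region is effectively serialized.
    int main() {
        #pragma omp parallel
        {
            std::printf("thread %d of %d running on cpu %d\n",
                        omp_get_thread_num(), omp_get_num_threads(),
                        sched_getcpu());
        }
        return 0;
    }

If that turns out to be the case, the fix is usually in the launcher's process-binding options rather than in OMP_WAIT_POLICY or KMP_BLOCKTIME.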
Optimizing Cilk with a ternary conditional
By Fabio G.
What is the best way to optimize the loop cilk_for (i = 0; i < n; i++) { x[i] = x[i] < 0 ? 0 : x[i]; } or something like that? Thanks, Fabio
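Two common ways to express this so the compiler can vectorize it cleanly, shown as a sketch that assumes x is an array of float of length n (adjust the type to match the real code): a branchless max inside the cilk_for, or Cilk Plus array notation, where the ?: operator is applied elementwise.

    #include <cilk/cilk.h>
    #include <algorithm>

    // Clamp negative elements of x to zero.
    void clamp_loop(float* x, int n) {
        cilk_for (int i = 0; i < n; ++i)
            x[i] = std::max(x[i], 0.0f);   // branchless form of x[i]<0 ? 0 : x[i]
    }

    void clamp_array_notation(float* x, int n) {
        // Cilk Plus array notation: the conditional is applied element by element.
        x[0:n] = x[0:n] < 0.0f ? 0.0f : x[0:n];
    }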
Optimizing reduce_by_key implementation using TBB
By Shruti R.
Hello everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help would be much appreciated. Thanks.
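For reference, the classic tbb::parallel_scan Body interface looks like the sketch below, shown only as a plain prefix sum; a reduce_by_key built on it would additionally have to carry the current key through the scan (a segmented scan), which is not shown here. Names such as PrefixSumBody are illustrative, not part of TBB.

    #include <tbb/tbb.h>
    #include <cstdio>
    #include <vector>

    // out[i] = in[0] + ... + in[i], computed with tbb::parallel_scan.
    class PrefixSumBody {
        const std::vector<int>& in_;
        std::vector<int>&       out_;
        int                     sum_;
    public:
        PrefixSumBody(const std::vector<int>& in, std::vector<int>& out)
            : in_(in), out_(out), sum_(0) {}
        PrefixSumBody(PrefixSumBody& other, tbb::split)
            : in_(other.in_), out_(other.out_), sum_(0) {}
        template <typename Tag>
        void operator()(const tbb::blocked_range<size_t>& r, Tag) {
            int s = sum_;
            for (size_t i = r.begin(); i != r.end(); ++i) {
                s += in_[i];
                if (Tag::is_final_scan())
                    out_[i] = s;          // write results only on the final pass
            }
            sum_ = s;
        }
        void reverse_join(PrefixSumBody& left) { sum_ = left.sum_ + sum_; }
        void assign(PrefixSumBody& b)          { sum_ = b.sum_; }
    };

    int main() {
        std::vector<int> in = {3, 1, 4, 1, 5, 9, 2, 6};
        std::vector<int> out(in.size());
        PrefixSumBody body(in, out);
        tbb::parallel_scan(tbb::blocked_range<size_t>(0, in.size()), body);
        for (int v : out) std::printf("%d ", v);
        std::printf("\n");
    }

Note that for small inputs the serial STL version will usually still win; parallel_scan only pays off once the data is large enough to amortize its two passes over the range.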
reading a shared variable
By VIKRANT G.
Hello everyone, I am relatively new to parallel programming and have the following doubt: is reading a shared variable (one that is not modified by any thread) without using locks a good practice? Thanks for the help in advance.
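The short answer is yes, provided the variable is written before the threads start and never modified while they run; anything that can change during execution needs an atomic or a lock. A small C++ sketch of the distinction (names are illustrative):

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int config_value = 42;             // written once, before the threads start
    std::atomic<bool> ready{true};     // anything that may change later: atomic

    void worker(int id) {
        // Safe: no thread writes config_value after the pool is launched, and
        // thread creation provides the necessary happens-before ordering.
        if (ready.load(std::memory_order_acquire))
            std::printf("thread %d sees config_value = %d\n", id, config_value);
    }

    int main() {
        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i)
            pool.emplace_back(worker, i);
        for (auto& t : pool)
            t.join();
    }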
Weird OpenMP bug
By Cheng C.
Dear all, I want to combine OpenMP with the RSA_public_encrypt and RSA_private_decrypt routines. However, I was confused by a weird bug for a few days. In the attached program, if I generate 2 threads for parallel encryption and decryption, everything works well. If I generate 3 or more threads, the RSA_public_encrypt routine works fine and all strings are successfully encrypted (encrypt_len=256). However, the RSA_private_decrypt routine goes wrong: only one thread works properly, and all the other threads fail to decrypt some of the strings (decrypt_len=-1, rsa_eay_private_decrypt padding check failed). With 1000 strings and 4 threads, the total number of strings that fail to decrypt is around 710 (sometimes as low as around 200). As expected, if I use 4 threads for parallel RSA_public_encrypt and one thread for RSA_private_decrypt, nothing goes wrong. It would be great if you could give some ideas. Thanks very much. #include <openssl/rsa.h> #include <...
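One common cause of exactly this symptom with OpenSSL 1.0.x is that the library is not thread-safe until the application installs locking callbacks; concurrent RSA_private_decrypt calls then race on shared state inside the key (for example its blinding data) and fail the padding check. A sketch of installing the callbacks with OpenMP locks, assuming OpenSSL 1.0.x (the callbacks became unnecessary in 1.1.0); init_openssl_threading is an illustrative helper name:

    #include <openssl/crypto.h>
    #include <omp.h>

    // One lock per slot requested by OpenSSL; the library calls locking_cb
    // whenever it enters or leaves one of its internal critical sections.
    static omp_lock_t* ssl_locks;

    static void locking_cb(int mode, int n, const char* file, int line) {
        (void)file; (void)line;
        if (mode & CRYPTO_LOCK)
            omp_set_lock(&ssl_locks[n]);
        else
            omp_unset_lock(&ssl_locks[n]);
    }

    static void init_openssl_threading(void) {
        int n = CRYPTO_num_locks();
        ssl_locks = (omp_lock_t*)OPENSSL_malloc(n * sizeof(omp_lock_t));
        for (int i = 0; i < n; ++i)
            omp_init_lock(&ssl_locks[i]);
        CRYPTO_set_locking_callback(locking_cb);
        // Depending on the platform, a thread-id callback
        // (CRYPTO_set_id_callback / CRYPTO_THREADID_set_callback) may also be needed.
    }

Call init_openssl_threading() once before the parallel region. An alternative that avoids the shared state entirely is to give each thread its own copy of the key (e.g., via RSAPrivateKey_dup).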
performance loss
By Bo W.
Hi, an interesting performance loss showed up in my measurements. I have a system with two sockets; each socket is an E5-2680 processor. Each processor has 8 cores with Hyper-Threading (Hyper-Threading was ignored). On this system, I started a program 16 times at the same time, each time pinning it to a different core. At first, I set all cores to 2.7 GHz and saw: Program 0 runtime 7.7 s, Program 8 runtime 7.63 s. Then I set the cores on the second socket to 1.2 GHz and saw: Program 0 runtime 12.18 s, Program 8 runtime 15.73 s. Program 8 ran slower, which is clear, because core 8 had a lower frequency. But why was program 0 also slower? Its frequency wasn't touched. Regards, Bo
