Hot off the press! Intel® Xeon Phi™ Coprocessor High Performance Programming
Introduces high-performance application development for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.
New! Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that shortens time to market, improves system reliability, and boosts power efficiency and performance.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders take a structured, pattern-based approach that makes the subject accessible to every software developer.
I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use 'thread_local' to make sure there are no conflicts. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14. When I tried the ICC 15 beta, the particular file where I used thread_local compiled, but the build of the whole application failed at another point: "undefined reference to '__cxa_thread_atexit'". I would greatly appreciate help in solving this issue.
Hello, I am running a visualization program (visualizing a large dataset) where I can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g. 32) compared to MPI, which is normal (I guess). But when I run the same code on one node of a cluster, which has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), the performance I get using pthreads is worse than MPI: about 70s using MPI compared to 180s using pthreads. Even worse, the performance on the Xeon E5-2680 v2 is lower than on the i7-2600K: around 100s on the 2600K but 180s on the E5-2680 (same number of threads on both). I checked using the top command and all the cores are active when I run the program. So my question is: why is that happening? Is there some other way I should be compiling the code on the E5-2680? Are there variables I should set, like KMP_AFFIN...
Hi, I have a question about hardware and software transactional memory. Given the types of versioning (eager and lazy) and of conflict detection (optimistic and pessimistic), suppose two or more threads are performing transactions that read/write the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be better for always catching the conflicts? I hope my question is clear. Thanks. Best regards, Simone
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OpenMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to thread starvation. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far:
1. setenv OMP_WAIT_POLICY active  ## seems to make sense
2. setenv KMP_BLOCKTIME 1  ## this is counter to what I have read, but when I set this to a large number (2500...
Hello everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.