23-May-2012
2:40 PM PDT
Performance Tools for Software Developers - SSE generation and processor-specific optimizations continued
By admin, posted 05/23/2012
Can I combine the processor values and target more than one processor? How do I generate optimized code for both Intel and AMD* architectures? Where can I find more information on processor-specific optimizations?
18-May-2012
9:48 AM PDT
Location of the End User License Agreement (EULA)
By Corey Alsamariae (Intel), posted 05/18/2012
Where to find the End User License Agreement (EULA) for the Intel software development tools.
14-May-2012
11:55 AM PDT
Future-Proof Your Application's Performance With Vectorization Technical Presentation Questions and Answers
By Elizabeth S (Intel), posted 05/14/2012
FAQ: Here are the questions and answers from the Future-Proof Your Application's Performance With Vectorization technical presentation held February 15, 2012. Questions and Answers: Q: Can you recommend books about the theory of vectorization? A: Yes, please take a look at The Software Vect...
08-May-2012
11:25 AM PDT
Superscalar programming 101 (Matrix Multiply) Part 5 of 5
By jimdempseyatthecove, posted 05/08/2012
In Part 4 we saw the effects of the QuickThread Parallel Tag Team Transpose method of Matrix Multiplication performed on a dual Xeon 5570 system with two sockets and two L3 caches, each shared by four cores (8 threads), and each processor with four L2 and four L1 caches, each shared by one core and...
08-May-2012
11:24 AM PDT
Superscalar programming 101 (Matrix Multiply) Part 4 of 5
By jimdempseyatthecove, posted 05/08/2012
In the last installment (Part 3) we saw the effects of the QuickThread Parallel Tag Team method of Matrix Multiplication performed on two single-processor systems, where the Intel Q6600 (4 cores, no HT) with two cores (two threads) sharing L1 and L2 caches attained a 40x to 50x improvement over ...
08-May-2012
11:23 AM PDT
Superscalar programming 101 (Matrix Multiply) Part 3 of 5
By jimdempseyatthecove, posted 05/08/2012
By Jim Dempsey. In the previous article (Part 2) we saw that by reorganizing the loops and using a temporary array we can observe a performance gain from SSE small-vector optimizations (the compiler does this), but a larger gain came from better cache utilization due to the layout change and ...
08-May-2012
11:22 AM PDT
Superscalar programming 101 (Matrix Multiply) Part 2 of 5
By jimdempseyatthecove, posted 05/08/2012
By Jim Dempsey. In my last article we left off with: The above charts, impressive as they are, are an "apples and oranges" type of comparison. The chart is comparing a non-cache-sensitive serial technique against a cache-sensitive parallel technique. Good for promotional literature, certainly a goo...
08-May-2012
11:21 AM PDT
Superscalar Programming 101 (Matrix Multiply) Part 1 of 5
By jimdempseyatthecove, posted 05/08/2012
By Jim Dempsey. The subject matter of this article is how to optimally tune a well-known algorithm. We will take this well-known (small) algorithm, a common approach to parallelizing it, a better approach to parallelizing it, and then produce a fully cache sensitized approa...
08-May-2012
11:19 AM PDT
Webinar: Getting Reproducible Results with Intel® MKL 11.0 beta
By TODD R. (Intel), posted 05/08/2012
A technical talk on the Conditional Numerical Reproducibility (CNR) feature in Intel® MKL 11.0.
08-May-2012
11:16 AM PDT
Distributed memory coarray programs with process pinning
By Patrick Kennedy (Intel), posted 05/08/2012
This article describes a method to compile and run a distributed memory coarray program using Intel® Fortran Compiler XE 12.0. An example using Linux* is presented.
