Intel® Trace Analyzer and Collector 8.0 Release Notes
By Gergana Slavova (Intel) Posted 09/06/2012
This page provides the current Release Notes for the Intel® Trace Analyzer and Collector 8.0 products. All files are in PDF format - Adobe Reader* (or compatible) required. To get product updates, log in to the Intel® Software Development Products Registration Center. For questions or technical s...
Case Study: Computing Black-Scholes with Intel® Advanced Vector Extensions
By shuo-li (Intel) Posted 09/06/2012
1. Introduction In its relentless effort to lead innovation and deliver greater compute capacity and lower power consumption to satisfy the growing demands across the industry segments and the evolving usage models, Intel introduced a new set of instructions called the Intel® Advanced Vector Exte...
Intel® Trace Analyzer and Collector - Documentation
By Gergana Slavova (Intel) Posted 09/06/2012
The sections below provide links to the Intel® Trace Analyzer and Collector 8.1 documentation. You can find other documentation, including user guides and reference manuals for current and earlier Intel software product releases, in the Intel® Software Documentation Library. Documentation Releas...
Intel® MPI Library 4.0 Release Notes
By Gergana Slavova (Intel) Posted 09/05/2012
This page provides the current Release Notes for the Intel® MPI Library 4.0 products. All files are in PDF format - Adobe Reader* (or compatible) required. To get product updates, log in to the Intel® Software Development Products Registration Center. For questions or technical support, visit Int...


MOVNTI and alignment for real mode
By Kostik B.2
In the SDM rev. 48, vol. 2A, page 3-546, the real-mode exceptions listed for the MOVNTI instruction state that it can generate #GP if a memory operand is not aligned on a 16-byte boundary, regardless of segment. No exceptions are specified for unaligned stores in protected or long mode, except when AC is enabled. The AMD reference is also silent about unaligned stores. Is this indeed an irregularity in real mode, or just a typo in the spec?
I have come to an interesting subject
By aminer100
Hello... I have come to an interesting subject. As you have noticed, I have designed and implemented parallel programs that you can find on my following website: But I was thinking more and more about my parallel programs, and asking myself some questions... If you look carefully at my compression library or my parallel archiver, you will notice that these compression libraries are constructions of easier high-level objects that you can use to do your compression EASILY. It's like robotics and automatization: you are not required to write compression algorithms or to write the high-level objects that ease the compression process for you. You are only required to instantiate the high-level objects that do the compression and call their methods, which is much easier. But since it's like robotic automatizat...
Are my Parallel Studio packages updating or not?
By dnesteruk2
I've fired up the Intel Software Manager, pressed the download buttons and it all looks like this: So instead of pause buttons I get resume buttons. I've tried pressing them, they briefly turn into pause buttons. So my question: is anything being downloaded or is this thing broken? Thanks. P.S.: registration on this forum is atrocious. Finding this forum was next to impossible. The media upload thing is so far below I didn't notice it and uploaded elsewhere. Usability hint-hint!
mem address directly from SSE/AVX register
By Luchezar B.3
Hello, I would like to make a suggestion. Very often, [otherwise well vectorizable] algorithms require reading/writing from/to memory addresses which are calculated per-channel (reading from a table, sampling a texture, etc.). When you get to this, you are forced to make that part of the algorithm scalar by extracting each channel in turn to a GP register, performing the memory operation and then inserting the result back into a vector register. I don't think a single instruction that interprets each channel as an address and reads/writes to different memory locations at once is hardware feasible (though it would be extremely good), but at least we could have something that would ease the situation. My suggestion is instructions for memory access that get the address directly from the SSE/AVX register:
loadd $(i + (j<<4)), %xmm0, %xmm1 - read 32-bit word from address specified in the i-th dword of xmm0 and store it in j-th quarter of xmm1
stored $(i + (j<<4)), %xmm0, %xmm1 - read 32-...
Studying Intel TSX Performance: strange results
By Alexander K.9
Dear all, I've studied Intel TSX performance: its abort cases and a comparison with a spin lock. The study, with references to the source code, is available at . I see some performance gain for TSX in comparison with the spin lock. However, I still have a few questions: 1. I see a huge jump in transactional aborts when the transaction working set reaches 256 cache lines (Figure 1). This is 16KB, which is only a quarter of the L1d cache. The workload is single-threaded, running strictly on one CPU core. Also, I didn't use HyperThreading. I know that the cache has 8-way associativity; however, I used static memory allocations (which are contiguous in virtual memory and likely contiguous physically), and it's unlikely that I have so many collisions in cache lines that only 16KB of cache memory is available for the transaction. So how can the spike be explained? Also, Figure 3 (dependency of execution time on transaction size) shows s...
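The collision argument in this post can be checked with a little arithmetic. The following is only a sketch: it assumes 64-byte cache lines and a working set that is contiguous, so consecutive lines map to consecutive sets. The function name and the cache sizes passed in are illustrative assumptions, not figures taken from the study.

```python
def lines_per_set(working_set_lines, cache_bytes, ways, line_bytes=64):
    """Return (number of sets, max lines landing in one set).

    Assumes the working set is contiguous, so consecutive cache lines
    are spread round-robin over the sets.
    """
    num_sets = cache_bytes // (ways * line_bytes)
    full_rounds, remainder = divmod(working_set_lines, num_sets)
    max_per_set = full_rounds + (1 if remainder else 0)
    return num_sets, max_per_set


# Hypothetical geometries: an 8-way cache of 32 KB and one of 64 KB.
for size_kb in (32, 64):
    sets, per_set = lines_per_set(256, size_kb * 1024, ways=8)
    print(f"{size_kb} KB, 8-way: {sets} sets, {per_set} lines per set")
```

Under either assumed size, a contiguous 256-line working set puts fewer lines into each set than the 8 available ways, which is consistent with the poster's point that plain set conflicts are an unlikely explanation.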
Capacity planning
By aminer103
Hello, I have come to an interesting subject. If we have a distributed database, a webserver and HTML files, and you want to do capacity planning for your webserver, this complicates things, because the database server must be modeled with a hyperexponential service distribution, that is, as an M/G/1 queuing system. But as you have noticed, the database server system in our network comes before the internet connection, which will be modeled as an M/M/1 queuing system, so you have to use a queuing network simulation to solve this problem. But in capacity planning we also have to calculate the response time for the worst-case performance, and this eases the job for us: in the worst-case scenario, since the M/G/1 queuing system of the database server has three exponential distributions for the read, write and delete transactions, we have to choose the worst service time, which is exponentially distributed, so I think we have to choose only the...
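The two queue models named in this post have standard closed-form mean response times, which the sketch below computes: 1/(mu - lambda) for M/M/1, and the Pollaczek-Khinchine formula for M/G/1 with a hyperexponential service time. The function names, arrival rate, transaction mix and service rates are all hypothetical values chosen for illustration; none of them come from the post. The final line shows the worst-case simplification the post describes, treating every request as if it had the slowest service rate.

```python
def mm1_response_time(lam, mu):
    """Mean response time of an M/M/1 queue: 1 / (mu - lam)."""
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (mu - lam)


def mg1_hyperexp_response_time(lam, probs, rates):
    """Mean response time of an M/G/1 queue whose service time is
    hyperexponential: with probability probs[i] a request is served
    at exponential rate rates[i]. Uses Pollaczek-Khinchine."""
    mean_s = sum(p / r for p, r in zip(probs, rates))             # E[S]
    mean_s2 = sum(2.0 * p / r ** 2 for p, r in zip(probs, rates))  # E[S^2]
    rho = lam * mean_s                                             # utilization
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization >= 1")
    mean_wait = lam * mean_s2 / (2.0 * (1.0 - rho))
    return mean_s + mean_wait


# Hypothetical workload: 5 requests/s; reads, writes and deletes with
# an assumed mix and assumed service rates (requests/s).
lam = 5.0
probs = [0.7, 0.2, 0.1]
rates = [50.0, 20.0, 10.0]

mixed = mg1_hyperexp_response_time(lam, probs, rates)
worst = mm1_response_time(lam, min(rates))  # every request at the slowest rate
print(f"M/G/1 mixed: {mixed:.4f} s, worst-case M/M/1: {worst:.4f} s")
```

With a single transaction class the hyperexponential case reduces to M/M/1, which gives a quick sanity check on the formulas.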
Debug mic in Windows7
By Victor Z.1
When I typed "micnativeloadex MyTest -d 0" in Windows 7 to debug MyTest.exe (MIC + OpenMP), I got the error message below: Unable to create remote process. ssh to the coprocessor and run ps to verify the coi_daemon is executing. It may be necessary to restart the mpss service. But there is no problem running the MIC + OpenMP application. I followed the Intel debugger extension for Intel MIC - VC2012. But it did work. How to debug it? Thank you in advance.
Parallel archiver and scalability
By aminer101
Hello, I think I am happy now, please read again... I benchmarked parallel archiver using parallel LZMA with 5 threads on a quad core, so this gave false results on the timing... So I started parallel archiver with a single thread and this gave more accurate results; here is my correction, please read again... I have come to an interesting subject, so be smart and follow with me please... I have tried to do a worst-case scalability prediction with an HDD hard disk for my parallel archiver (you will find my parallel archiver here: ) with Parallel LZMA, and I think it's worse than what I thought... There are four things in my Parallel LZMA algorithm: First, we have to copy serially a stream from the hard disk to the memory, and this will take on average 0.2 second; and in the compression method we have to copy a stream to the memory, and this will take on average 0.05 second; and in the compression method you have to compress a stream ...


