Author's Blogs

How to printf inside (aborted) Intel® Transactional Synchronization Extensions (Intel TSX) transactions
By Roman Dementiev (Intel) Posted on 07/29/16 1
One of the most popular ad-hoc functional debugging techniques is to use the printf or fprintf functions to display the state of variables. However, if these functions are used inside an Intel® TSX transaction, they can cause transaction aborts. The reason is that flushing the print output buffer involves an operating system call and an I/O operation: operations that cannot be rolled back by Intel® TSX. That means that the (f)printf output from a transaction may be lost: the attempt to flush the I/O buffer inside the transaction aborts it, and the resulting machine state roll-back discards the buffered output. If the flush happens after a committed transaction then the printf output won’t be lost. In general, any transaction abort handler needs to use a fall-back synchronization mechanism that does not involve Intel TSX. It should, therefore, be possible to reproduce the problem being debugged in the fall-back path, where printf works as expected. However, what can you do if, for some reason, the problem is not reproducible in the fall-back execution? So far I haven’t had this problem, but if you do, please consider the trick shown below.
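To make the two safe places for printf concrete, here is a minimal sketch of a transactional update with a fall-back lock. It is not the trick the post goes on to describe (this excerpt does not include it); it only illustrates the behavior explained above: output printed after a committed transaction survives, and output printed in the fall-back path works as expected. The names `fallback_lock`, `counter`, `cpu_has_rtm`, and `increment_with_debug` are hypothetical, and the retry count of 3 is an arbitrary assumption. The transactional path is compiled only when the compiler defines `__RTM__` (for example with `-mrtm` on GCC or Clang); otherwise only the fall-back path is built.

```c
#include <stdatomic.h>
#include <stdio.h>

#ifdef __RTM__
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang) */
#include <immintrin.h>  /* _xbegin, _xend, _xabort */
#endif

static atomic_int fallback_lock = 0;  /* hypothetical lock: 0 = free, 1 = held */
static long counter = 0;              /* shared state being debugged */

/* Returns 1 if the CPU reports RTM support, 0 otherwise. */
static int cpu_has_rtm(void) {
#ifdef __RTM__
    unsigned eax, ebx, ecx, edx;
    /* CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return (int)((ebx >> 11) & 1u);
#endif
    return 0;
}

static void lock_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0;  /* a failed CAS overwrote 'expected'; reset and retry */
}

static void lock_release(void) { atomic_store(&fallback_lock, 0); }

/* Increments the counter and prints its new value.
 * Returns 1 if the update committed transactionally, 0 if it used the lock. */
static int increment_with_debug(int use_rtm) {
#ifdef __RTM__
    if (use_rtm) {
        for (int attempt = 0; attempt < 3; ++attempt) {
            if (_xbegin() == _XBEGIN_STARTED) {
                /* Reading the lock adds it to the read set, so a thread
                 * taking the fall-back lock aborts this transaction. */
                if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                    _xabort(0xff);
                long seen = ++counter;
                /* No printf here: flushing the buffer would need a system
                 * call, which aborts the transaction and discards the output. */
                _xend();
                printf("[tx] counter=%ld\n", seen);  /* after commit: kept */
                return 1;
            }
            /* Aborted: state is rolled back to _xbegin; try again. */
        }
    }
#else
    (void)use_rtm;
#endif
    lock_acquire();
    long seen = ++counter;
    printf("[fallback] counter=%ld\n", seen);  /* no transaction: works as usual */
    lock_release();
    return 0;
}
```

A caller would typically invoke `increment_with_debug(cpu_has_rtm())`. The limitation discussed above remains: a value observed inside a transaction that later aborts never reaches either printf, because the roll-back discards it.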
Web Resources about Intel® Transactional Synchronization Extensions
By Roman Dementiev (Intel) Posted on 07/28/14 3
In this blog I list useful technical resources related to Intel® Transactional Synchronization Extensions (Intel TSX). I will try to keep the list up-to-date as new material becomes available (subscribe to this page below to get update notifica...
Developer API Documentation for Intel® Performance Counter Monitor
By Roman Dementiev (Intel) Posted on 07/24/14 0
The Intel® Performance Counter Monitor (Intel® PCM) is an open-source tool set based on an API. This API can be used directly by developers in their software. Besides the API usage example in the article, other samples of code using the API can be found in pcm.cpp, ...
Documentation for uncore performance monitoring units
By Roman Dementiev (Intel) Posted on 07/11/14 0
Hello everyone. The uncore performance monitoring units (uncore PMUs) provide a lot of useful information, such as memory controller traffic, traffic between sockets/processor packages, and energy-related metrics in the uncore (sleep states for Intel® Quick Path Interconnect links or DRAM sleep states for e...
Monitoring Intel® Transactional Synchronization Extensions with Intel® PCM
By Roman Dementiev (Intel) Posted on 06/14/13 2
After applying a new technology (a new processor, a hardware accelerator, a new instruction, etc.), besides measuring the immediate performance delta, one requires a method to verify that this technology has been applied correctly and efficiently. Intel® Transactional Synchronization Extensions (Int...
Exploring Intel® Transactional Synchronization Extensions with Intel® Software Development Emulator
By Roman Dementiev (Intel) Posted on 11/06/12 1
Intel® Transactional Synchronization Extensions (Intel® TSX) is perhaps one of the most non-trivial extensions of the instruction set architecture introduced in the 4th generation Intel® Core™ microarchitecture code name Haswell. Intel® TSX implements hardware support for a best-effort “transactional...
Intel Performance Counter Monitor V2.3 released (supporting MacOS and FreeBSD)
By Roman Dementiev (Intel) Posted on 11/06/12 0
We are proud to announce that Intel Performance Counter Monitor V2.3 (Intel PCM) has been released with the following changes: support of Apple Mac OS X 10.7 ("Lion") and OS X 10.8 ("Mountain Lion"); support of FreeBSD; new tool for monitoring memory traffic per channel on Intel Xeon processor E5 p...
Dissecting STREAM benchmark with Intel® Performance Counter Monitor
By Roman Dementiev (Intel) Posted on 11/23/10 8
Intel® Performance Counter Monitor (Intel® PCM) is an API and a set of tools that should help developers to understand how their applications utilize the underlying compute platform. In this blog I will explain how to instrument the well-known STREAM benchmark with library functions of Intel® PCM...
Is your memory management multi-core ready?
By Roman Dementiev (Intel) Posted on 08/21/09 13
Recently I got a workload that could not scale beyond a few cores. This particular application uses one thread per user, so theoretically, if one has an 8-core machine, then 8 concurrent users should fully utilize the machine, giving an 8x speedup compared to a sequential run. It did not happ...