Software Tuning, Performance Optimization & Platform Monitoring

Do Intel CPUs have options to speed up IPC?


This may be a silly question, but still:

Do Intel CPUs have any options to make interprocess communication faster?
I currently use mmap'ed regions, but maybe something even better could be done at the CPU level.

My feeling is that nothing like this can be used, because the kernel organizes things. But it would still be amazing.
I am on Linux, kernel 3.19.

Any pointers on how to make things go faster are of interest to me.
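Once an mmap'ed region is shared, the kernel is no longer on the data path, so the remaining cost is mostly cache lines moving between cores (plus any system calls used for waking). A common user-space pattern is a single-producer/single-consumer ring in a `MAP_SHARED` mapping, with the producer and consumer indices on separate 64-byte cache lines so the two processes do not false-share. The sketch below assumes x86-64 Linux, where `atomic_size_t` is lock-free and therefore usable across processes; `spsc_ring`, `spsc_push`, `spsc_pop`, `run_fork_demo` and `RING_SLOTS` are hypothetical names, not an existing API.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_SLOTS 1024 /* must be a power of two (hypothetical size) */
#define CACHE_LINE 64

/* Each index lives on its own cache line to avoid false sharing. */
struct spsc_ring {
    _Alignas(CACHE_LINE) atomic_size_t head; /* written only by the producer */
    _Alignas(CACHE_LINE) atomic_size_t tail; /* written only by the consumer */
    _Alignas(CACHE_LINE) uint64_t slots[RING_SLOTS];
};

/* Returns 1 on success, 0 if the ring is full. */
static int spsc_push(struct spsc_ring *r, uint64_t v) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: the consumer must be done reading a slot before we reuse it. */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) return 0;
    r->slots[head & (RING_SLOTS - 1)] = v;
    /* Release: publish the slot contents before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Returns 1 and stores the value in *out, or 0 if the ring is empty. */
static int spsc_pop(struct spsc_ring *r, uint64_t *out) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return 0;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* A forked child produces 1..n, the parent consumes them.
   Returns the sum of received values, or 0 on setup failure. */
static uint64_t run_fork_demo(uint64_t n) {
    struct spsc_ring *r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED) return 0;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);

    pid_t pid = fork();
    if (pid < 0) { munmap(r, sizeof *r); return 0; }
    if (pid == 0) {                        /* producer process */
        for (uint64_t i = 1; i <= n; i++)
            while (!spsc_push(r, i))       /* busy-wait while full */
                ;
        _exit(0);
    }

    uint64_t sum = 0, v;                   /* consumer process */
    for (uint64_t got = 0; got < n;) {
        if (spsc_pop(r, &v)) { sum += v; got++; }
    }
    waitpid(pid, NULL, 0);
    munmap(r, sizeof *r);
    return sum;
}
```

Busy-waiting keeps latency low but burns a core on each side; pinning the two processes to cores that share a last-level cache tends to help, and an idle wait would need something like a futex on top of this.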

Inline assembly to generate the most heat on SB-E

I'm curious which __asm instructions would generate the most heat on an SB-E for stability testing. With prime95 I can get the CPU package power to just over 130 W, but experimenting with my own AVX assembly I can't get more than 100 W out of it.
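Rather than hand-written `__asm`, a sketch in C using GCC/Clang vector extensions (not MSVC) can illustrate the usual idea: Sandy Bridge issues one 256-bit FP multiply and one 256-bit FP add per cycle on separate ports, and each multiply-then-add step has several cycles of latency (commonly published as about 5 + 3), so a single dependency chain leaves the units mostly idle. Eight independent chains are meant to keep both ports busy. `avx_burn` is a hypothetical name; compile with something like `-O2 -mavx`. Whether this reaches prime95's power draw is untested here, and prime95 also exercises caches and memory, which adds its own power.

```c
#include <stddef.h>

/* GCC/Clang vector extension: 8 floats = one 256-bit AVX register. */
typedef float v8sf __attribute__((vector_size(32)));
#define SPLAT(f) ((v8sf){(f), (f), (f), (f), (f), (f), (f), (f)})

/* x = x * 0.5 + 1 converges to 2.0, so values stay normal
   (no overflow or denormals) no matter how long the loop runs. */
#define STEP(x) ((x) = (x) * m + c)

float avx_burn(size_t iters) {
    const v8sf m = SPLAT(0.5f), c = SPLAT(1.0f);
    /* Eight independent chains; distinct start values (none equal to the
       fixed point 2.0) keep the compiler from merging or folding them. */
    v8sf a0 = SPLAT(0.0f), a1 = SPLAT(1.0f), a2 = SPLAT(3.0f), a3 = SPLAT(4.0f);
    v8sf a4 = SPLAT(5.0f), a5 = SPLAT(6.0f), a6 = SPLAT(7.0f), a7 = SPLAT(8.0f);
    for (size_t i = 0; i < iters; i++) {
        STEP(a0); STEP(a1); STEP(a2); STEP(a3);
        STEP(a4); STEP(a5); STEP(a6); STEP(a7);
    }
    /* Checksum so the work cannot be optimized away. */
    v8sf s = a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7;
    float total = 0.0f;
    for (int i = 0; i < 8; i++) total += s[i];
    return total;
}
```

For an actual stress run, pass a very large iteration count that the compiler cannot see (for example, parsed from the command line) and start one thread per logical core.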

Intel PCM: measuring RAM activity? (data, not energy)

Hi everyone,

I have a simple question about Intel PCM:

Is it possible to measure RAM data activity in code?

And is it possible with any of the ready-made tools provided in the Intel PCM package?

All I have seen so far is RAM energy measurement.

Thanks in advance for your help :)
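Memory data traffic is usually derived from the memory controller's CAS (column access) read and write counters: each CAS command transfers one 64-byte cache line, so bytes = CAS count × 64. Recent PCM releases ship a `pcm-memory` utility that reports read/write bandwidth this way; check whether your version includes it. The helper below only shows the conversion from two counter samples; `cas_to_bandwidth` and `dram_traffic` are hypothetical names, and how the raw counts are obtained (PCM, perf uncore events, etc.) is outside this sketch.

```c
#include <stdint.h>

#define BYTES_PER_CAS 64.0 /* one 64-byte cache line per CAS command */

struct dram_traffic {
    double read_bytes_per_s;
    double write_bytes_per_s;
};

/* Converts read/write CAS counter deltas over `seconds` into bandwidth.
   Assumes the counters did not wrap during the interval. */
static struct dram_traffic cas_to_bandwidth(uint64_t rd_before, uint64_t rd_after,
                                            uint64_t wr_before, uint64_t wr_after,
                                            double seconds) {
    struct dram_traffic t;
    t.read_bytes_per_s  = (double)(rd_after - rd_before) * BYTES_PER_CAS / seconds;
    t.write_bytes_per_s = (double)(wr_after - wr_before) * BYTES_PER_CAS / seconds;
    return t;
}
```

For example, 1,562,500 read CAS commands in one second correspond to 100,000,000 bytes/s of read traffic.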


Error building IntelPerformanceCounterMonitorV2.8 in Visual Studio 2013


I get the following error when compiling IntelPerformanceCounterMonitorV2.8 with Visual Studio 2013:

1>------ Build started: Project: PCM-Service, Configuration: Release Win32 ------
1>  utils.cpp
1>c:\intelperformancecountermonitorv2.8\utils.h(60): error C3861: 'YieldProcessor': identifier not found
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

Has anyone encountered this issue? 
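`YieldProcessor` is not a standard C/C++ function; it is a macro that the Windows SDK defines in winnt.h (pulled in by `<windows.h>`), expanding to the x86 PAUSE spin-wait hint. Error C3861 therefore suggests that utils.h is compiled without `<windows.h>` visible at that point; this is a plausible cause, not a confirmed diagnosis for this project. The shim below is a hypothetical stopgap, and `spin_until_set` is an illustrative helper, not part of PCM.

```c
/* On Windows, YieldProcessor() comes from <windows.h> (winnt.h). */
#if defined(_WIN32)
#  include <windows.h>
#endif

/* Hypothetical fallback if the macro is still missing. */
#ifndef YieldProcessor
#  if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#    include <immintrin.h>
#    define YieldProcessor() _mm_pause()   /* PAUSE: spin-wait hint */
#  else
#    define YieldProcessor() ((void)0)     /* no-op on other architectures */
#  endif
#endif

/* Spins until *flag becomes nonzero or max_spins is reached.
   Returns 1 if the flag was observed set, 0 otherwise. */
static int spin_until_set(volatile int *flag, long max_spins) {
    for (long i = 0; i < max_spins; i++) {
        if (*flag) return 1;
        YieldProcessor();
    }
    return *flag != 0;
}
```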



CPU/GPU ring programming


I am new to this forum. I am starting to design C software based on CPU+GPU programming. The idea is to offload processing to the GPU when tasks become heavy for some parts of the software. I read the Gen8 paper about CPU+GPU shared ring memory, and I am looking for information on how to program an application to use these features. Could you recommend APIs, debugging tools, and tools to visualize and investigate how the software behaves in the cache layers (L1, L2, L3)?

I currently use a MacBook Air.


RAPL DRAM power limit


I am currently working on using RAPL to set power caps on both CPUs and DRAM. I have a dual-socket machine (motherboard model: Supermicro X9DRL-3F/iF) with Intel(R) Xeon(R) CPU E5-2690 v2 CPUs.

I can successfully monitor power and set power caps on every component except DRAM. More specifically, I can monitor DRAM power but can't write anything into the MSR_DRAM_POWER_LIMIT register. I also tried to set the unlock bit, but that didn't work either (the corresponding bits show that this register is locked and disabled).
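Per the Intel SDM's description of the DRAM RAPL domain, MSR_DRAM_POWER_LIMIT (0x618) holds the limit in bits 14:0, an enable bit at 15 and a lock bit at 31; once the lock bit is set, the register is read-only until the next reset, so there is no separate unlock bit that software can write. The sketch below reads the MSR through the Linux msr driver (`/dev/cpu/N/msr`, root and the `msr` module required) and decodes those fields. It assumes the limit is scaled by the power unit in MSR_RAPL_POWER_UNIT (0x606) bits 3:0; verify both against your processor's documentation. `read_msr`, `rapl_power_unit` and `decode_dram_power_limit` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_RAPL_POWER_UNIT  0x606
#define MSR_DRAM_POWER_LIMIT 0x618

struct dram_limit {
    double limit_watts; /* bits 14:0 scaled by the RAPL power unit */
    int enabled;        /* bit 15 */
    int locked;         /* bit 31: read-only until next reset when set */
};

/* Reads one MSR on `cpu` via the Linux msr driver. Returns 0 on success. */
int read_msr(int cpu, uint32_t reg, uint64_t *out) {
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = pread(fd, out, sizeof *out, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}

/* Power unit in watts = 1 / 2^PU, where PU = bits 3:0 of MSR_RAPL_POWER_UNIT. */
double rapl_power_unit(uint64_t unit_msr) {
    return 1.0 / (double)(1u << (unit_msr & 0xF));
}

/* Splits a raw MSR_DRAM_POWER_LIMIT value into its fields. */
struct dram_limit decode_dram_power_limit(uint64_t raw, double power_unit) {
    struct dram_limit d;
    d.limit_watts = (double)(raw & 0x7FFF) * power_unit;
    d.enabled = (int)((raw >> 15) & 1);
    d.locked  = (int)((raw >> 31) & 1);
    return d;
}
```

If `locked` reads back as 1, the lock was most likely set by firmware at boot; whether the BIOS offers a setting to leave DRAM RAPL unlocked depends on the platform.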

Xeon E5/E7 memory power management documentation

I am looking for the CPU datasheet or other documentation that describes the DRAM power management configuration in E5/E7-class CPUs. I was able to find the document named "Intel® Xeon® Processor E5 v2 Product Family Datasheet- Volume Two: Registers", which is very useful but seemingly incomplete. In particular, in Section 7.1.2, the description of the MEM_ACCUMULATED_BW_CH_ register references the PM_CMD_PWR register, which does not appear anywhere in the datasheet.
