Intel ISA Extensions

Cannot change IA32_PERF_CTL value: it gets overwritten by the operating system

Hi,

I'm running an experiment on a server machine with a quad-core Xeon X5355 processor running Linux.

I am trying to control core voltage and frequency separately by writing to the MSR IA32_PERF_CTL (0x199). I change the value of IA32_PERF_CTL with the "wrmsr" command and verify that it has changed with the "rdmsr" command. However, when I run "rdmsr 0x199" again a few seconds later, I find that IA32_PERF_CTL has been overwritten with its previous value. The value of IA32_PERF_STATUS does not reflect my change either.
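For readers who want to watch the register revert, here is a minimal sketch (not part of the original post) that reads the MSR through the Linux msr driver's /dev/cpu/N/msr device, which is the same interface the rdmsr tool uses. It assumes the msr kernel module is loaded and the script runs with sufficient privileges. The helper names read_msr and sample_msr are hypothetical and chosen only for illustration.

```python
import os
import struct
import time

IA32_PERF_STATUS = 0x198
IA32_PERF_CTL = 0x199


def msr_device_path(cpu):
    """Path of the per-CPU MSR device exposed by the Linux msr driver."""
    return f"/dev/cpu/{cpu}/msr"


def read_msr(cpu, register):
    """Return the 64-bit MSR value, or None if it cannot be read
    (module not loaded, insufficient privileges, unsupported register)."""
    try:
        fd = os.open(msr_device_path(cpu), os.O_RDONLY)
    except OSError:
        return None
    try:
        # The file offset selects the MSR number; each read returns 8 bytes.
        data = os.pread(fd, 8, register)
        if len(data) != 8:
            return None
        return struct.unpack("<Q", data)[0]
    except OSError:
        return None
    finally:
        os.close(fd)


def sample_msr(cpu, register, count=5, interval=1.0):
    """Read the same MSR several times so that a change back to the
    previous value becomes visible across samples."""
    samples = []
    for _ in range(count):
        samples.append(read_msr(cpu, register))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    for value in sample_msr(0, IA32_PERF_CTL):
        print("unreadable" if value is None else hex(value))
```

Running this right after a wrmsr, once per second, should show whether and when the written value is replaced. Sampling IA32_PERF_STATUS the same way shows whether the request ever took effect.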

Suggestion about memory-access-signaling mechanism

Hello,

while I was trying to solve a particular multi-threading problem, it occurred to me that it could be solved more efficiently with special assistance from the CPU.

The situation is as follows: say one thread needs to block until the content of a particular 4-byte (or other size) memory location changes. (I think the usefulness of this is obvious, so there is no need for concrete examples to demonstrate it.)

What are the currently available options:

TSX results - please explain

I am using Roman Dementiev's code as a base and modifying it to determine if TSX operations are behaving according to expectations.

https://software.intel.com/en-us/blogs/2012/11/06/exploring-intel-transactional-synchronization-extensions-with-intel-software

I am counting the number of times that xbegin() returns successfully, the number of times it aborts, and the number of times that the fallback lock is used.

Processing of data in SSE/AVX/AVX2

Hello!

I'm working on a project and I'm looking for an answer to the following:

When I'm processing 256 bits of data on one core, is it better to use one whole YMMx register, or to split the data into 2x128 bits and process it through 2 XMMx registers on different ports, and hence on different SSE/AVX units (in Sandy Bridge there are 3 ports per core for AVX)? Which option is faster?

Intel SDE control

Hello,

I'm using Intel SDE version 7.8.0 on Linux, and I wish to count some of the floating-point instructions executed by an application. I'm using the -mix tool to get the number of instructions.

Is there any way to control Intel SDE, such as events to start and stop the counting? I would like to instrument just a part of the application, but I don't have access to the application's source code.

Thanks and regards,

Milan

Strange IPC behavior

Following the discussion at https://communities.intel.com/message/257079, I am creating this thread to get some help explaining strange behavior in the time taken by some instructions on an Intel CPU.
In short, I am measuring the IPC of a program in two cases:
Case 1: when I skip 29 instructions in the control flow of the program,
Case 2: when I execute them.

Self-compiled GCC available for download doesn't recognize -fmpx flag

Hi...

I compiled the sources of GCC 4.7.2 available for download here at this Intel page. The compiler runs fine, but it doesn't seem to understand the -fmpx flag: it reports an "unrecognized command line option" error.
