I am working with MIC and I am looking for a faster sqrt function. I just read an article saying that a reciprocal square root (reverse_sqrt) with lower latency can be used instead, but I never found one. Many functions seem to be available only in assembly code. Is there any way to call the reverse sqrt from my C code, especially in my SIMD environment, as described here, to get better performance? (http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-vect...)
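For illustration, here is a minimal sketch of the reciprocal-sqrt idea from plain C, using the 128-bit SSE intrinsic `_mm_rsqrt_ps` so that it builds on an ordinary x86 host. It is not MIC code: the 512-bit intrinsics for the coprocessor are a different set and should be looked up in the compiler's intrinsics reference. The helper names `approx_sqrt4` and `rsqrt_scalar` are made up for this example, and the non-SSE fallback uses the well-known bit-level initial estimate instead of a hardware instruction.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE intrinsics, including _mm_rsqrt_ps */
#define HAVE_SSE_RSQRT 1
#endif

#ifndef HAVE_SSE_RSQRT
/* Hypothetical portable fallback: bit-trick estimate of 1/sqrt(x)
   refined by two Newton-Raphson steps. Expects positive normal floats. */
float rsqrt_scalar(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);      /* rough initial estimate */
    memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);     /* Newton-Raphson step 1 */
    y = y * (1.5f - 0.5f * x * y * y);     /* Newton-Raphson step 2 */
    return y;
}
#endif

/* Hypothetical helper: approximate sqrt of four floats as x * (1/sqrt(x)).
   Inputs must be positive and finite; x = 0 would produce 0 * inf = NaN. */
void approx_sqrt4(const float *in, float *out)
{
#ifdef HAVE_SSE_RSQRT
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);
    __m128 x = _mm_loadu_ps(in);
    __m128 y = _mm_rsqrt_ps(x);            /* low-precision 1/sqrt(x) */
    /* One Newton-Raphson step: y = 0.5 * y * (3 - x * y * y) */
    __m128 xyy = _mm_mul_ps(x, _mm_mul_ps(y, y));
    y = _mm_mul_ps(_mm_mul_ps(half, y), _mm_sub_ps(three, xyy));
    _mm_storeu_ps(out, _mm_mul_ps(x, y));  /* sqrt(x) = x * 1/sqrt(x) */
#else
    for (int i = 0; i < 4; ++i)
        out[i] = in[i] * rsqrt_scalar(in[i]);
#endif
}
```

Calling `approx_sqrt4(in, out)` on arrays of four positive floats fills `out` with approximate square roots. Whether this beats a native vector sqrt depends on the hardware and on how much precision is needed, so it is worth measuring before committing to it.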
Is there a formula to convert the internal energy unit in PCM to Joules? Thanks.
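A small sketch of one possible conversion, assuming the "internal energy unit" means raw RAPL energy counter ticks (that is an assumption here, not something confirmed for PCM). In the RAPL scheme, bits 12:8 of `MSR_RAPL_POWER_UNIT` hold an exponent ESU, and one tick corresponds to 1/2^ESU Joules. The function names below are hypothetical.

```c
#include <stdint.h>

/* Joules represented by one raw energy tick, given the raw value of
   MSR_RAPL_POWER_UNIT. Bits 12:8 hold the energy status unit exponent
   (ESU); one tick is 1 / 2^ESU Joules. */
double joules_per_energy_unit(uint64_t rapl_power_unit)
{
    unsigned esu = (unsigned)((rapl_power_unit >> 8) & 0x1F);
    return 1.0 / (double)(1ULL << esu);
}

/* Convert a raw tick count (assumed to be RAPL energy ticks) to Joules. */
double energy_units_to_joules(uint64_t units, uint64_t rapl_power_unit)
{
    return (double)units * joules_per_energy_unit(rapl_power_unit);
}
```

For example, with ESU = 16 one tick is 1/65536 J (about 15.3 microjoules), so 65536 ticks correspond to 1 J.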
There is a new Linux tool available for measuring the DRAM latency on your system:
If you start the binary without arguments on a NUMA system, it will print a matrix of latencies for accessing memory between sockets. I hope you find it useful.
Please clarify one point I am unsure about.
What do the MSR/PCM counters collect?
As I understand it, on Linux I can collect counters either through perf or through a driver that allows reading and writing the MSR registers.
Do the collected counters show the number of events per thread (counters bound to a thread), or the total number of events occurring in the device (not bound to a thread, for example the total number of load or store events from all threads of one processor/core)?
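To make the driver route concrete, here is a minimal sketch of reading a register through the Linux `msr` driver, which exposes one device file per logical CPU at `/dev/cpu/<N>/msr`, with the file offset used as the register address. It needs root and the `msr` module loaded. The function names are made up for this example, and it shows only the raw read, not how PCM or perf program the counters.

```c
#define _XOPEN_SOURCE 700   /* for pread */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read one 64-bit register from an msr-style device file.
   For /dev/cpu/<N>/msr the file offset is the MSR address.
   Returns 0 on success, -1 on failure. */
int read_msr_file(const char *path, uint32_t reg, uint64_t *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Hypothetical convenience wrapper: read register `reg` on logical CPU `cpu`. */
int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    return read_msr_file(path, reg, value);
}
```

Because the device file is chosen per logical CPU, a value read this way belongs to that logical CPU rather than to a particular software thread. Attributing counts to a thread would need something extra, such as pinning the thread or letting perf save and restore the counters on context switches.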
I'm trying to get stats for an OpenMP program using `pcm.x "path/to/executable"` after setting the OMP_NUM_THREADS environment variable to `1`. PCM doesn't seem to respect that: it runs the program on all the available cores. I tried `pcm.x "export OMP_NUM_THREADS=1 && path/to/executable"` and also `pcm.x "OMP_NUM_THREADS=1 path/to/executable"`. Neither works.
Any help on how to get this done is much appreciated.
In reading the memory ordering section of Intel's Combined Software Developer's manual located here:
Volume 3, Chapter 8, Section 220.127.116.11 (Page 2,115 in that PDF) states:
May I know which system register (I am using a Core 2 Duo E8400) should be used to control the possible voltage/frequency settings for DVFS? I am not trying to overclock the CPU. I am trying to select one of the available voltage/frequency pairs. ACPI drivers do this to save power during idle periods. I went through manual volumes 2 and 3 and was able to read the performance registers using the rdmsr/wrmsr commands. However, I was not able to find the address of the control register that controls the DVFS settings.
Maybe a stupid question, but
Please tell me: when the core frequency increases in Turbo Boost mode, does the uncore frequency (or perhaps the memory controller frequency, or simply the memory bandwidth) increase too?
In other words, does the balance between CPU performance and memory subsystem bandwidth get worse, or does the bandwidth scale with core frequency?
If it scales, what causes that? Thanks for your time.
I am trying to read some events from the performance monitoring counters of an Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz processor,
but I have not been able to find the performance monitoring guidelines for Ivy Bridge processors in the Intel documentation.
Can anybody guide me in this regard?