Hello. When I run this code:
I am using an Atom N2600 processor. The Intel Software Developer's Manual says that a P-state can be requested by writing to MSR 0x199, and that the locked P-state can be read from MSR 0x198. The core voltage is given as MSR_PERF_STATUS[47:32] * (float) 1/(2^13).
The value that I see in MSR_PERF_STATUS (MSR 0x198) is 0x62d104306001045. Bits [47:32] are always 0x1043, irrespective of the value that I write to MSR 0x199.
Applying the formula: 0x1043 = 4163, so Voltage = 4163/(2^13) ≈ 0.51 V, which is a really low voltage for the processor to operate stably at.
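The arithmetic above can be checked with a small C sketch. It only reproduces the bit extraction and the scaling quoted in the post; the function names are hypothetical, and whether the 1/2^13 scaling really applies to the Atom N2600 is exactly the open question here.

```c
#include <stdint.h>

/* Extract bits [47:32] of a raw MSR_PERF_STATUS (MSR 0x198) value. */
uint16_t perf_status_voltage_field(uint64_t msr_value)
{
    return (uint16_t)((msr_value >> 32) & 0xFFFFu);
}

/* Apply the formula quoted in the post: field * 1/(2^13) volts.
   Assumption: this scaling is taken from the post, not verified
   against the N2600 documentation. */
double perf_status_to_volts(uint64_t msr_value)
{
    return (double)perf_status_voltage_field(msr_value) / 8192.0;
}
```

For the value reported above, `perf_status_voltage_field(0x62d104306001045ULL)` yields 0x1043 (4163), and the formula then gives roughly 0.508 V.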
Does anyone know why the pmulhrsw instruction, or
_mm_mulhrs_epi16(x, y) := RoundDown((x * y + 16384) / 32768),
always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average.
Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased?
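To make the rounding concrete, here is a scalar C model of a single 16-bit lane, written directly from the formula quoted above. It is a sketch, not the instruction itself, and `mulhrs_model` is a hypothetical name. Read literally, the formula rounds to nearest and sends only exact ties towards positive infinity, so -0.6 and 0.6 still cancel, while -0.5 and 0.5 do not.

```c
#include <stdint.h>

/* Floor division by 32768 that avoids relying on the
   implementation-defined right shift of negative values. */
static int32_t floor_div_32768(int32_t v)
{
    int32_t q = v / 32768;        /* C division truncates towards zero */
    if (v % 32768 < 0)
        q -= 1;                   /* step down when the remainder is negative */
    return q;
}

/* Scalar model of one lane: RoundDown((x * y + 16384) / 32768).
   For x = y = -32768 the result does not fit in int16_t; the
   conversion is then implementation-defined in C. */
int16_t mulhrs_model(int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;
    return (int16_t)floor_div_32768(product + 16384);
}
```

With y = 1 the product is just x, so x / 32768 is the value being rounded: `mulhrs_model(16384, 1)` (0.5) gives 1 and `mulhrs_model(-16384, 1)` (-0.5) gives 0, whereas `mulhrs_model(19661, 1)` and `mulhrs_model(-19661, 1)` (about ±0.6) give 1 and -1.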
Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):
Can we expect AVX512f on non-MIC systems this year, or only on Knights Landing during 2015?
I have recently been looking into programming with Intel Software Guard Extensions (SGX). The idea of SGX is to create an enclave in which security-sensitive code is loaded and executed. Most importantly, restrictions on memory access to that enclave (and many other restrictions) are enforced by hardware.
In this blog I’ll try to show how to convert SSE4.2 assembly to AVX2 (using the schemes from the blog Programming using AVX2) and how this affects performance.
- Easy case: it is enough to add the “v” prefix and replace “xmm” with “ymm”.
Suppose we have the following loop:
AVX2 appears to only offer _mm256_cmpeq_epi32 and _mm256_cmpgt_epi32. What's the most efficient way to implement _mm256_cmplt_epi32 given the available AVX2 functions?
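One common approach, sketched here in C, is to use the fact that a < b is the same predicate as b > a, so the operands of `_mm256_cmpgt_epi32` can simply be swapped. The wrapper name `cmplt_epi32_x8` and its array-based interface are hypothetical, chosen so the sketch also builds (through a scalar fallback) when AVX2 is not enabled at compile time.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* out[i] = (a[i] < b[i]) ? -1 : 0 for eight signed 32-bit elements. */
void cmplt_epi32_x8(const int32_t a[8], const int32_t b[8], int32_t out[8])
{
#if defined(__AVX2__)
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    /* a < b  <=>  b > a, so swap the operands of the greater-than compare. */
    __m256i lt = _mm256_cmpgt_epi32(vb, va);
    _mm256_storeu_si256((__m256i *)out, lt);
#else
    /* Scalar fallback so the sketch still compiles without -mavx2. */
    for (int i = 0; i < 8; ++i)
        out[i] = (a[i] < b[i]) ? -1 : 0;
#endif
}
```

The swap costs nothing extra: it is still a single compare instruction, only with the sources exchanged.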
Hello, I am running Intel SDE in 'ast' mode (the AVX/SSE transition tracker) on Mac OS X. I struggle to interpret the results.
First off, in the output, it shows function addresses, not function names. Should it not show the symbols? I built my app with -g.
Next, this is the output I see: are these numbers indicative of excessive transitions? Or are they in a normal range?
The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing. Is this the correct behavior? Below is reproducer code showing the behavior for _mm256_blend_epi16 and _mm256_blend_epi32, where I attempt to insert a value into the first position of a vector using the blend instruction.
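Separately from the poster's reproducer, the observed behavior can be illustrated with a scalar C model. This is a sketch with hypothetical function names, not the intrinsics themselves. The model assumes what the instruction's operation description specifies: the 16-bit blend has only 8 mask bits for 16 elements, so the same mask is reused for each 128-bit lane, while the 32-bit blend has one mask bit per element.

```c
#include <stdint.h>

/* Scalar model of the 256-bit 16-bit blend: the 8-bit mask selects
   within the low 128-bit lane and is reused for the high lane. */
void blend_epi16_model(const int16_t a[16], const int16_t b[16],
                       unsigned imm8, int16_t dst[16])
{
    for (int lane = 0; lane < 2; ++lane) {
        for (int j = 0; j < 8; ++j) {
            int i = lane * 8 + j;
            dst[i] = ((imm8 >> j) & 1u) ? b[i] : a[i];
        }
    }
}

/* Scalar model of the 256-bit 32-bit blend: one mask bit per element. */
void blend_epi32_model(const int32_t a[8], const int32_t b[8],
                       unsigned imm8, int32_t dst[8])
{
    for (int i = 0; i < 8; ++i)
        dst[i] = ((imm8 >> i) & 1u) ? b[i] : a[i];
}
```

In this model a mask of 1 replaces elements 0 and 8 in the 16-bit case but only element 0 in the 32-bit case, which matches the per-lane behavior described above.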