Intel® Streaming SIMD Extensions

Benefits of SSE/AVX processing when an integrated GPU is missing?

Some Intel processors have an on-chip GPU (e.g. the Intel Core i7-4770K, which uses an HD Graphics 4600 GPU) whilst others don't (e.g. the Intel Core i7-3930K). I'm wondering what implications a missing integrated GPU has for SSE/AVX SIMD processing. Many processors without the embedded GPU still support SSE/AVX, but does the lack of a GPU significantly reduce the benefit of using SSE/AVX compared to CPUs that have one?

Does VPMASKMOV require an aligned address?

The online documentation for VPMASKMOV says that "mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated." But the documentation in the Intel Instruction Set Reference Guide does not mention an alignment requirement, and seems to imply that none is required: "Faults occur only due to mask-bit required memory accesses that caused the faults."

Huge time cost while assigning

Hello Guys:)

It is very nice to have this forum. I'm new to the ISA extensions and look forward to your insight:)

My code snippet, which performs a convolution, is attached as a figure. Here is the issue that confuses me:

Assigning the computed result to the image buffer takes a surprisingly long time. The computation with the extension sets (lines 512~544) takes only about 7~8 ms, but the assignment (line 548) takes about 25~26 ms.

Cannot change IA32_PERF_CTL value: it gets overwritten by the operating system


I'm running an experiment on a server machine with a quad-core Xeon X5355 processor running Linux.

I am trying to control core voltage and frequency separately by writing to the MSR IA32_PERF_CTL (0x199). I change the value of IA32_PERF_CTL with the "wrmsr" command and verify the change with the "rdmsr" command. However, when I run "rdmsr 0x199" again a few seconds later, IA32_PERF_CTL has been overwritten with its previous value. The value of IA32_PERF_STATUS does not reflect my change either.

Suggestion about memory-access-signaling mechanism


While I was trying to solve a particular multithreading problem, it occurred to me that it could be solved more efficiently with special assistance from the CPU.

The situation is as follows: say one thread needs to block until the content of a particular 4-byte (or other size) location in memory changes. (I think the usefulness of this is obvious, so there is no need to give concrete examples to demonstrate it.)

What are the currently available options:
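For concreteness, one option that is widely available in standard C++ today is a mutex paired with a condition variable. The sketch below is only an illustration of the situation described above and is not necessarily among the options the original post goes on to list; the names `WatchedWord`, `store`, `wait_for_change` and `run_demo` are hypothetical and introduced here purely for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical helper: a 4-byte value whose changes a thread can block on.
class WatchedWord {
public:
    explicit WatchedWord(std::uint32_t initial) : value_(initial) {}

    // Store a new value and wake every thread blocked in wait_for_change().
    void store(std::uint32_t v) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            value_ = v;
        }
        changed_.notify_all();
    }

    // Block until the stored value differs from old_value, then return it.
    std::uint32_t wait_for_change(std::uint32_t old_value) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [&] { return value_ != old_value; });
        return value_;
    }

    // Read the current value under the lock.
    std::uint32_t load() {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::uint32_t value_;
};

// Demo: one thread blocks until another thread changes the value from 1 to 7.
std::uint32_t run_demo() {
    WatchedWord word(1);
    std::thread writer([&word] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        word.store(7);
    });
    std::uint32_t seen = word.wait_for_change(1);  // blocks until the store
    writer.join();
    assert(word.load() == seen);
    return seen;
}
```

Note the limitations of this approach: it only works if every writer cooperates by going through `store()`, and a waiter can miss a change if the value is modified and then restored to `old_value` before the waiter wakes up. A plain write to the memory location from code that does not know about the condition variable wakes nobody.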

TSX results - please explain

I am using Roman Dementiev's code as a base and modifying it to determine if TSX operations are behaving according to expectations.

I am counting the number of times that xbegin() returns successfully, the number of times it aborts, and the number of times the fallback lock is used.
