Some Intel processors have an on-chip GPU (e.g. the Intel Core i7-4770K with its HD Graphics 4600 GPU) whilst others do not (e.g. the Intel Core i7-3930K). I'm wondering what implications a missing integrated GPU has for SSE/AVX SIMD processing. SSE/AVX is supported on many processors without an embedded GPU, but does the lack of one significantly reduce the benefit of using SSE/AVX compared to CPUs that have an embedded GPU?
Has anyone successfully compiled an MPX-instrumented glibc? Which versions of glibc, gcc and binutils did you use?
I'm having a terrible time trying to get this to work. I get errors of this form: http://pastebin.com/kRRDN43Q
I have tried at least the following versions:
The online Intrinsics Guide (https://software.intel.com/sites/landingpage/IntrinsicsGuide/) entry for VPMASKMOV says that "mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated." But the documentation in the Intel Instruction Set Reference Guide does not mention an alignment requirement, and seems to imply that none exists: "Faults occur only due to mask-bit required memory accesses that caused the faults."
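To make the quoted "mask-bit required memory accesses" wording concrete, here is a minimal scalar sketch of how an 8 x 32-bit masked load is usually described. It is a model written for illustration, not the instruction itself: the function name `maskload_epi32_model` is hypothetical, and the model says nothing about alignment, which is exactly the point the two documents disagree on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar MODEL (assumption: for illustration only) of an 8 x 32-bit
 * masked load. Lane i is read from memory only when bit 31 of mask[i]
 * is set; every other lane is zeroed and its memory is never touched,
 * so a masked-off lane cannot be the source of a fault. */
static void maskload_epi32_model(int32_t dst[8],
                                 const int32_t *mem_addr,
                                 const int32_t mask[8])
{
    assert(dst != NULL && mask != NULL);
    for (size_t i = 0; i < 8; ++i) {
        if (mask[i] < 0)            /* sign bit set means lane enabled */
            dst[i] = mem_addr[i];   /* only enabled lanes access memory */
        else
            dst[i] = 0;             /* disabled lanes read nothing */
    }
}
```

With this model, a 3-element buffer can be loaded safely as long as only the first three mask lanes are enabled, even though lanes 3 to 7 would lie outside the buffer.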
It is very nice to have this forum. I'm new to the ISA extensions and hope to get your insight :)
My code snippet, which performs a convolution, is attached as a figure. Here is what confuses me:
Assigning the computed result to the image buffer takes a surprisingly long time. The computation with the extension instructions (lines 512~544) takes only about 7~8 ms, but the assignment (line 548) takes about 25~26 ms.
I'm running an experiment on a server machine with a quad-core Xeon X5355 processor running Linux.
I am trying to control core voltage and frequency separately by writing to the MSR IA32_PERF_CTL (0x199). I change the value of IA32_PERF_CTL using the "wrmsr" command and verify that it has changed using the "rdmsr" command. However, when I run "rdmsr 0x199" again a few seconds later, I find that IA32_PERF_CTL has been overwritten with its previous value. The value of IA32_PERF_STATUS does not reflect my change either.
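To watch the register from a program rather than with repeated "rdmsr" calls, something like the following sketch could be used. It assumes the Linux msr driver is loaded and exposes /dev/cpu/N/msr, and that the process has permission to read it. The helpers `perf_fid` and `perf_vid` are hypothetical names, and the split of the low 16 bits into a frequency ID (bits 15:8) and a voltage ID (bits 7:0) is an assumption about processors of this generation that should be checked against the manual for the specific model.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_PERF_STATUS 0x198u
#define IA32_PERF_CTL    0x199u

/* Read one 64-bit MSR on the given CPU through /dev/cpu/<cpu>/msr.
 * Returns 0 on success, -1 on failure (driver not loaded, no
 * permission, no such CPU, short read). */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    assert(value != NULL);
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The msr driver uses the file offset as the register number. */
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* ASSUMPTION: bits 15:8 hold the frequency ID, bits 7:0 the voltage ID. */
static unsigned perf_fid(uint64_t v) { return (unsigned)((v >> 8) & 0xFFu); }
static unsigned perf_vid(uint64_t v) { return (unsigned)(v & 0xFFu); }
```

Calling `read_msr(0, IA32_PERF_CTL, &v)` in a loop with a short sleep, and printing `perf_fid(v)` and `perf_vid(v)` each time, would show when the value reverts. If something else on the system (for example a frequency-scaling driver) is also writing the register, that would be one possible explanation for the overwrite.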
While I was trying to solve a particular multi-threading problem, it occurred to me that it could be solved more efficiently with special assistance from the CPU.
The situation is as follows: say one thread needs to block until the content of a particular 4-byte (or other size) location in memory is changed. (I think the usefulness of this is obvious, so there is no need to give concrete examples to demonstrate it.)
What are the currently available options:
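One option that is available today in user space, and not necessarily among those the poster had in mind, is a bounded spin on the location followed by a fallback to an OS blocking primitive (on Linux a futex wait on the same 32-bit word is a common choice). A minimal sketch of the spinning half follows; the names `cpu_relax` and `spin_until_changed` are hypothetical, and the PAUSE hint relies on a GCC/Clang builtin on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hint to the core that we are in a spin loop (x86 PAUSE).
 * On other architectures this sketch simply does nothing. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Poll *addr until it differs from `old`, at most max_spins times.
 * Returns true and stores the new value in *seen (if seen != NULL)
 * when a change is observed; returns false if the budget runs out,
 * at which point the caller would fall back to a blocking wait. */
static bool spin_until_changed(_Atomic uint32_t *addr, uint32_t old,
                               unsigned long max_spins, uint32_t *seen)
{
    assert(addr != NULL);
    for (unsigned long i = 0; i < max_spins; ++i) {
        uint32_t now = atomic_load_explicit(addr, memory_order_acquire);
        if (now != old) {
            if (seen != NULL)
                *seen = now;
            return true;
        }
        cpu_relax();
    }
    return false;
}
```

The spin budget trades latency against wasted cycles: a short budget hands over to the kernel quickly, a long one keeps a core busy while waiting.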
I am using Roman Dementiev's code as a base and modifying it to determine if TSX operations are behaving according to expectations.
I am counting the number of times that xbegin() returns successfully, the number of times it aborts, and the number of times that the fallback lock is used.
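The bookkeeping described above can be sketched as follows. This is not Roman Dementiev's code: the struct and function names are hypothetical, and the status constants are local copies of the values `immintrin.h` defines for `_xbegin()` (`_XBEGIN_STARTED`, `_XABORT_*`), mirrored here so that the sketch compiles without RTM support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the _xbegin() status values from immintrin.h. */
#define XBEGIN_STARTED  (~0u)
#define XABORT_EXPLICIT (1u << 0)
#define XABORT_RETRY    (1u << 1)
#define XABORT_CONFLICT (1u << 2)
#define XABORT_CAPACITY (1u << 3)

struct tsx_stats {
    unsigned long started;    /* xbegin returned "started" */
    unsigned long aborted;    /* xbegin returned an abort status */
    unsigned long retryable;  /* abort status had the RETRY bit */
    unsigned long conflict;   /* abort status had the CONFLICT bit */
    unsigned long capacity;   /* abort status had the CAPACITY bit */
    unsigned long explicit_;  /* abort came from an explicit xabort */
    unsigned long fallback;   /* caller took the fallback lock */
};

/* Classify one value returned by xbegin. Returns true when the
 * transaction started, false when the value is an abort status. */
static bool tsx_record(struct tsx_stats *s, unsigned status)
{
    assert(s != NULL);
    if (status == XBEGIN_STARTED) {
        s->started++;
        return true;
    }
    s->aborted++;
    if (status & XABORT_RETRY)    s->retryable++;
    if (status & XABORT_CONFLICT) s->conflict++;
    if (status & XABORT_CAPACITY) s->capacity++;
    if (status & XABORT_EXPLICIT) s->explicit_++;
    return false;
}

/* Call when the code path gives up on TSX and takes the lock. */
static void tsx_record_fallback(struct tsx_stats *s)
{
    assert(s != NULL);
    s->fallback++;
}
```

One thing to keep in mind when reading such counters: a transaction that starts can still abort later, in which case execution resumes at xbegin with an abort status. In a multi-threaded test the counters would also need to be per-thread or atomic.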
I see that the latency for a vpgatherdd is 18 clocks. I am thinking that other load type instructions can execute while the vpgatherdd is still working, since it only performs 8 loads while the processor can issue one load per clock. Is that correct?