Sorry if I am posting this question in the wrong place; I don't know which forum it belongs in. If this is the wrong one, please direct me to the appropriate forum.
I would like to know what a memory bank is. Is there a technical document on the Intel site that describes memory banks in as much detail as possible?
I have written a simulation code that offloads part of the simulation to MIC cards using #pragma offload. We have two MIC cards in our system. After running the simulation, I notice that the MIC cards have to be rebooted before the simulation can be rerun. Otherwise, when attempting to start a second run, I get the following error:
offload error: cannot get device 0 handle (error code 2)
The simulation runs just fine after the cards are rebooted. Why is this happening?
I am working on an OpenMP-enabled code. Whenever I make a native run, there is a segmentation fault on the MIC, but the code runs fine on the Xeon.
The arrays are 64-byte aligned, and I am using #pragma vector aligned on a for loop. Interestingly, this pragma causes the segfault.
Removing the pragma resolves the problem, but I want to know why this is happening. I am also using __assume_aligned inside the function containing the for loop, and the memory is allocated using _mm_malloc.
I want to manage my code's SIMD operations on the MIC manually, and wrote the intrinsics below:
    _k_mask = _mm512_int2mask(0x7ff); // 0000 0111 1111 1111
    _tempux2_512 = _mm512_mask_loadunpacklo_ps(_tempux2_512, _k_mask, &u_x[POSITION_INDEX_X(k,j,i-5)]);
    _tempux2_512 = _mm512_mask_loadunpackhi_ps(_tempux2_512, _k_mask, &u_x[POSITION_INDEX_X(k,j,i-5)]+16);
The icpc compiler gives these error messages.
I'm trying out MIC and have encountered a strange problem with how to allocate memory on the MIC.
I wrote an example program like this:
I am facing a strange problem. My program is running on MIC, and after I finished calculating an expression and assigned it to a variable, the problem appeared. The variable I just assigned stays zero, as if the assignment didn't work. The code looks like this:
    wp[index] = p * w[l-1];
    printf("[X]%f w:%f p:%f should be %f\n", wp[index],w[l-1],o,w[l-1]*p);
The output is strange: 'wp[index]' is zero but w[l-1]*p isn't. The final result is also zero, so I don't think it's only a printing issue.
Applications in data centers process huge workloads every day. Many of them are CPU intensive, disk I/O intensive, network I/O intensive, or a combination thereof. Maintaining a data center is challenging because the amount of work being run and the volume of data being processed keep growing, which may result in bottlenecks. When an application has a bottleneck (CPU, disk I/O, or network I/O), the whole system’s performance may degrade.
Suppose we have a system that consists of a host processor, an FPGA card, and a Xeon Phi processor connected by a PCI Express fabric. Data is acquired on the FPGA card and should be processed directly on the Xeon Phi (one or more nodes).
Is there a simple way to feed data from the FPGA card directly into the Xeon Phi's memory via PCIe?