_Cilk_shared and STL


I am porting a large C++ application to MIC, and I would like to use _Cilk_shared to transport the data between host and Xeon Phi. Of course, I would like to avoid rewriting the whole code, so what I do is this in the .cpp file that implements myClass:

#pragma offload_attribute(push, _Cilk_shared)
#include <myClass.h>
#pragma offload_attribute(pop, _Cilk_shared)
...
void myClass::myMethod(){
std::vector<bool> localData;
....
}

myClass.h contains this:

#include <vector>
class myClass {
....
void myMethod();
std::vector<float> m_floatData;
};

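Now I have a dilemma: because m_floatData is a member of a _Cilk_shared class, std::vector itself also has to be _Cilk_shared (which is why <vector> is included inside the pragma region). But then every std::vector in this file, including the purely local localData in myMethod(), becomes _Cilk_shared as well, and I don't want that for localData.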

Is there any simple solution to this?

Georg


Hi,

I am curious as to why you would want localData to be _Cilk_shared? Could you please share how you intend to use the localData?

-Sumedh

Hi Georg,

I had a similar problem in a code that I was porting recently, only my code used valarrays rather than vectors to hold the data. It was my understanding that it is impossible to make a class _Cilk_shared if some of its members are non-_Cilk_shared objects, like std::vectors. However, even if you made this work (e.g., in a native application), there are practical disadvantages to using vectors or valarrays in Xeon Phi code:

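a) With data in vectors or valarrays (using the default allocator), you have no control over the alignment of data in memory. Because the coprocessor's 512-bit vector instructions work best on data aligned to 64-byte boundaries, this can result in moderate to severe performance penalties on Xeon Phi coprocessors.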

b) When you do seemingly harmless operations with vectors, such as creating a temporary vector on a function stack or calling push_back, you occasionally trigger dynamic memory allocation in the vector class. This operation is inherently sequential and can seriously hurt performance on the coprocessor (the effect was quite severe in my application).

So, the way that I see it, there are three possible ways to port this code:

1) Implement your own class MyVector analogous to std::vector, and ensure that it allocates data on a 64-byte boundary (for example, with _mm_malloc()) and that it does not call _mm_malloc() to allocate new memory when you don't need it.

2) Create a derived class myClassPort : public myClass. The constructor of myClassPort should copy all data from the vectors of myClass into arrays of float. This gives you control over alignment and, at the same time, eliminates the overhead of abstraction in the performance-critical part. After that, you can use the explicit offload model (with "#pragma offload") to launch calculations on the coprocessor. This is the method that I chose for my code, because I wanted the best control over data allocation and transport.

3) Of course, you can also compile a native application without any code changes, but I don't know if this is a good option for your application. It was not for mine.

Andrey

Quote:

Sumedh Naik (Intel) wrote:

I am curious as to why you would want localData to be _Cilk_shared? Could you please share how you intend to use the localData?

I don't want localData to be _Cilk_shared. But because the member m_floatData is part of a _Cilk_shared class, std::vector also needs to be _Cilk_shared, and therefore I cannot avoid localData being _Cilk_shared. Or am I missing something here?

Georg 

Hi Georg,

I now understand the issue. In this case, instead of marking the entire class as shared, you can use a shared allocator (defined in offload.h) to create a shared vector object. Here is an example: 

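#include <vector>
#include <offload.h>  // __offload::shared_allocator

_Cilk_shared class myClass {
....
void myMethod();
// The allocator's element type must match the vector's element type (float)
_Cilk_shared std::vector<float, __offload::shared_allocator<float> > _Cilk_shared m_floatData;
};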

This is another example I found that instantiates and manipulates shared versions of C++ STL vectors. 

#include <new>        // placement new
#include <vector>
#include <offload.h>  // __offload::shared_allocator, _Offload_shared_malloc
#include <stdio.h>

using namespace std;

// A vector whose elements are allocated by the shared allocator
typedef vector<int, __offload::shared_allocator<int> > shared_vec_int;

// Shared pointer to a shared vector object
_Cilk_shared shared_vec_int * _Cilk_shared v;

// Compiled for host and coprocessor; invoked below with _Cilk_offload
_Cilk_shared int test_result() {
   int result = 1;

   for (int i = 0; i < 5; i++) {
      if ((*v)[i] != i) {
         result = 0;
      }
   }

   return result;
}

int main() {
   int result;

   // Construct the vector object itself in shared memory (placement new)
   v = new (_Offload_shared_malloc(sizeof(shared_vec_int))) _Cilk_shared shared_vec_int(5);

   for (int i = 0; i < 5; i++) {
      (*v)[i] = i;
   }

   result = _Cilk_offload test_result();

   if (result != 1)
      printf("Failed\n");
   else
      printf("Passed\n");

   return 0;
}

I hope this helps. 

-Sumedh

Quote:

Sumedh Naik (Intel) wrote:

...I now understand the issue. In this case, instead of marking the entire class as shared, you can use a shared allocator (defined in offload.h) to create a shared vector object. Here is an example: 

....

Okay, I think I understand. Let me do a few tests...

Georg

Where is _Cilk_shared defined? What header file do I need to include to use it? I can see other Cilk constructs when I begin to type in Visual Studio (IntelliSense), but not _Cilk_shared.

There is no header needed for the keywords. You might include <offload.h> to use other aspects of the shared offload model; however, I believe we may be lacking defines in <cilk/cilk.h> to enable IntelliSense. I'm checking with others about this.

Our IDE integration developer clarified the keyword highlighting and IntelliSense support.

Currently, _Cilk_for, _Cilk_spawn and _Cilk_sync are highlighted as compiler keywords in the C++ editor, and that is the extent of the support that we can provide for Intel C++-specific keywords in the Visual Studio editor. There is no auto-completion or any other IntelliSense support for _Cilk_for, _Cilk_spawn and _Cilk_sync because, unfortunately, Visual C++ IntelliSense is not extensible. The contents of <cilk/cilk.h> are not relevant to IntelliSense either; however, when this header is included, auto-completion works for cilk_spawn, cilk_sync and cilk_for, which it defines as macros for the corresponding _Cilk_ keywords.

Further, we do not currently highlight the keywords of the offload Virtual Shared Memory model (_Cilk_shared, _Cilk_offload, _Cilk_offload_to), so I submitted a feature enhancement request (see the internal tracking ID noted below) to have them highlighted like the other _Cilk keywords.

(Internal tracking id: DPD200255317)
