Cilk Plus and Graphics Application

Hi,

I am trying to parallelize a C++ raytracing engine, which renders pixels to the screen, using Cilk Plus. How can I convert a Windows application to a console application so that I can use the Cilk keywords in cilk_main()? Or is there another way to use the Cilk keywords with the WinMain() method? I tried using OpenGL calls but am having trouble with that; you can see my post at http://stackoverflow.com/questions/23289466/how-do-i-display-the-result-of-a-raytracing-engine-in-using-opengl

I will appreciate any help I can get.

Thanks

cilk_main is an artifact of Cilk++, which is no longer supported. You should be using Cilk Plus, which is supported by both the Intel Composer XE compiler and GCC.

Regardless, the Cilk keywords are fully supported in both Windows and console applications. My recommendation is to get your program working serially first. You can include cilk_stub.h to macro away the Cilk keywords, or define the macros yourself. They are:

#define cilk_spawn
#define cilk_sync
#define cilk_for for
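
A minimal sketch of how those stubs turn a Cilk loop into a serial one. The function name square_all is hypothetical, and the __cilk check is an assumption about how a Cilk Plus compiler advertises support; neither comes from the project under discussion.

```cpp
#include <vector>

// Assumption: a compiler with Cilk Plus enabled defines __cilk and ships
// <cilk/cilk.h>. Anywhere else, fall back to the serial stubs quoted above.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#define cilk_for for
#endif

// Hypothetical helper: squares every element in place.
// Each iteration touches only its own element, so the result is the same
// whether the loop runs in parallel or, with the stubs, as a plain for loop.
void square_all(std::vector<int>& values)
{
    const int count = static_cast<int>(values.size());
    cilk_for (int i = 0; i < count; ++i)
    {
        values[i] = values[i] * values[i];
    }
}
```

With the stubs active this compiles as ordinary serial C++, which is handy for checking correctness but cannot produce any speed-up.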

However, I don't see any use of the Cilk keywords in the sample code you pointed to.  Unfortunately I don't know OpenGL, so I can't help you with that.

   - Barry

P.S. If you want an example of a Windows application that uses Cilk, look at the QuickDemo application which uses MFC.

Thanks Barry,

I went through your suggestions and made the appropriate changes to my code. The code you saw in the link was just an excerpt where the OpenGL calls were made, and following your suggestion I have put OpenGL aside.

I have the serial Windows program running, and after adding an #include of cilk_stub.h to my header files I was able to use the Cilk keywords. The problem now is that after using cilk_for and cilk_sync at various iterative points in my code (being careful about race conditions), I can't see any appreciable speed-up. Am I missing something? Here is the code, based on the raytracing engine originally by Jacco Bikker, where I added the Cilk Plus keywords to parallelize it. The entire Visual Studio solution is also attached:

// -----------------------------------------------------------
// raytracer.cpp
// 2004 - Jacco Bikker - jacco@bik5.com - www.bik5.com -   <><
// -----------------------------------------------------------

#include "raytracer.h"
#include "scene.h"
#include "common.h"
#include "windows.h"
#include "winbase.h"
#include "cilk.h"
#include "cilkview.h"

namespace Raytracer {

Ray::Ray( vector3& a_Origin, vector3& a_Dir ) : 
	m_Origin( a_Origin ), 
	m_Direction( a_Dir )
{
}

Engine::Engine()
{
	m_Scene = new Scene();
}

Engine::~Engine()
{
	delete m_Scene;
}

// -----------------------------------------------------------
// Engine::SetTarget
// Sets the render target canvas
// -----------------------------------------------------------
void Engine::SetTarget( Pixel* a_Dest, int a_Width, int a_Height )
{
	// set pixel buffer address & size
	m_Dest = a_Dest;
	m_Width = a_Width;
	m_Height = a_Height;
}

// -----------------------------------------------------------
// Engine::Raytrace
// Naive ray tracing: Intersects the ray with every primitive
// in the scene to determine the closest intersection
// -----------------------------------------------------------
Primitive* Engine::Raytrace( Ray& a_Ray, Color& a_Acc, int a_Depth, float a_RIndex, float& a_Dist )
{
	if (a_Depth > TRACEDEPTH) return 0;
	// trace primary ray
	a_Dist = 1000000.0f;
	vector3 pi;
	Primitive* prim = 0;
	int result;
	// find the nearest intersection
	for ( int s = 0; s < m_Scene->GetNrPrimitives(); s++ )
	{
		Primitive* pr = m_Scene->GetPrimitive( s );
		int res;
		if (res = pr->Intersect( a_Ray, a_Dist )) 
		{
			prim = pr;
			result = res; // 0 = miss, 1 = hit, -1 = hit from inside primitive
		}
	}
	// no hit, terminate ray
	if (!prim) return 0;
		// ---- Handle intersection ----
	if (prim) {
		// Update statistics
		//m_statisticsDataStructCpp->rayIntersectionsCount[0]++;
		if (prim->IsLight())
		{
			// we hit a light, stop tracing
			a_Acc =Color( 1.0f, 1.0f, 1.0f );
		}
		else
		{
			// determine color at point of intersection
			pi = a_Ray.GetOrigin() + a_Ray.GetDirection() * a_Dist;
			// trace lights
			cilk_for ( int l = 0; l < m_Scene->GetNrPrimitives(); l++ )
			{
				Primitive* p = m_Scene->GetPrimitive( l );

				if (p->IsLight()) 
				{
					Primitive* light = p;
						// handle point light source
						float shade = 1.0f;
						if (light->GetType() == Primitive::SPHERE)
						{
							vector3 L = ((Sphere*)light)->GetCentre() - pi;
							float tdist = LENGTH( L );
							NORMALIZE(L);
							vector3 TempVector3(pi + L * EPSILON);
							Ray r = Ray( TempVector3, L );
							for ( int s = 0; s < m_Scene->GetNrPrimitives(); s++ )
							{
								Primitive* pr = m_Scene->GetPrimitive( s );						
								if ((pr != light) && (pr->Intersect( r, tdist )))
								{
									shade = 0;
									break;
								}
							}
						}
						if (shade > 0)
						{
							vector3 L = ((Sphere*)light)->GetCentre() - pi;
							NORMALIZE( L );
							vector3 N = prim->GetNormal( pi );
							// determine diffuse component
							if (prim->GetMaterial()->GetDiffuse() > 0)
							{
								float dot = DOT( L, N );
								if (dot > 0)
								{
									float diff = dot * prim->GetMaterial()->GetDiffuse() * shade;
									// add diffuse component to ray color
									Color ncol = diff * prim->GetMaterial()->GetColor();
									// Scale back the color, if necessary
									if (ncol.r > 1.0f || ncol.g > 1.0f || ncol.b > 1.0f)
									{
										float max = 1.0f;
										if (ncol.r > max) max = ncol.r;
										if (ncol.g > max) max = ncol.g;
										if (ncol.b > max) max = ncol.b;
										ncol *= 1.0f/max;
									}
									a_Acc += ncol;
								}
							}
							// determine specular component
							if (prim->GetMaterial()->GetSpecular() > 0)
							{
								// point light source: sample once for specular highlight
								vector3 V = a_Ray.GetDirection();
								vector3 R = L - 2.0f * DOT( L, N ) * N;
								float dot = DOT( V, R );
								if (dot > 0)
								{
									float spec = powf( dot, 20 ) * prim->GetMaterial()->GetSpecular() * shade;
									// add specular component to ray color
									Color ncol = spec * light->GetMaterial()->GetColor();
									// Adjust specular component to be within 0 <-> 1.0
									if (ncol.r > 1.0f) ncol.r = 1.0f; else if (ncol.r < 0.0f) ncol.r = 0.0f;
									if (ncol.g > 1.0f) ncol.g = 1.0f; else if (ncol.g < 0.0f) ncol.g = 0.0f;
									if (ncol.b > 1.0f) ncol.b = 1.0f; else if (ncol.b < 0.0f) ncol.b = 0.0f;
									a_Acc += ncol;								
								}
							}
						}
				}
			}
			cilk_sync;
			// Scale back the local color contribution, if necessary
			if (a_Acc.r > 1.0f || a_Acc.g > 1.0f || a_Acc.b > 1.0f)
			{
				float max = 1.0f;
				if (a_Acc.r > max) max = a_Acc.r;
				if (a_Acc.g > max) max = a_Acc.g;
				if (a_Acc.b > max) max = a_Acc.b;
				a_Acc *= 1.0f/max;
			}
			// -- Calculate reflection --
			float refl = prim->GetMaterial()->GetReflection();
			if ((refl > 0.0f) && (a_Depth < TRACEDEPTH))
			{
				vector3 N = prim->GetNormal( pi );
				vector3 R = a_Ray.GetDirection() - 2.0f * DOT( a_Ray.GetDirection(), N ) * N;
				Color rcol( 0.0f, 0.0f, 0.0f );
				float dist;
				vector3 TempVector3(pi + R * EPSILON);
				Ray TempRay(TempVector3, R);
				Raytrace( TempRay, rcol, a_Depth + 1, a_RIndex, dist );
				a_Acc += refl * rcol;
			}
		}
	}
	else {
		// -- No intersection occured; stop tracing --
		// Update statistics
		//m_statisticsDataStructCpp->rayMissesCount[0]++;
		// Do nothing
	}
	
	// return pointer to primitive hit by primary ray
	return prim;
}

// -----------------------------------------------------------
// Engine::InitRender
// Initializes the renderer, by resetting the line / tile
// counters and precalculating some values
// -----------------------------------------------------------
void Engine::InitRender()
{
	// set first line to draw to
	m_CurrLine = 20;
	// set pixel buffer address of first pixel
	m_PPos = 20 * m_Width;
	// screen plane in world space coordinates
	m_WX1 = -4, m_WX2 = 4, m_WY1 = m_SY = 3, m_WY2 = -3;
	// calculate deltas for interpolation
	m_DX = (m_WX2 - m_WX1) / m_Width;
	m_DY = (m_WY2 - m_WY1) / m_Height;
	m_SY += 20 * m_DY;
	// allocate space to store pointers to primitives for previous line
	m_LastRow = new Primitive*[m_Width];
	memset( m_LastRow, 0, m_Width * 4 );
	// statistics
	m_RayMissesCount = 0;
	m_RayIntersectionsCount = 0;
}

void Engine::PreRender() {
	// Prepare linearized data
	m_spheres.clear();
	m_planes.clear();
	Sphere* tempSphere;
	PlanePrim* tempPlane;
	int sphereCount;	
	int planesCount;
	Primitive* prim;
	cilk_for (int i = 0; i < m_Scene->GetNrPrimitives(); i++) {
		prim = m_Scene->GetPrimitive(i);

		tempSphere = dynamic_cast<Sphere*> (prim);
		if (dynamic_cast<Sphere*> (prim) != 0){
			m_spheres.push_back(tempSphere);
//			continue;
		}
		tempPlane = dynamic_cast<PlanePrim*> (prim);
		if (dynamic_cast<PlanePrim*> (prim) != 0)
			m_planes.push_back(tempPlane);
	}
	cilk_sync;
	// Spheres
	m_sphereDataStructCpp = new SphereDataStructCpp(m_spheres.size());
	
	cilk_for (int i = 0; i < m_spheres.size(); i++)	{
		tempSphere = m_spheres[i];
		// Geometric Data
		m_sphereDataStructCpp->centerX[i] = tempSphere->GetCentre().x;
		m_sphereDataStructCpp->centerY[i] = tempSphere->GetCentre().y;
		m_sphereDataStructCpp->centerZ[i] = tempSphere->GetCentre().z;
		m_sphereDataStructCpp->recRadius[i] = tempSphere->GetRecRadius();
		m_sphereDataStructCpp->sqRadius[i] = tempSphere->GetSqRadius();
		// Material Data
		m_sphereDataStructCpp->diffuse[i] = tempSphere->GetMaterial()->GetDiffuse();
		m_sphereDataStructCpp->specular[i] = tempSphere->GetMaterial()->GetSpecular();
		m_sphereDataStructCpp->reflection[i] = tempSphere->GetMaterial()->GetReflection();
		m_sphereDataStructCpp->refraction[i] = tempSphere->GetMaterial()->GetRefraction();
		m_sphereDataStructCpp->refrIndex[i] = tempSphere->GetMaterial()->GetRefrIndex();
		// Color Data
		m_sphereDataStructCpp->red[i] = tempSphere->GetMaterial()->GetColor().r;
		m_sphereDataStructCpp->green[i] = tempSphere->GetMaterial()->GetColor().g;
		m_sphereDataStructCpp->blue[i] = tempSphere->GetMaterial()->GetColor().b;
		// Light?
		m_sphereDataStructCpp->isLight[i] = tempSphere->IsLight();
	}
	cilk_sync;
	// Planes

	m_planeDataStructCpp = new PlaneDataStructCpp(m_planes.size());
	
	cilk_for (int i = 0; i < m_planes.size(); i++)	{
		tempPlane = m_planes[i];
		// Geometric data
		m_planeDataStructCpp->normalX[i] = tempPlane->GetNormal().x;
		m_planeDataStructCpp->normalY[i] = tempPlane->GetNormal().y;
		m_planeDataStructCpp->normalZ[i] = tempPlane->GetNormal().z;
		m_planeDataStructCpp->d[i] = tempPlane->GetD();
		// Material data
		m_planeDataStructCpp->diffuse[i] = tempPlane->GetMaterial()->GetDiffuse();
		m_planeDataStructCpp->specular[i] = tempPlane->GetMaterial()->GetSpecular();
		m_planeDataStructCpp->reflection[i] = tempPlane->GetMaterial()->GetReflection();
		m_planeDataStructCpp->refraction[i] = tempPlane->GetMaterial()->GetRefraction();
		m_planeDataStructCpp->refrIndex[i] = tempPlane->GetMaterial()->GetRefrIndex();
		// Color data
		m_planeDataStructCpp->red[i] = tempPlane->GetMaterial()->GetColor().r;
		m_planeDataStructCpp->green[i] = tempPlane->GetMaterial()->GetColor().g;
		m_planeDataStructCpp->blue[i] = tempPlane->GetMaterial()->GetColor().b;
		// Light data
		m_planeDataStructCpp->isLight[i] = tempPlane->IsLight();
	}
	cilk_sync;
	// Statistics
	m_RayMissesCount = 0;
	m_RayIntersectionsCount = 0;
	
	m_statisticsDataStructCpp = new StatisticsDataStructCpp((size_t)m_CohortPSize);
	
		
}

// -----------------------------------------------------------
// Engine::Render
// Fires rays in the scene one scanline at a time, from left
// to right
// -----------------------------------------------------------
bool Engine::Render()
{
	// render scene
	vector3 o( 0, 0, -5 );
	// initialize timer
	int msecs = GetTickCount();
	// reset last found primitive pointer
	Primitive* lastprim = 0;
	// render remaining lines
	cilk_for ( int y = m_CurrLine; y < (m_Height - 20); y++ )
	{
		m_SX = m_WX1;
		// render pixels for current line
		for ( int x = 0; x < m_Width; x++ )
		{
			// fire primary ray
			Color acc( 0, 0, 0 );
			vector3 dir = vector3( m_SX, m_SY, 0 ) - o;
			NORMALIZE( dir );
			Ray r( o, dir );
			float dist;
			Primitive* prim = Raytrace( r, acc, 1, 1.0f, dist );
			int red = (int)(acc.r * 256);
			int green = (int)(acc.g * 256);
			int blue = (int)(acc.b * 256);
			if (red > 255) red = 255;
			if (green > 255) green = 255;
			if (blue > 255) blue = 255;
			m_Dest[m_PPos++] = (red << 16) + (green << 8) + blue;
			m_SX += m_DX;
		}
		m_SY += m_DY;
		// see if we've been working too long already
		if ((GetTickCount() - msecs) > 100) 
		{
			// return control to windows so the screen gets updated
			m_CurrLine = y + 1;
			return false;
		}
		
	}
	cilk_sync;
	// all done
	return true;
}

}; // namespace Raytracer

 

cilk_stub.h stubs out the parallelism. It literally contains the macros I gave you. So it's converting your program to a serial program.

I've got the source file, but when I try to compile it, Visual Studio reports that it cannot find raytracer.h, scene.h and common.h.

   - Barry

Hi Barry,

Attached is another copy of the Visual Studio files, which should be all right.

Yak

Attachments:

RayTracerCilkPlus.zip (application/zip, 5.85 MB)

OK, a couple of comments.

First, you don't need a cilk_sync after a cilk_for. It's automatically provided by the divide-and-conquer loop implementation. You only need a cilk_sync if you've got a cilk_spawn in the function. One is automatically put in a function before the function returns.
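
A small sketch of that distinction, not taken from the attached project. The functions fib and fill_squares are hypothetical, and the __cilk check is an assumption about how the compiler signals Cilk Plus support.

```cpp
#include <vector>

// Assumption: __cilk is defined when Cilk Plus is enabled; otherwise use
// serial stubs so the sketch still builds with any C++ compiler.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#define cilk_for for
#endif

// cilk_spawn case: the spawned call may still be running when the next
// statement executes, so an explicit cilk_sync is needed before reading a.
int fib(int n)
{
    if (n < 2) return n;
    int a = cilk_spawn fib(n - 1);
    int b = fib(n - 2);
    cilk_sync;  // wait for the spawned fib(n - 1) before using a
    return a + b;
}

// cilk_for case: no cilk_sync follows the loop, because every iteration
// has completed by the time the loop statement finishes.
void fill_squares(std::vector<int>& out)
{
    const int count = static_cast<int>(out.size());
    cilk_for (int i = 0; i < count; ++i)
    {
        out[i] = i * i;
    }
}
```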

Second, I noticed that you're checking whether you've taken too long to render the image in Engine::Render() and attempting to abort out of the loop. If UI refresh frequency is an issue, that's an indication that you need to move the calculation into a worker thread and send messages to the UI thread that data is available for display. Again, the QuickDemo application is an example of how to do this.
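
A rough sketch of the worker-thread idea in portable C++11, not the QuickDemo code. LineQueue and render_with_worker are hypothetical names; in a real Win32 or MFC program the notification would more likely be a window message posted to the UI thread than a blocking queue, and the placeholder colour stands in for the actual ray tracing.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical channel standing in for "send a message to the UI thread".
class LineQueue
{
public:
    void push(int line)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_lines.push(line);
        }
        m_ready.notify_one();
    }

    // Blocks until a finished line is available, then returns its index.
    int pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_lines.empty(); });
        const int line = m_lines.front();
        m_lines.pop();
        return line;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<int> m_lines;
};

// Renders scanlines on a worker thread and returns the order in which the
// calling ("UI") thread was told that lines were ready to display.
std::vector<int> render_with_worker(int width, int height)
{
    assert(width > 0 && height > 0);
    std::vector<unsigned> pixels(static_cast<std::size_t>(width) * height, 0u);
    LineQueue finished;

    std::thread worker([&] {
        for (int y = 0; y < height; ++y)
        {
            for (int x = 0; x < width; ++x)
                pixels[static_cast<std::size_t>(y) * width + x] = 0xFFFFFFu; // placeholder colour
            finished.push(y);  // this line is complete and may be drawn
        }
    });

    std::vector<int> drawn;
    for (int n = 0; n < height; ++n)
        drawn.push_back(finished.pop());  // only this thread would touch the window
    worker.join();
    return drawn;
}
```

The render loop no longer needs a timing check to hand control back to Windows, since the UI thread is never the one doing the computation.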

Next, you're still linking against the old cilk++ runtime library.

I removed the timing check, and was able to build. Running the application showed trash. Using a single worker gives the correct picture, which suggests that you've got races. I modified the program to only go through the render/draw loop once, and ran the application under Cilkscreen, which reports a bunch of races. Remember that this is a Windows application, so stdout and stderr go to /dev/null, and you must use the --report option to specify a report file to get the report. For a console application, the default of writing to stderr works.

Some races that I can point out easily:

void Engine::PreRender() {
	// Prepare linearized data
	m_spheres.clear();
	m_planes.clear();
	Sphere* tempSphere;
	PlanePrim* tempPlane;
	int sphereCount;
	int planesCount;
	Primitive* prim;
	cilk_for (int i = 0; i < m_Scene->GetNrPrimitives(); i++) {
		prim = m_Scene->GetPrimitive(i);
		tempSphere = dynamic_cast<Sphere*> (prim);
		if (dynamic_cast<Sphere*> (prim) != 0){
			m_spheres.push_back(tempSphere);
//			continue;
		}
		tempPlane = dynamic_cast<PlanePrim*> (prim);
		if (dynamic_cast<PlanePrim*> (prim) != 0)
			m_planes.push_back(tempPlane);
	}
:
:
}

Note the use of "prim" inside the cilk_for loop. The variable is declared *outside* the loop, so multiple strands will be racing to access it. You have the same with "tempSphere" and "tempPlane." Then there's the issue of pushing elements onto an STL vector. STL is not thread safe. You can possibly replace the STL vectors with reducer_vectors.

I highly recommend you use a race detector like Amplifier or Cilkscreen to track these down.

   - Barry

Hi Barry,

I tried to download the latest version of Cilk Plus, but it doesn't seem to come like the old version, which you just install and then add the include and lib directories to Visual Studio.

Secondly, I tried to run Cilkscreen in Visual Studio 2008 with the Cilk version I have been running, but it reports something like "no cilk keywords found" even though I have included the keywords.

How can I get the latest version of Intel Cilk Plus to use with Visual Studio 2008? I downloaded the Intel Cilk tools, which contain Cilkscreen and Cilkview, but I need an updated Cilk Plus itself.

Another option I was considering is to copy the updated header files (cilk.h, cilk_stub.h) that I found in one of the Cilk runtime examples on the Cilk web page. I don't know whether replacing the header files in my old Cilk++ version will work.

Thanks

Yak

You need a copy of Intel C++ Composer. There are pointers to the various packages it's available in at http://www.cilkplus.org/which-license . See the section that's titled "Intel Commercial Releases."

   - Barry

One more thing. You need to check whether OpenGL is thread safe. If not, you'll need to gather your information and pass it to a UI thread which is charged with being the one thread that touches the UI.  QuickDemo works this way because MFC is not thread safe.

   - Barry

Hi Barry,

Sorry for the late reply. 

I was finally able to download an evaluation copy of Intel C++ Studio XE 2013 and reproduce the skewed result you got when you ran the code.

I used Intel Advisor and the Suitability tools to locate the best places to parallelize in the code. The problem I am having now is how to make prim a reducer, since it is a pointer. prim appears in the PreRender routine and has a primitive type.

Secondly, how do I convert m_Dest[] in the line of code below into a reducer? (I tried using reducer_append<> but didn't get through.)

m_Dest[j] = (b << 16) + (g << 8) + r; 

 

Thirdly, when I introduced cilk_for in the outer loop as shown below, I got an error message saying something like "cilk_for does not support boolean yet". So I opted to introduce cilk_for in the inner loop instead.

for (; m_CohortPPos < m_LastPPos; m_CohortPPos += m_CohortPSize) 
		{                                                               
			if ((m_LastPPos - m_CohortPPos) < m_CohortPSize)                 
				m_CohortPSize = (m_LastPPos - m_CohortPPos);
			cilk_for (j = m_CohortPPos; j < (m_CohortPPos + m_CohortPSize); j++)
			{		
				// calculate screen world x and y
				m_SX = j % m_Width * m_DX + m_WX1;
				m_SY = j / m_Width * m_DY + m_WY1;
				// fire primary rays

				Color acc( 0.0f, 0.0f, 0.0f );
				vector3 dir = vector3( m_SX, m_SY, 0 ) - o;
				NORMALIZE( dir );
				Ray r( o, dir );
				float dist;
				Primitive* prim = Raytrace( r, acc, 1, m_maxTraceDepth, 1.0f, dist);
				int red, green, blue;
				
				red = (int)(acc.r * 256);
				green = (int)(acc.g * 256);
				blue = (int)(acc.b * 256);
				
				if (red > 255) red = 255;
				if (green > 255) green = 255;
				if (blue > 255) blue = 255;
				
				m_Dest[j] = (blue << 16) + (green << 8) + red;
			}
			// see if we've been working too long already
			if ((GetTickCount() - msecs) > 100) 
			{
				// return control to windows so the screen gets updated
				m_CohortPPos += m_CohortPSize;
				return false;
			}
		}

Please help me out; I am really trying to get a grasp of how Cilk Plus can ease parallelism in raytracing, and then compare the performance results with an OpenCL version of the same code.

Thanks

Yaknan

The first one's easy. If you move the declaration of prim in Engine::PreRender() into the initial cilk_for loop, you'll get a separate instantiation in each strand, which should do exactly what you want. It's never used outside of the loop, so there's no reason not to move it. The same is going to apply to tempSphere. It's also used in the second cilk_for loop, but you can easily redeclare it inside that loop as well.

The thing to keep in mind is that the implementation of a cilk_for loop is to convert the loop body to a lambda function. You want to move everything possible into the loop body so it's not a global reference.
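
A sketch of what moving the declarations into the loop body can look like. The types below are minimal stand-ins rather than the project's real classes, and split_primitives is a hypothetical name. For the vectors, this sketch does not use reducers: each iteration writes to its own slot of a temporary array and a short serial pass does the push_back calls afterwards, which is a different technique chosen here only because it needs no Cilk library support.

```cpp
#include <vector>

// Assumption: __cilk is defined when Cilk Plus is enabled; otherwise the
// loop below degrades to a serial for loop.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// Minimal stand-ins for the scene classes (hypothetical, for illustration).
struct Primitive { virtual ~Primitive() {} };
struct Sphere : Primitive {};
struct PlanePrim : Primitive {};

void split_primitives(const std::vector<Primitive*>& scene,
                      std::vector<Sphere*>& spheres,
                      std::vector<PlanePrim*>& planes)
{
    const int count = static_cast<int>(scene.size());
    std::vector<Sphere*> sphereAt(count);    // one slot per primitive, null by default
    std::vector<PlanePrim*> planeAt(count);

    cilk_for (int i = 0; i < count; ++i)
    {
        Primitive* prim = scene[i];          // declared inside: one copy per iteration
        sphereAt[i] = dynamic_cast<Sphere*>(prim);
        planeAt[i] = dynamic_cast<PlanePrim*>(prim);
    }

    // Serial gather: push_back on a std::vector is only done by one strand.
    spheres.clear();
    planes.clear();
    for (int i = 0; i < count; ++i)
    {
        if (sphereAt[i]) spheres.push_back(sphereAt[i]);
        if (planeAt[i]) planes.push_back(planeAt[i]);
    }
}
```

A side effect of the gather pass is that the spheres and planes keep their original scene order, which a parallel push_back would not guarantee.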

I don't claim to understand the code, but it appears that m_Dest is a pointer to the pixels that make up your surface. This means that you have to guarantee that only one strand writes to it. Which means that you need to have a per-strand instance of m_PPos - so it has to move out of being a member variable. You need to calculate it based on the line and width.

m_CohortPPos is declared in the version of the code I've got, but not used anywhere. As a general rule, though, cilk_for loops require that you provide the initialization of the loop control variable in the loop itself. cilk_for does not support initialization outside the loop, since we must know the range of the loop in order to execute it in parallel.
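
One possible shape for the cohort loop that satisfies this rule: the outer loop stays serial and works on local bounds, while the inner cilk_for declares and initializes its own control variable. The function process_in_cohorts and its parameters are hypothetical, and the per-pixel work is reduced to a counter increment.

```cpp
#include <cassert>
#include <vector>

// Assumption: __cilk is defined when Cilk Plus is enabled; otherwise the
// inner loop runs serially.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// Visits pixel indices [first, last) in cohorts of at most cohortSize and
// returns how many pixels were processed.
int process_in_cohorts(std::vector<int>& hits, int first, int last, int cohortSize)
{
    assert(cohortSize > 0 && 0 <= first && first <= last);
    assert(last <= static_cast<int>(hits.size()));
    int processed = 0;
    for (int begin = first; begin < last; begin += cohortSize)
    {
        // The final cohort may be shorter; keep the bound in a local
        // instead of shrinking a member variable.
        const int end = (last - begin < cohortSize) ? last : begin + cohortSize;
        cilk_for (int j = begin; j < end; ++j)  // j is declared and initialized here
        {
            hits[j] += 1;  // stand-in for tracing pixel j
        }
        processed += end - begin;
    }
    return processed;
}
```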

    - Barry

Hi Barry,

I tried all your suggestions, but I am still unclear about the suggestion in your third paragraph about m_Dest. You are right that m_Dest is a pointer to the pixels that make up the surface. Also, after making the changes to the PreRender method I did not get any significant speed-up, although it works just fine.

My problem is which reducer would be most appropriate for m_Dest, or how to ensure that only one strand writes to it. The Advisor shows significant time spent in this loop, and I know that race conditions exist with m_Dest, but I don't know how to resolve them. Please assist me! I have attached the raytracer.cpp and raytracer.h files where I made all the changes.

 

Yaknan

 

Attachments:

raytracer.cpp (text/x-c++src, 12.67 KB)
raytracer.h (text/x-chdr, 2.48 KB)

The attached version of raytracer.cpp doesn't work.

I replaced the "cilk_for" with a "for" in Engine::Render, and commented out the cilk_sync at the end of the loop. (FYI, you do not need a cilk_sync after a cilk_for loop. All strands within the loop have completed when the loop exits.) Then I set a breakpoint on the assignment to m_SX at line 346. The value of "j" and "m_CohortPPos" is -1684284902. Using this to index into m_Dest results in an access violation.

Note that the use of m_SX and m_SY is going to be a race. I don't see any uses of m_SX other than in this routine, so it should not be a class member variable. Instead, make it a local declared within the loop. m_SY is used in Engine::InitRender in a calculation I don't understand, but its value is then overridden in Engine::Render. So it should also be a local declared in the loop.
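
A sketch of the local-variable version, reusing the index arithmetic from the posted loop (j % m_Width and j / m_Width). ScreenPoint, pixel_to_world and fill_coordinates are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Assumption: __cilk is defined when Cilk Plus is enabled; otherwise the
// loop runs serially.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

struct ScreenPoint { float x; float y; };

// Maps pixel index j to screen-plane coordinates without touching any
// member variable, so concurrent iterations cannot interfere.
ScreenPoint pixel_to_world(int j, int width, float wx1, float wy1, float dx, float dy)
{
    assert(width > 0 && j >= 0);
    ScreenPoint p;
    p.x = (j % width) * dx + wx1;  // was m_SX
    p.y = (j / width) * dy + wy1;  // was m_SY
    return p;
}

void fill_coordinates(std::vector<ScreenPoint>& out, int width,
                      float wx1, float wy1, float dx, float dy)
{
    const int count = static_cast<int>(out.size());
    cilk_for (int j = 0; j < count; ++j)
    {
        // Local to the loop body: every iteration has its own copy.
        const ScreenPoint p = pixel_to_world(j, width, wx1, wy1, dx, dy);
        out[j] = p;
    }
}
```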

    - Barry
