Archived - Dipping into the Intel® RealSense™ Raw Data Stream

The Intel® RealSense™ SDK has been discontinued. No ongoing support or updates will be available.

1. Introduction

Developers wondering what they can achieve by implementing perceptual computing technology into their applications need look no further than the Intel RealSense SDK and accompanying samples and online resources. If you do decide to take “the dip,” you will discover a range of functionality that goes to the very heart of the technology and with it, the power to create some amazing new interface paradigms.

This article will explore this deeper dimension by looking at the different raw data streams, showing how to access them, and suggesting possible ways to use them. By accessing this raw data directly, you not only get a potential universe of metadata, you also get the fastest method of determining what the user is doing in the real world.

The Intel® RealSense™ 3D camera used for this article was the front-facing model, which produces a variety of data streams, from the RGB image you might expect to the depth and infrared streams that might be new. Each stream has its idiosyncrasies, and each will be discussed in the sections below. By the end of this article, you will have a good grasp of what streams are available and when you might want to use them.

As prerequisites, you should be familiar with C++ to follow the code examples and have a basic grasp of the Intel RealSense technology (or the earlier version known as the Intel® Perceptual Computing SDK), though neither is essential.

2. Why Is This Important

If you are only interested in implementing a basic gesture or face detection system, the algorithm modules in the Intel RealSense SDK will provide everything you need, and you won’t need to worry about raw data streams. The problem comes when you want functionality not present in the algorithm modules included with the SDK, at which point your application reaches an impasse unless an alternative is available.

The first question you should ask is what your application needs and whether these requirements can be met with the algorithm modules in the Intel RealSense SDK. If you require a cursor on the screen that tracks as the hand moves about, you may find that the hand/finger tracking module is sufficient. You should be able to find a sample provided with the SDK to quickly determine if the functionality meets your needs. If you find that the behavior demonstrated is not sufficient, you can then begin planning how you can use the raw data to solve your particular requirement.

For example, 2D gesture detection is currently provided, but what if you wanted to detect gestures from a set of 3D hands and determine additional information about what the user is doing with their hands? What if you wanted to record a high-speed stream of gestures and store them as a sequence instead of a snapshot? You would need to bypass the hand/finger system, which has its own processing overhead, and implement a technique that can act on and dynamically encode the real-time telemetry. More generally, you might encounter functional shortfalls and want a more direct solution to your specific application problem.

As another example, let’s say you are building an application that detects and interprets sign language and converts it to text for use over a teleconference session. The current functionality of the Intel RealSense SDK allows hand and finger tracking, but only for individual hands, and it is not specifically tuned to the context of someone signing through the camera. Your only course would be to develop your own gesture detection system that can quickly convert gestures into a sequence of hand and finger positions, then use pattern matching to recognize known signs and reconstruct the sentence. At present, the only way to do this is to access the raw depth stream using high-speed capture and translate the meaning on the fly.

Being able to write code to bridge the gap between the functionality you have and the functionality you want is critical, and the Intel RealSense SDK allows you to do that.

We are at a very early stage right now, and developers are still learning what can be done with this technology. By accessing raw data streams, you push the boundaries of what you can do, and it’s from these pioneering advances that true innovation is born.

3. Streams

The best way to learn about data streams is to see them for yourself. The best way to do that is to run the Raw Streams example, which you can find in the ‘bin’ folder after installing the Intel RealSense SDK:

\Intel\RSSDK\bin\win32\raw_streams.exe

The example is accompanied by full source code and a project, which will become an invaluable resource later on. For now, simply running the executable and pressing the START button when the application launches will give you your first taste of a raw RGB color stream, as shown in Figure 1.


Figure 1. A typical RGB color stream.

Now that you have waved to yourself, press the STOP button, click the Depth menu, and select 640x480x60. Press the START button again.


Figure 2. The filtered depth stream from the Intel® RealSense™ 3D camera.

As you can see in Figure 2, the image is quite different from the RGB color stream. What you are in fact seeing is a greyscale image that represents the distance of each pixel from the camera. White areas are closer and darker areas are further away, with black registering as zero confidence or background distance.

By playing around in front of the camera, you will begin to appreciate how the camera could make some very quick decisions about what the user is doing. For example, it’s clear how the hands can be picked out of the scene, thanks to the thick black outline that separates them from the body and head further back in the scene.


Figure 3. Night vision, anyone? The Intel® RealSense™ 3D camera sending a raw IR stream.

The final stream type may not be familiar to former Intel Perceptual Computing SDK developers, but in Figure 3 you can see that the IR menu offers the option of an infrared camera stream. This stream is about as raw as you can get and offers stream read speeds significantly higher than typical monitor refresh rates.

You can initialize any and all of these streams to be read simultaneously as your application requires, and for each stream you can choose the resolution and refresh rate needed. It is important to note that the final frame rate of incoming streams will depend on the available bandwidth. For example, if you tried to initialize an RGB stream at 60 fps, depth at 120 fps, and IR at 120 fps and stream them all in as a single synchronized sample, you would only get a refresh at the lowest of the refresh rates (60 fps), and then only as fast as the system can keep up.
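
To make this concrete, below is a minimal sketch of how several streams might be enabled and read together using the PXCSenseManager calls covered in Section 4. The session object (g_session), resolutions, and frame rates here are illustrative only; always enumerate the profiles your camera actually reports.

// Sketch: enable color and depth together on one PXCSenseManager.
// g_session and the chosen resolutions/rates are assumptions for this example.
PXCSenseManager *pp = g_session->CreateSenseManager();
pp->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 640, 480, 60);
pp->EnableStream(PXCCapture::STREAM_TYPE_DEPTH, 640, 480, 60);
if (pp->Init() >= PXC_STATUS_NO_ERROR)
{
	// Passing true requests a synchronized sample containing every enabled
	// stream, so the effective rate is that of the slowest stream.
	while (pp->AcquireFrame(true) >= PXC_STATUS_NO_ERROR)
	{
		PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();
		// ... read sample->color and sample->depth here ...
		pp->ReleaseFrame();
	}
}
pp->Close();
pp->Release();
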

The raw streams sample is great for getting started, but it does not allow you to combine streams and should only be used to get familiar with the types, resolutions, and refresh rates available for your camera. Bear in mind that the Intel RealSense SDK is designed to handle multiple types of 3D camera, so the resolutions you see in the sample may not be available on future cameras, making it vital that you do not hard code your stream resolutions for release applications.

4. Creating Streams and Accessing the Data

You can view the full source code to the raw streams sample by opening the following project in Visual Studio*:

\Intel\RSSDK\sample\raw_streams\raw_streams_vs20XX.sln

As the sample has to provide an easy-to-use UI and a full gamut of options, the source code is not very readable. It is often useful to strip away this ancillary code to get to the key lines you will need to create, process, and then delete a stream from the camera. The code that follows is a stripped-down version of what is in the above project, but it retains all the features necessary for even the simplest Intel RealSense applications.

Your first two critical functions will be to initialize the Intel RealSense 3D camera and release it when the program ends. The code below shows this, and the details of the called functions will be explained in sequence.


int RSInit ( void )
{
	InitCommonControls();

	// Create the SDK session that all subsequent capture objects hang off.
	g_session=PXCSession::CreateInstance();
	if (!g_session) return 1;

	// Spawn the worker thread that owns the camera and fills the globals,
	// then give it a few seconds to connect before reporting success.
	g_bConnected = false;
	g_RSThread = CreateThread(0,0,ThreadProc,g_pGlob->hWnd,0,0);
	Sleep(6000);
	if ( g_bConnected==false )
		return 1;
	else
		return 0;
}

void RSClose ( void )
{
	// Signal the worker thread to finish, then wait for it to exit cleanly.
	g_bConnected = false;
	WaitForSingleObject(g_RSThread,INFINITE);
}


Here we have the highest-level functions for any raw stream application: essentially creating a session instance and a thread to run the stream handling code, then signaling the thread to stop via the global g_bConnected flag. It is highly recommended that you use a thread when sampling the streams, as this allows your main application to run at any frame rate you require rather than being bound by the refresh rates of the camera device. It also helps you spread your CPU activity across multiple cores, which improves overall application performance.

From the above code, the only line that you should now be interested in is the call to the ThreadProc function, which holds all the code responsible for handling streams. Before delving into this nest, note that the source code shown here is not exhaustive; global declarations and non-critical sections have been intentionally removed for readability. To find out how these globals are declared, refer to the original raw_streams project sample source code.
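
For reference, here is a minimal sketch of how those globals might be declared. The names follow the snippets in this article, but the exact types and array sizes in the original sample may differ (g_pGlob, used in RSInit, is simply part of the author's application framework and carries the window handle):

#include <windows.h>
#include "pxcsession.h"
#include "pxccapture.h"

// Sketch of the globals assumed by the stripped-down snippets below.
PXCSession          *g_session  = NULL;   // SDK session created in RSInit
PXCCapture          *g_capture  = NULL;   // capture module implementation
PXCCapture::Device  *g_device   = NULL;   // the chosen camera device
HANDLE               g_RSThread = NULL;   // worker thread handle
volatile bool        g_bConnected       = false;  // true while streaming should continue
volatile bool        g_bDepthdatafilled = false;  // true once a depth frame has been copied
CRITICAL_SECTION     g_depthdataCS;               // guards access to g_depthdata
short                g_depthdata[640][480];       // depth snapshot, indexed [x][y]
pxcCHAR             *g_file = NULL;               // optional record/playback file
// g_devices (not shown) is the list of capture modules built by PopulateDevices.
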


static DWORD WINAPI ThreadProc(LPVOID arg)
{
	// g_depthdataCS is a global CRITICAL_SECTION that guards g_depthdata.
	InitializeCriticalSection(&g_depthdataCS);
	HWND hwndDlg=(HWND)arg;
	PopulateDevices(hwndDlg);
	PXCCapture::DeviceInfo dinfo=GetCheckedDevice(hwndDlg);
	PXCCapture::Device::StreamProfileSet profiles=GetProfileSet(hwndDlg);
	StreamSamples((HWND)arg, 
		&dinfo,
		&profiles,
		false, false, false,
		g_file
		);

	ReleaseDeviceAndCaptureManager();
	g_session->Release();
	DeleteCriticalSection(&g_depthdataCS);
	return 0;
}


It is essential to create a ‘critical section’ around the code that samples the data stream. Failure to do so in a threaded environment means two threads could try to write to the same global at the same time, which is never desirable.

For those not too familiar with threading, this function is called and will not return until the main thread (which created this thread) sets g_bConnected to false (done elsewhere). Given that the main function call in this code is StreamSamples, you can see that the remaining code above and below it is merely there to provide entrance and exit code. The first function of interest is PopulateDevices, which is pretty much identical to that provided in the raw_streams project, essentially populating a list called g_devices with the names of all available devices. If you are running an Intel RealSense 3D camera on an Ultrabook™ system, chances are you have two devices, with the second one being the built-in camera that ships with the Ultrabook. Buried in this function, look at the raw code in these lines:


static const int ID_DEVICEX=21000;
static const int NDEVICES_MAX=100;
int c = ID_DEVICEX;
g_session->CreateImpl<PXCCapture>(g_devices[c],&g_capture);
g_device=g_capture->CreateDevice((c-ID_DEVICEX)%NDEVICES_MAX);


The code, constants, and globals are copied from the original source and could have been further reduced, but you can see the essential calls here are CreateImpl and CreateDevice. The result, if successful, is that the Intel RealSense 3D camera pointer is now stored in g_device.
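
If you want to be sure you pick the Intel RealSense camera rather than a built-in webcam (see the Do’s in Section 6), a hedged sketch of walking the devices reported by g_capture might look like this. The ‘RealSense’ substring is an assumption, so compare it against whatever QueryDeviceInfo reports on your system:

// Sketch: prefer the Intel RealSense 3D camera over any other capture device.
int ndevices = g_capture->QueryDeviceNum();
for (int i = 0; i < ndevices; i++)
{
	PXCCapture::DeviceInfo dinfo2;
	if (g_capture->QueryDeviceInfo(i, &dinfo2) < PXC_STATUS_NO_ERROR) continue;
	if (wcsstr(dinfo2.name, L"RealSense"))   // assumed substring of the device name
	{
		g_device = g_capture->CreateDevice(i);
		break;
	}
}
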

With a valid device pointer, the rest of the initialization code goes smoothly. The GetCheckedDevice function called in ThreadProc is essentially a wrapper for this code:

g_device->QueryDeviceInfo(&dinfo);

The GetProfileSet function is responsible for collecting all the stream types and resolutions you want to initialize, and it can be as simple or as complex as you need. It is highly recommended, however, that you enumerate through the list of valid types and resolutions rather than hard coding a fixed setting, in case a future camera does not support it.


PXCCapture::Device::StreamProfileSet GetProfileSet(HWND hwndDlg) 
{
	PXCCapture::Device::StreamProfileSet profiles={};
	if (!g_device) return profiles;

	PXCCapture::DeviceInfo dinfo;
	g_device->QueryDeviceInfo(&dinfo);
	for (int s=0, mi=IDXM_DEVICE+1;s<PXCCapture::STREAM_LIMIT;s++) 
	{
		PXCCapture::StreamType st=PXCCapture::StreamTypeFromIndex(s);
		if (!(dinfo.streams&st)) continue;

		int id=ID_STREAM1X+s*NPROFILES_MAX;
		int nprofiles=g_device->QueryStreamProfileSetNum(st);
		for (int p=0;p<nprofiles;p++) 
		{
			// For this article we ignore the color and IR streams and pick a
			// single, hard-coded depth profile (index 2) purely as an example.
			if ( st==PXCCapture::StreamType::STREAM_TYPE_COLOR ) continue;
			if ( st==PXCCapture::StreamType::STREAM_TYPE_IR ) continue;
			if ( st==PXCCapture::StreamType::STREAM_TYPE_DEPTH && p==2 )
			{
				PXCCapture::Device::StreamProfileSet profiles1={};
				g_device->QueryStreamProfileSet(st, p, &profiles1);
				profiles[st]=profiles1[st];
			}
		}
		mi++;
	}

	return profiles;
}


The GetProfileSet function contains quite a lot of code, but it boils down to searching the available streams for a single depth stream and returning its profile. You can of course use your own conditions to find the streams you need, whether a specific resolution or refresh rate, as long as you have fall-back criteria so that your application can proceed with a suitable stream format.
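
As an illustration, here is a minimal sketch of such a fall-back search for the depth stream only. The preferred 640x480 resolution is illustrative, and the first valid profile is accepted if it is not available:

// Sketch: prefer a 640x480 depth profile, otherwise fall back to the first
// valid depth profile the device reports.
PXCCapture::Device::StreamProfileSet PickDepthProfile()
{
	PXCCapture::Device::StreamProfileSet chosen = {};
	if (!g_device) return chosen;

	PXCCapture::StreamType st = PXCCapture::STREAM_TYPE_DEPTH;
	int nprofiles = g_device->QueryStreamProfileSetNum(st);
	for (int p = 0; p < nprofiles; p++)
	{
		PXCCapture::Device::StreamProfileSet candidate = {};
		if (g_device->QueryStreamProfileSet(st, p, &candidate) < PXC_STATUS_NO_ERROR) continue;
		if (!candidate[st].imageInfo.format) continue;

		// Remember the first valid profile as the fall-back choice.
		if (!chosen[st].imageInfo.format) chosen[st] = candidate[st];

		// Prefer 640x480 when the device offers it.
		if (candidate[st].imageInfo.width == 640 && candidate[st].imageInfo.height == 480)
		{
			chosen[st] = candidate[st];
			break;
		}
	}
	return chosen;
}
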

The final function and central block of code to access stream data is StreamSamples and, when stripped of its safety code and commentary, looks like this:


void StreamSamples(HWND hwndDlg, PXCCapture::DeviceInfo *dinfo, PXCCapture::Device::StreamProfileSet *profiles, bool synced, bool isRecord, bool isPlayback, pxcCHAR *file) 
{
	PXCSenseManager *pp=g_session->CreateSenseManager();
	pp->QueryCaptureManager()->FilterByDeviceInfo(dinfo);
	for (PXCCapture::StreamType st=PXCCapture::STREAM_TYPE_COLOR;st!=PXCCapture::STREAM_TYPE_ANY;st++) 
	{
		PXCCapture::Device::StreamProfile &profile=(*profiles)[st];
		if (!profile.imageInfo.format) continue;
		pp->EnableStream(st,profile.imageInfo.width, profile.imageInfo.height, profile.frameRate.max);
	}
	pp->QueryCaptureManager()->FilterByStreamProfiles(profiles);
	MyHandler handler(hwndDlg);
	if (pp->Init(&handler)>=PXC_STATUS_NO_ERROR) 
	{
		pp->QueryCaptureManager()->QueryDevice()->SetMirrorMode(PXCCapture::Device::MirrorMode::MIRROR_MODE_DISABLED);
		g_bConnected = true;
		for (int nframes=0;g_bConnected==true;nframes++) 
		{
			pxcStatus sts2=pp->AcquireFrame(synced);
			if (sts2<PXC_STATUS_NO_ERROR && sts2!=PXC_STATUS_DEVICE_LOST) break;
			if (sts2>=PXC_STATUS_NO_ERROR) 
			{
				PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();

				short invalids[2];
				invalids[0] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthSaturationValue();
				invalids[1] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthLowConfidenceValue();

				PXCImage::ImageInfo dinfo=sample->depth->QueryInfo();
				PXCImage::ImageData ddata;
				if (sample->depth->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_DEPTH,
					&ddata)>=PXC_STATUS_NO_ERROR)
				{
					EnterCriticalSection(&g_depthdataCS);
					memset ( g_depthdata, 0, sizeof(g_depthdata) );
					short *dpixels=(short*)ddata.planes[0];
					int dpitch = ddata.pitches[0]/sizeof(short);
					for (int y = 0; y < (int)dinfo.height; y++) 
					{
						for (int x = 0; x < (int)dinfo.width; x++) 
						{
							short d = dpixels[y*dpitch+x];
							if (d == invalids[0] || d == invalids[1]) continue;
							g_depthdata[x][y] = d;
						}
					}
					LeaveCriticalSection(&g_depthdataCS);
					g_bDepthdatafilled = true;
					sample->depth->ReleaseAccess(&ddata);
				}
			}
			pp->ReleaseFrame();
		}
	} 
	pp->Close();
	pp->Release();
}


At first glance, it may seem a lot to take in, but when broken down, you will find the function is nothing more than some setup calls, a conditional loop, and final cleanup before returning to the ThreadProc function that called it. The main variable used throughout is called pp and is the Intel RealSense SDK manager pointer for our main activities. Note: as stated earlier, all error trapping has been removed for easier reading, but you should never create code that makes the assumption that any call to the Intel RealSense SDK will succeed.

The first key code line that will enable the stream(s) you are interested in looks like this:

pp->EnableStream(st,profile.imageInfo.width, profile.imageInfo.height, profile.frameRate.max);

This simple request switches on the stream type with a specific resolution and frame rate and tells the camera to get ready to send us this raw data. The next critical line activates the manager so it can start the busy process of retrieving data for us and looks like this:


MyHandler handler(hwndDlg);
if (pp->Init(&handler)>=PXC_STATUS_NO_ERROR)


The class MyHandler is defined in the original raw_streams project and simply derives from the PXCSenseManager::Handler class. If this call succeeds, you know the camera is activated and the stream data is on its way to you.
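
If you do not need any of the handler notifications, a do-nothing subclass is enough. The following is only a sketch; the real MyHandler in the sample overrides callback methods to update its dialog:

// Sketch: the smallest usable handler. The raw_streams sample's MyHandler
// additionally overrides notification callbacks (device connect, status
// changes) to update its UI.
class MyHandler : public PXCSenseManager::Handler
{
public:
	MyHandler(HWND hwnd) : m_hwnd(hwnd) {}
private:
	HWND m_hwnd;   // window to notify, kept for parity with the sample
};
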

We now start a conditional loop that will iterate until some external force changes the loop condition, and within this loop we will be grabbing stream data one frame at a time. This is handled using the command called AcquireFrame.


	for (int nframes=0;g_bConnected==true;nframes++) 
	{
		pxcStatus sts2=pp->AcquireFrame(synced);


For as long as g_bConnected remains true, we will do this as fast as we can in the separate thread we created for this purpose. Getting the actual data involves a few more key lines of code:


	PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();

	short invalids[2];
	invalids[0] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthSaturationValue();
	invalids[1] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthLowConfidenceValue();

	PXCImage::ImageInfo dinfo=sample->depth->QueryInfo();
	PXCImage::ImageData ddata;
	if (sample->depth->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_DEPTH,
		&ddata)>=PXC_STATUS_NO_ERROR)


The first command gets a sample pointer from the manager, and the last command, AcquireAccess, uses it to get a pointer to the actual data memory. The intervening code performs two queries to ask the manager which values represent a ‘saturated’ pixel and a ‘low confidence’ pixel. Both conditions can occur when retrieving depth data from the camera, and such pixels should ideally be ignored when interpreting the data returned. The crucial result of this code is that the data structure ddata has now been filled with the details needed to directly access, in this example, the depth data. By changing the parameters you can gain access to the COLOR and IR stream data, if enabled.
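
As a hedged sketch, reading the color plane from the same sample might look like the following, assuming the COLOR stream was enabled when the profiles were set up. PIXEL_FORMAT_RGB32 requests a 4-bytes-per-pixel color layout; the BGRA byte ordering noted in the comments is an assumption, so check the format you actually receive:

// Sketch: access the color data of the current frame (if the COLOR stream was enabled).
if (sample->color)
{
	PXCImage::ImageInfo cinfo = sample->color->QueryInfo();
	PXCImage::ImageData cdata;
	if (sample->color->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_RGB32,
		&cdata) >= PXC_STATUS_NO_ERROR)
	{
		unsigned char *cpixels = (unsigned char*)cdata.planes[0];   // assumed BGRA ordering
		int cpitch = cdata.pitches[0];                               // bytes per row
		// ... copy or analyze cinfo.width x cinfo.height pixels here ...
		sample->color->ReleaseAccess(&cdata);
	}
}
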

This concludes the Intel RealSense SDK part of the code, from the very first initialization call to obtaining the pointer to the stream data. The rest of the code is a little more familiar and within the comfort zone of developers who have experience with image processing.


		EnterCriticalSection(&g_depthdataCS);
		memset ( g_depthdata, 0, sizeof(g_depthdata) );
		short *dpixels=(short*)ddata.planes[0];
		int dpitch = ddata.pitches[0]/sizeof(short);
		for (int y = 0; y < (int)dinfo.height; y++) 
		{
			for (int x = 0; x < (int)dinfo.width; x++) 
			{
				short d = dpixels[y*dpitch+x];
				if (d == invalids[0] || d == invalids[1]) continue;
				g_depthdata[x][y] = d;
			}
		}
		LeaveCriticalSection(&g_depthdataCS);


You will notice the critical section object we created earlier being used to guard our globals so that no other thread can access them while we write. We do this so we can fill a global array and be sure that code from another part of our application won’t interfere. If you follow the nested loops, you will see that after entering the critical section, we clear a global array called g_depthdata and proceed to fill it with values from the aforementioned ddata structure, which includes a pointer to the depth data. Within the nested loops, we also compare each depth pixel value with the two invalid values we determined earlier with the QueryDepthSaturationValue and QueryDepthLowConfidenceValue calls.

Once the stream data has been transferred to a global array, the thread can obtain the next frame from the stream while your main thread starts analyzing this data and making decisions about it. You could even create a new worker thread to perform this analysis, allowing your application to run across three threads and making even better use of multi-core architecture.
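
As a minimal sketch of that hand-off, the main (or a worker) thread might consume the latest snapshot like this. AnalyzeLatestDepthFrame and localcopy are hypothetical names, and the array size assumes the 640x480 depth profile used in this article:

// Sketch: safely read the most recent depth snapshot written by the stream thread.
void AnalyzeLatestDepthFrame()
{
	static short localcopy[640][480];
	if (!g_bDepthdatafilled) return;          // nothing new to read yet

	EnterCriticalSection(&g_depthdataCS);
	memcpy(localcopy, g_depthdata, sizeof(localcopy));
	g_bDepthdatafilled = false;               // mark the snapshot as consumed
	LeaveCriticalSection(&g_depthdataCS);

	// ... analyze localcopy here, outside the lock ...
}
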

5. What To Do With Stream Data

Now that you know how to obtain the stream data you want from the Intel RealSense 3D camera, you might be wondering what you can do with it. Of course, you can render it to the screen and admire the view, but you will soon need to convert that data into useful information for your application.

Just like snowflakes, no two implementations using the raw stream data will be the same, but here are a few generic approaches to get you started mining the data. To reduce the amount of new code, we will use the code above as the template for the suggested examples below.

Find Nearest Point

Suppose you want to find the closest point of an object in front of the camera, and you have just transferred the depth data from the stream to the global array in your main thread. You would create a nested loop to check each value within the array, remembering that depth values are distances from the camera, so smaller (non-zero) values are closer:


// Depth values of zero carry no information, so skip them; otherwise a
// smaller value means a closer point. Values are read as unsigned 16-bit.
unsigned short bestvalue = 65535;
int bestx = 0;
int besty = 0;
for ( int y = 0; y < (int)dinfo.height; y++) 
{
	for ( int x = 0; x < (int)dinfo.width; x++) 
	{
		unsigned short thisvalue = (unsigned short)g_depthdata[x][y];
		if ( thisvalue == 0 ) continue;
		if ( thisvalue < bestvalue )
		{
			bestvalue = thisvalue;
			bestx = x;
			besty = y;
		}
	}
}


Each time a closer (smaller) value is found, it replaces the current best value found so far, and the X and Y coordinates at that point are recorded. By the time the loop has traversed every pixel in the depth data, bestx and besty will hold the coordinate in the depth data closest to the camera.

Ignore Background Objects

You may want to identify foreground object shapes, but don’t want the application confused by objects further in the background, such as the user’s body or people walking past.


// A second buffer for the extracted shape, matching g_depthdata's layout
// ([x][y], sized here for the 640x480 depth profile used in this article).
static unsigned short newshape[640][480];
memset(newshape, 0, sizeof(newshape));
for ( int y = 0; y < (int)dinfo.height; y++) 
{
	for ( int x = 0; x < (int)dinfo.width; x++) 
	{
		// Depth pixels are 16-bit; read the value as unsigned so ranges above
		// 32767 compare correctly. The 32000-48000 window is only an example
		// and depends on the depth format and scale in use.
		unsigned short thisvalue = (unsigned short)g_depthdata[x][y];
		if ( thisvalue>32000 && thisvalue<48000 )
		{
			newshape[x][y] = thisvalue;
		}
	}
}


By adding a condition as each pixel value is read and only transferring those that lie within a specific range, objects can be extracted from the depth data and transferred to a second array for further processing.

6. Tricks and Tips

Do’s

  • If you are trying out the samples for the first time and using an Ultrabook with a built-in camera, you may find the application chooses the built-in camera instead of your Intel RealSense camera. Ensure that the Intel RealSense camera is connected properly and that your application is using the ‘Intel® RealSense™ 3D camera’ device. For more information on how to find a list of devices, look for references to ‘g_devices’ in this article.
  • Always try to use threads in your Intel RealSense application, as this prevents your application from being bound by the frame rates of the Intel RealSense 3D camera stream and ultimately produces better performance on multi-core systems.

Don’ts

  • Do not hard code the device or profile settings when initializing your streams as future Intel RealSense 3D cameras may not support the one you have chosen. Always enumerate through the available devices and profiles and use search conditions to find a suitable one.
  • Avoid needless transfer of data to secondary arrays, as there is a significant performance and memory cost to doing this every cycle. Instead, keep your data analysis as close to the original data read operation as possible, as shown in the sketch below.
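
As a sketch of that last tip, the nearest-point search from Section 5 can be folded straight into the depth copy loop inside StreamSamples, so the data is only traversed once per frame. Variable names follow the earlier snippets; g_nearestx, g_nearesty, and g_nearestd are hypothetical result globals:

// Sketch: combine copying and analysis in a single pass (inside the critical section).
unsigned short nearestd = 65535;
int nearestx = 0, nearesty = 0;
for (int y = 0; y < (int)dinfo.height; y++)
{
	for (int x = 0; x < (int)dinfo.width; x++)
	{
		short d = dpixels[y*dpitch+x];
		if (d == invalids[0] || d == invalids[1]) continue;
		g_depthdata[x][y] = d;
		unsigned short ud = (unsigned short)d;     // depth values are unsigned distances
		if (ud != 0 && ud < nearestd) { nearestd = ud; nearestx = x; nearesty = y; }
	}
}
g_nearestx = nearestx; g_nearesty = nearesty; g_nearestd = nearestd;
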

7. Summary

With a good working knowledge of how to obtain raw stream data from the Intel RealSense 3D camera, you can extend what can be done with this technology and open the door to innovative solutions to present-day challenges. We have already seen some great hands-free and perceptual applications from pioneering developers in this space, and as a group we have only just scratched the surface of what is possible.

It’s probable that most users still feel that computers are something to be prodded and poked into action, but we now have the capabilities for computers to open two eyes and watch our every move. Not in a sinister way, but akin to a friend providing a helping hand, guiding us to better experiences. It has been said that in a world of the blind, the one-eyed man is king. Is it not true then that we live in a world populated by blind computers, and so imagine the revolution should one of them, in the not too distant future, open its eyes on our world? As developers we are the architects of this revolution and together we can introduce a whole new paradigm—one in which computers are aware of their operators and empathetic to their situation.

About The Author

When not writing articles, Lee Bamber is the CEO of The Game Creators (http://www.thegamecreators.com), a British company that specializes in the development and distribution of game creation tools. Established in 1999, the company and surrounding community of game makers are responsible for many popular brands including Dark Basic, FPS Creator, FPSC Reloaded, and most recently App Game Kit (AGK).

Lee chronicles his daily life as a coder, complete with screen shots and the occasional video here: http://fpscreloaded.blogspot.co.uk


9 comments

lee@thegamecreators.com:

I would not use the LabVIEW example as the basis to cut and paste out code to grab the depth-only data, there is a better example that simply initializes the camera and pulls the color and depth, then renders them to a small window.  This is the example I used to then ignore the color feed and focus on the depth data.  It's been a while since I did that and the latest SDK might have changed the example title, but look for a small project that simply conveys the camera output to a window, no buttons or controls, and your task becomes infinitely easier!

Dear Sir,

I was working with Intel RealSense SR300 in LabVIEW software. LabVIEW can call "C" file or "C++ wrapped in C" using extern C function and compiling it into .dll.  

I don't know much about C++ but if I use the above-mentioned code to wrap it in C. How should I proceed in Visual Studio C++ ??

Please help me with code only to acquire depth stream.

Regards,

Kashish Dhal

lee@thegamecreators.com:

The failed experiment did not require code. Simply activating the second camera to send out IR was enough to see the depth data was being corrupted.  This was a while ago now so a solution may have been found elsewhere.  Look for IR syncing, which should be possible as I think the HTC Vive does something similar.

Hello Thank you for the Kind reply.

Could you please give an example code of how you streamed two cameras and targeted one object simultaneously? Would be very helpful :)

lee@thegamecreators.com:

I had the fortune of trying to use two RealSense cameras scanning a single target but the IR the camera throws out messed up the capture of the depth data. I have heard talk of syncing the cameras, but I have not been able to achieve this with the old depth camera hardware. Maybe the new generations have solved this, but it would be a good idea to have two as you could easily triangulate the 3D capture for error correction :)

Would it be possible to stream multiple RealSense cameras on one system with this?

Cheers

Hi Lee,

in what units/scale are the depth values? You mention this range in the article:

 if ( thisvalue>32000 && thisvalue<48000 )

Are those values related to any physical measure?

Thanks! 

its really exciting!

mad\kafarrel:

One of our Black Belts in action - nice work Lee!
