Working with Multiple Modalities


The SDK includes multiple algorithms (the terms modality and module are used interchangeably), each of which extends how your application interacts with the user. When enabling multiple modalities in your application, observe the following limitations and considerations:

Stream Resolution and Frame Rate

Multiple modalities must agree on the stream resolutions and frame rates. For example, a face tracking module that requires color at 1920x1080x30fps cannot work together with an object tracking module that works only with color at 640x480x30fps. An agreed-upon stream configuration must be set for the two modules to function together. The configuration is per stream, so a module that uses the color stream does not conflict with a module that uses only the depth stream.
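
For illustration, the following C++ sketch adds an explicit color stream request next to the face module. The explicit request takes part in the same per-stream negotiation, and Init reports a conflict if the enabled module cannot work with the requested configuration. The EnableStream call is shown here only to make the agreement requirement concrete; as noted below the tables, letting the SenseManager auto-negotiate is the recommended approach. This is a sketch, not an official SDK example.

// Create a SenseManager instance.
PXCSenseManager *sm=PXCSenseManager::CreateInstance();

// Request a fixed color configuration (for illustration only).
sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 640, 480, 30);

// Enable a module that also uses the color stream.
sm->EnableFace();

// Init succeeds only if the face module can work with color 640x480x30fps.
pxcStatus sts=sm->Init();
if (sts>=PXC_STATUS_NO_ERROR) {
   // The explicit request and the face module agree.
} else {
   // The requested color configuration conflicts with the face module.
}
sm->Release();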

The following tables show the modules and their supported stream configurations:

Camera F200           | Color Resolution                                             | Depth Resolution
                      | 320x240 | 640x360 | 640x480 | 960x540 | 1280x720 | 1920x1080 | 320x240 | 640x480 | 640x240
Face Tracking (2D)    |         |         |    X    |         |          |           |   NA    |   NA    |   NA
Face Tracking (3D)    |         |    X    |    X    |    X    |    X     |     X     |    X    |    X    |    X
Hand Tracking         |   NA    |   NA    |   NA    |   NA    |    NA    |    NA     |    X    |    X    |    X
User Segmentation     |    X    |    X    |    X    |    X    |    X     |           |         |    X    |
Emotion Detection     |    X    |         |    X    |         |          |           |   NA    |   NA    |   NA
Object Tracking (2D)  |    X    |    X    |    X    |    X    |    X     |           |   NA    |   NA    |   NA
Object Tracking (3D)  |    X    |    X    |    X    |    X    |    X     |           |         |    X    |
Touchless Controller  |   NA    |   NA    |   NA    |   NA    |    NA    |    NA     |    X    |    X    |    X
3D Scanning           |         |         |         |    X    |          |           |         |    X    |

Camera R200           | Color Resolution                         | Depth Resolution
                      | 320x240 | 640x480 | 1280x720 | 1920x1080 | 320x240 | 480x360 | 628x468
Face Tracking         |         |    X    |    X     |           |         |    X    |
Enhanced Photography  |    X    |    X    |    X     |     X     |         |         |
Scene Perception      |    X    |    X    |          |           |    X    |    X    |    X
3D Scanning           |         |    X    |          |           |         |    X    |

These tables are shown for reference only and may change across SDK releases. Do not hard-code the settings; instead, use the SenseManager to auto-negotiate a working configuration.

The PXC[M]SenseManager interface auto-negotiates the stream configuration when you enable multiple modalities. The Init function returns successfully if there is an agreed-upon stream configuration. Example 3 shows how to enable the face and hand modules and auto-negotiate the configuration.

C++ Example 3: Enable Two Modalities and Auto-Negotiate Configurations

// Create a SenseManager instance.

PXCSenseManager *sm=PXCSenseManager::CreateInstance();

 

// Enable face & hand tracking.

sm->EnableFace();

sm->EnableHand();

 

// additional face and hand configuration.

...

 

// Init

pxcStatus sts=sm->Init();

if (sts>=PXC_STATUS_NO_ERROR) {

   // two modalities can work together.

} else {

   // conflict in modalities.

}

C# Example 3: Enable Two Modalities and Auto-Negotiate Configurations

// Create a SenseManager instance.

PXCMSenseManager sm=PXCMSenseManager.CreateInstance();

 

// Enable face & hand tracking.

sm.EnableFace();

sm.EnableHand();

 

// additional face and hand configuration.

...

 

// Init

pxcmStatus sts=sm.Init();

if (sts>=pxcmStatus.PXCM_STATUS_NO_ERROR) {

   // two modalities can work together.

} else {

   // conflict in modalities.

}

Java Example 3: Enable Two Modalities and Auto-Negotiate Configurations

// Create a SenseManager instance.

PXCMSenseManager sm=PXCMSenseManager.CreateInstance();

 

// Enable face & hand tracking.

sm.EnableFace();

sm.EnableHand();

 

// additional face and hand configuration.

...

 

// Init

pxcmStatus sts=sm.Init();

if (sts>=pxcmStatus.PXCM_STATUS_NO_ERROR) {

   // two modalities can work together.

} else {

   // conflict in modalities.

}

Device Properties

The camera device exposes many device properties that adjust the behavior of the capturing process and affect the quality of the captured images. Due to the nature of the algorithms, different SDK modalities work best under different sets of device properties. For example, a deeper smoothing setting smooths the captured images, which benefits modalities that need smoother edges and clean images. Because local details are lost, such a setting is not suitable for modalities that must detect and respond to fast local movements.

For an application that works with a single modality, a bit of trial and error is usually enough to find a setting that works better than the others. The following table shows the recommended depth settings for different modalities:

Camera F200          | Confidence Threshold | Filter Option | Laser Power | Motion Range Trade-off | Accuracy
Factory Default      |          6           |       5       |      16     |           0            | Median
Face Tracking        |          1           |       6       |      16     |           21           | Median
Hand Tracking        |          1           |       6       |      16     |           0            | Median
User Segmentation    |          0           |       6       |      16     |           21           | Coarse
Object Tracking      |          6           |       5       |      16     |           0            | Median
Touchless Controller |          1           |       6       |      16     |           0            | Median

The table is shown for reference only. The settings may change (or be relaxed) across SDK releases as the modality algorithms improve.

When enabling multiple modalities, you need to carefully trade off what is most important, because the best settings for the individual modalities often differ. You can set the device properties in the following ways:

• Happy Median: Choose a setting that may not be the best for any single modality but is good enough for the application context. This works when there is no single dominant interaction and each interaction remains accurate (unambiguous) under the compromise setting.
• Dominant Instance: Choose the setting that works best for the dominant interaction, and switch settings when the dominant interaction changes. For example, if the application context uses face tracking to identify a region of interest on the screen and then hand tracking to manipulate the objects in that region, you can start with the setting that is best for face tracking and switch to the setting that is best for hand tracking when needed (a sketch follows below).

See also Sharing Device Properties for more details on how to manage device properties.
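
As a rough sketch of the Dominant Instance approach, the following C++ fragment applies the hand tracking row of the table above once the pipeline is initialized. It assumes the PXCCapture::Device depth property setters of the F200 camera (SetDepthConfidenceThreshold, SetIVCAMFilterOption, SetIVCAMLaserPower, SetIVCAMMotionRangeTradeOff, and SetIVCAMAccuracy) are available in your SDK release; it is not an official SDK example.

// sm is an initialized PXCSenseManager instance.
PXCCapture::Device *device=sm->QueryCaptureManager()->QueryDevice();
if (device) {
   // Apply the hand tracking settings from the table above.
   device->SetDepthConfidenceThreshold(1);
   device->SetIVCAMFilterOption(6);
   device->SetIVCAMLaserPower(16);
   device->SetIVCAMMotionRangeTradeOff(0);
   device->SetIVCAMAccuracy(PXCCapture::Device::IVCAM_ACCURACY_MEDIAN);
}

When the application context switches back to the face interaction, call the same setters again with the face tracking values from the table.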

Power and Performance

Consider power and performance as well. Each SDK modality incurs a nontrivial workload, even though the SDK continues to be optimized for the platform. There is no general guidance to share here, as the cost is highly platform and application context dependent; you must experiment with the actual workloads to understand your application's needs on the target platform.

It is recommended to design your application context to time-share multiple modalities, that is, to enable different modalities at different times. The user gets to enjoy different interactions, which keeps the user engaged, and is less likely to develop muscle fatigue from performing a single interaction for too long. To do so, enable multiple modalities at initialization and then pause and resume them as needed, as illustrated in Example 4. The alternative is to close the SenseManager pipeline and reinitialize it for each modality interaction (see the sketch after Example 4); this works best if the application context switch is long enough to allow the camera to close and reopen.

C++ Example 4: Pause Face Computation

// pp is a PXCSenseManager instance.

pp->PauseFace(true);

...

pp->PauseFace(false);

C# Example 4: Pause Face Computation

// pp is a PXCMSenseManager instance.

pp.PauseFace(true);

...

pp.PauseFace(false);

Java Example 4: Pause Face Computation

// pp is a PXCMSenseManager instance.

pp.PauseFace(true);

...

pp.PauseFace(false);
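
The close-and-reinitialize alternative mentioned above might look like the following C++ sketch. It simply tears down the current pipeline and creates a fresh one for the next modality; this works only if the application context switch can tolerate the camera close and reopen delay. This is a sketch, not an official SDK example.

// pp is a PXCSenseManager instance currently streaming face data.
pp->Close();     // stop streaming and release the camera
pp->Release();   // destroy the pipeline instance

// When the application context switches to the hand interaction,
// create a fresh pipeline for the new modality.
PXCSenseManager *pp2=PXCSenseManager::CreateInstance();
pp2->EnableHand();
if (pp2->Init()>=PXC_STATUS_NO_ERROR) {
   // The hand tracking pipeline is streaming.
}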

Coordinate Systems

Finally, when you need to combine data from different modalities, it is important to understand the coordinate system that each modality operates in. See Coordinate Systems for the coordinate system definitions. The following table shows, for reference, the coordinate systems that the modalities operate in:


Modality           | Coordinate System Used
Face Tracking (2D) | Color image coordinates
Face Tracking (3D) | Camera coordinate system (depth sensor origin)
Hand Tracking      | Camera coordinate system (depth sensor origin)
Object Tracking    | Camera coordinate system (color sensor origin)

For example, you may use hand tracking to identify the hand position and object tracking to locate a virtual object. You must use the PXC[M]Projection interface to map the output data into the same coordinate system; otherwise, the hand and the virtual object will appear in two different places. Example 5 shows how to convert between color and depth coordinates.

C++ Example 5: Color/Depth Coordinates Conversion

// device is a PXCCapture::Device instance

PXCProjection *projection=device->CreateProjection();

 

// Convert from depth(u,v,z) to color(i,j)

{

   PXCPoint3DF32 depth = {u, v, z};

   PXCPointF32 color;

   projection->MapDepthToColor(1, &depth, &color);

   i = color.x;

   j = color.y;

}

 

// Convert from color(i,j) to depth(u,v)

// sample is a PXCCapture::Sample instance

{

   PXCPointF32 color = {x, y};

   PXCPointF32 depth;

   projection->MapColorToDepth(sample->depth, 1, &color, &depth);

   u = depth.x;

   v = depth.y;

}

 

// Clean up

projection->Release();

C# Example 5: Color/Depth Coordinates Conversion

// device is a PXCMCapture.Device instance

PXCMProjection projection=device.CreateProjection();

 

// Convert from depth(u,v,z) to color(i,j)

{

   PXCMPoint3DF32[] depth=new PXCMPoint3DF32[1];

   depth[0].x=u;

   depth[0].y=v;

   depth[0].z=z;

   PXCMPointF32[] color;

   projection.MapDepthToColor(depth, out color);

   i = color[0].x;

   j = color[0].y;

}

 

// Convert from color(i,j) to depth(u,v)

// sample is a PXCMCapture.Sample instance

{

   PXCMPointF32[] color = new PXCMPointF32[1];

   color[0].x=x;

   color[0].y=y;

   PXCMPointF32[] depth;

   projection.MapColorToDepth(sample.depth, color, out depth);

   u = depth[0].x;

   v = depth[0].y;

}

 

// Clean up

projection.Dispose();

Java Example 5: Color/Depth Coordinates Conversion

// device is a PXCMCapture.Device instance

PXCMProjection projection=device.CreateProjection();

 

// Convert from depth(u,v,z) to color(i,j)

{

   PXCMPoint3DF32[] depth=new PXCMPoint3DF32[1];

   depth[0]=new PXCMPoint3DF32();

   depth[0].x=u;

   depth[0].y=v;

   depth[0].z=z;

   PXCMPointF32[] color=new PXCMPointF32[1];

   color[0]=new PXCMPointF32();

   projection.MapDepthToColor(depth, color);

   i = color[0].x;

   j = color[0].y;

}

 

// Convert from color(i,j) to depth(u,v)

// sample is a PXCMCapture.Sample instance

{

   PXCMPointF32[] color = new PXCMPointF32[1];

   color[0]=new PXCMPointF32();

   color[0].x=x;

   color[0].y=y;

   PXCMPointF32[] depth=new PXCMPointF32[1];

   depth[0]=new PXCMPointF32();

   projection.MapColorToDepth(sample.depth, color, depth);

   u = depth[0].x;

   v = depth[0].y;

}

 

// Clean up

projection.close();