Driveable IP-Camera Robot with Onboard Intelligence

A driveable IP-camera robot with built-in artificial intelligence powered by Windows® 10 IoT Core and Intel® Movidius™ technology to recognize and differentiate between objects in view.

Orbii streaming live video to a tablet

The Orbii* IP-camera robot is the world's first modular, artificially intelligent, driveable home-monitoring device that can stream live video, audio, and sensor data to any smartphone, tablet, or computer. It has a high-definition camera, a microphone, a speaker, on-board storage, wireless charging, and an array of environmental sensors that enable the user to monitor a home or office from anywhere in the world.

It is an innovative smart home-monitoring solution: a user can control the Orbii ball from anywhere in the world, including driving it from room to room. With its onboard object-recognition system (powered by the Intel® Movidius™ Neural Compute Stick), it can distinguish general movement from something that needs attention and send the user alerts accordingly.

Methodology/Approach

Traditional security cameras are not intelligent, and they have limited visual coverage. The Orbii camera, powered by Intel Movidius technology, is smart—that is, artificially intelligent—and driveable, providing more visual coverage than any other security camera on the market. It can recognize and differentiate between objects and movements, and it alerts the user accordingly.

Things we used in the first prototype

Project Bill of Materials (BOM)
Raspberry Pi* 3 device, with Windows® 10 IoT Core operating system
Sensors and serial motor driver
2 geared DC motors (6.0 volts)
USB webcam
Intel® Movidius™ Neural Compute Stick
Pretrained image classifiers:
  • SqueezeNet model, pretrained on about 1,000 object classes
  • Model exported as an ONNX file
Sample UWP app source code that includes the SqueezeNet model in the Assets folder
Microsoft Visual Studio* 2017 software
Windows* 10 SDK, build 17110 or above

How we built the prototype

These are the steps we took:

  1. Install and set up the Windows 10 IoT Core OS on the Raspberry Pi 3 device—including connecting to Wi-Fi—as described in these instructions.

    Note: The Raspberry Pi 3 B+ model is currently not fully supported on Windows 10 IoT Core.

  2. Create a Windows 10 IoT app in Visual Studio 2017.
    1. Download the Windows* ML (machine learning) platform Explorer sample app as a starting point, and then write custom code in the ViewModel to load the USB camera stream as video frames.
    2. Using the interface's generated wrapper classes, follow the load, bind, and evaluate pattern to integrate the SqueezeNet* ML model into the app. Details and some sample code here; a minimal sketch also follows this list.
    3. Show the live video stream on the main page, with labels for recognized objects on the right side of the page. The code can run inference on the CPU, GPU, or VPU (Movidius Neural Compute Stick), because Windows 10 IoT enables drivers for the Movidius Neural Compute Stick. (In the next release of Windows 10 IoT, the code will use the connected VPU by default, which will increase object-recognition speed significantly and free the CPU for other tasks.)
    4. Add virtual joystick control and DC motor driver code to the source code. (That part of the code is not in the WinMLExplorer sample project above.) We use a serial motor driver from Pololu that takes serial commands from the microcontroller over a UART bus and translates them into drive signals for the two DC motors. Based on input from the on-screen joystick control, the app sends command packets containing motor direction and speed values to the serial port, and the motors respond accordingly, driving the whole robot forward, backward, right, and left. The C# snippets below give you an idea.
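
To make steps 1 and 2 above concrete, here is a minimal sketch of grabbing frames from the USB camera and running them through the load, bind, and evaluate pattern. It is a sketch under assumptions, not the project's code: it uses the release Windows.AI.MachineLearning namespace (the preview SDK around build 17110 used a slightly different Windows.AI.MachineLearning.Preview namespace), and the tensor names data_0 and softmaxout_1 are the ones used by the SqueezeNet sample model, so adjust them to match your ONNX file:

using System;
using System.Linq;
using Windows.AI.MachineLearning;
using Windows.Media;
using Windows.Media.Capture;
using Windows.Media.Capture.Frames;
using Windows.Storage;

// Inside an async method in the app's ViewModel:

// Load the ONNX model from the app's Assets folder and create a session.
var modelFile = await StorageFile.GetFileFromApplicationUriAsync(
    new Uri("ms-appx:///Assets/SqueezeNet.onnx"));
LearningModel model = await LearningModel.LoadFromStorageFileAsync(modelFile);
var session = new LearningModelSession(
    model, new LearningModelDevice(LearningModelDeviceKind.Default));

// Open the USB webcam and read frames as they arrive.
var capture = new MediaCapture();
await capture.InitializeAsync(new MediaCaptureInitializationSettings
{
    StreamingCaptureMode = StreamingCaptureMode.Video
});
MediaFrameSource colorSource = capture.FrameSources.Values
    .First(s => s.Info.SourceKind == MediaFrameSourceKind.Color);
MediaFrameReader reader = await capture.CreateFrameReaderAsync(colorSource);

reader.FrameArrived += async (sender, args) =>
{
    using (MediaFrameReference frameRef = sender.TryAcquireLatestFrame())
    {
        VideoFrame frame = frameRef?.VideoMediaFrame?.GetVideoFrame();
        if (frame == null) return;

        // Bind the camera frame to the model input and evaluate.
        var binding = new LearningModelBinding(session);
        binding.Bind("data_0", ImageFeatureValue.CreateFromVideoFrame(frame));
        LearningModelEvaluationResult result = await session.EvaluateAsync(binding, "frame");
        var scores = (result.Outputs["softmaxout_1"] as TensorFloat).GetAsVectorView();
        // Take the highest score and show its label next to the video stream.
    }
};
await reader.StartAsync();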

Here is the C# code to drive two motors with the serial port:

// SerialPort is a Windows.Devices.SerialCommunication.SerialDevice opened on the UART;
// dataWrite is a Windows.Storage.Streams.DataWriter that buffers the outgoing bytes.
var dataWrite = new DataWriter();
dataWrite.WriteBytes(commandByteArray);  // motor direction and speed packet
uint bytesWritten = await SerialPort.OutputStream.WriteAsync(dataWrite.DetachBuffer());
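
The commandByteArray above is built from the joystick position. Below is a sketch of how that mapping could look. It is illustrative, not the project's code: it assumes the compact protocol of the Pololu qik 2s9v1 dual serial motor controller (a command byte selects motor and direction, followed by a 0-127 speed byte), and joystickX/joystickY are hypothetical inputs from the on-screen control. Other Pololu drivers use different command bytes, so check your driver's user guide.

using System;
using System.Linq;

// Arcade-style mixing: joystick X/Y (each in [-1, 1]) become left/right motor speeds.
static (double left, double right) Mix(double x, double y) => (Clamp(y + x), Clamp(y - x));

static double Clamp(double v) => Math.Max(-1.0, Math.Min(1.0, v));

// One motor's signed speed becomes a two-byte command: the command byte selects
// motor and direction (qik 2s9v1 compact protocol), the data byte is a 0-127 speed.
static byte[] MotorCommand(int motor, double speed)
{
    byte magnitude = (byte)(Math.Abs(Clamp(speed)) * 127);
    byte command = motor == 0
        ? (speed < 0 ? (byte)0x8A : (byte)0x88)   // M0 reverse / forward
        : (speed < 0 ? (byte)0x8E : (byte)0x8C);  // M1 reverse / forward
    return new byte[] { command, magnitude };
}

// Usage: joystickX and joystickY come from the on-screen joystick control.
var (left, right) = Mix(joystickX, joystickY);
byte[] commandByteArray = MotorCommand(0, left).Concat(MotorCommand(1, right)).ToArray();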

The full project sample code we wrote can be downloaded here.

Sample app screenshot: streaming live video and recognizing objects in view

We also connected an IMU (gyroscope, accelerometer, and magnetometer) to the Raspberry Pi device over the I2C pins so we could calculate the heading angle, pitch, and yaw, and adjust motor speeds to correct the heading while driving. Eventually, with inputs from the camera and the IMU sensor, the robot will be able to drive autonomously in indoor environments. We may also use VSLAM (visual simultaneous localization and mapping) from third-party SDKs. A rough sketch of that heading correction follows.
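
As an illustration only, a simple proportional controller can trim the two motor speeds based on the error between the desired heading and the heading measured by the IMU. The gain Kp and the helper names here are assumptions for the sketch, not code from the project:

using System;

const double Kp = 0.02;  // proportional gain; tune empirically for the robot

// Wrap the heading error into [-180, 180] degrees so the robot turns the short way.
static double HeadingError(double desiredDeg, double actualDeg)
{
    double error = desiredDeg - actualDeg;
    while (error > 180.0) error -= 360.0;
    while (error < -180.0) error += 360.0;
    return error;
}

// Trim the left/right speeds around a base speed to hold the desired heading.
static (double left, double right) HoldHeading(double baseSpeed, double desiredDeg, double actualDeg)
{
    double correction = Kp * HeadingError(desiredDeg, actualDeg);
    return (Clamp(baseSpeed + correction), Clamp(baseSpeed - correction));
}

static double Clamp(double v) => Math.Max(-1.0, Math.Min(1.0, v));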

Future Work

We are also working on an onboard API that will stream live video and sensor data to a remote client app; a client app running on phones or tablets will be able to send drive commands back to the camera robot and navigate it to a specific room while streaming video, audio, and sensor data. We are already working on a new PCB for our next prototype, along with 3D-printed housing and shells for the robot. A video of the prototype is included with this article.

Since Windows 10 IoT supports remote desktop connections, in the current prototype we use an RDP client to connect to the app running onboard, so we can see the live video feed and drive the robot with the on-screen joystick.

Technologies Used

  • Intel Movidius Neural Compute Stick
  • Windows 10 IoT Core operating system
  • UWP apps
  • Windows* ML (machine learning) platform
  • Visual Studio 2017 software
  • Raspberry Pi 3 device
  • USB camera
  • Pololu* serial motor driver
  • 7.4-volt lithium battery
  • Voltage regulator to step the 7.4-volt battery down to 5.0 volts
  • 3D printing for some mechanical parts

Conclusion

The Intel Movidius Neural Compute Stick enables this smart, driveable camera ball to run inference on the device and alert the user based on what the camera sees, avoiding the lag of transmitting video frames to the cloud, running inference on a server, and then returning the results to the user. Because it recognizes what it sees, it can alert the user only when there is a real situation, whereas traditional security cameras send alerts on any movement in view.
 
Prototype 1. Assembly top

Prototype 1. Assembly bottom

Electronics PCB assembly breakup of the Orbii final prototype