Stay on top of the newest IoT trends. This month: Robotic arms in IoT.
Visit https://software.intel.com/en-us/blogs/2017/07/24/the-selfie-robot-an-integrated-cloud-powered-framework for technical details about this IoT Robotics project.
Hi, guys. Thanks for joining us, again, for our very first Intel IoT Developer Show, live on Facebook. In this monthly series, we're going to be talking about what's new in IoT. We're going to be showing you some cool demos, and we're going to be giving you some resources to help you be an awesome IoT developer using Intel technologies.
My name is Martin Kronberg, and I am an IoT developer evangelist here at Intel, and these folks up here with me are our own Intel IoT interns, Isabel Zhang and Milly Xun, who have come up with this really awesome IoT robotic arm demo. Milly and Isabel, do you guys want to talk a little about yourselves before we dive into the demo?
Yes. Thank you for having me today. I'm Milly, a senior at Washington State University, studying computer science, and I specialize in working with microcontrollers, and love making cool things. This internship has given me a wonderful opportunity to work with IoT, and I'm excited to talk to you about it.
Hi. I'm Isabel Zhang. I'm currently studying computer science at UC Berkeley and I'll be graduating this upcoming May. I've worked extensively with virtual reality and 3D animation in the past, but I've always wanted to have a chance to work with IoT. And this internship has given me the opportunity to do so, and I'm really excited.
Yeah, great. You know, it's been really great having you guys here, and I'm super excited to talk about this project. So let's go on and get into it. Tell me about what you guys have created here.
Yeah, for sure. So this is a robotic arm with a mounted webcam. So it will detect and track faces. It finds your face and it tries to keep that centered in the frame. If you move left or right, or up or down, it will move accordingly, and, also, there is smile detection.
So as soon as you smile, it will take a picture of you and upload that to Amazon Rekognition, which is a deep learning, image processing service that Amazon has. It tags pictures and it finds out, all right, there's humans in the picture, maybe there's an LCD screen. Are you male or female? That kind of thing. We fetch that data and then we showcase it on our website.
And the purpose of this project is to showcase Intel technology in a portable demo to show at trade shows. So then, when we present it, we'll just be like, check out this cool demo. And you know what's even better about the cool demo? It's open source, so here's all the tools you need to make this. And also, we wanted to compare the processing speed between local and cloud processing in order to check the links and see the difference.
Sure. Yeah. I mean, I really love seeing these kind of physical demos at trade shows. It, kind of, really brings me over and I've always been interested to learn how they work, and like what they're doing, and just kind of learn more about it. And I guess on that note, let's talk a bit more about the specifics of the technology that you guys used in order to get this project to work.
Yeah, for sure. So we have the Trossen Robotics PhantomX Reactor arm and the webcam is any standard webcam-- we're using the Logitech C270. And both of these are plugged into the Intel i7 NUC, which acts as a gateway and it does a lot of the processing, computation, that kind of thing.
And then for the software, we're using ROS, which is the Robot Operating System, and this makes it easier for robots to send uniform communication messages throughout all of the processes that it's running. And we're also using OpenCV to do the computer vision. And OpenCV is an open source computer vision library that's commonly used by companies and universities everywhere.
And then for the website, we're using the MEAN stack, which is MongoDB, Express.js, AngularJS, and Node.js. So MongoDB is the database and then Express.js is the REST API, which creates, reads, updates, and deletes data through HTTP methods like POST, PUT, and GET, for example. And then for the cloud services, we are using Amazon Web Services, where we use Rekognition to process our facial data.
OK. So then, just to recap, we basically have this robot arm, it's being controlled by an Intel NUC running an i7?
OK. And then it is using ROS, which is the Robot Operating System, in order to do the robot control system, as well as create a unified communication protocol between all the different parts.
Then it's also using OpenCV for the computer vision, locally. And it's also running a server, locally, which kind of provides that frontend, which was built using the MEAN stack.
And then, on top of that, we're also sending images up to AWS Rekognition, which is that deep learning, image processing service. Right?
Wow. That sounds like a lot of different parts have to work together. Did you guys face any challenges trying to get all these pieces to actually function?
Yeah, for sure. So for the majority of the project, we worked separately. I worked on the robotics and computer vision portion and she worked on the cloud services and the website. And so when it came time to make sure everything works together for our demos, we had a lot of issues trying to just get the languages to work together, making sure that the messages we sent were uniform, as well as-- I know we had an image encoding issue that was completely just random.
Yeah. Replacing spaces with plus signs was weird, but we figured it out.
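A minimal sketch of the kind of fix mentioned here. The speakers only say that spaces were replaced with plus signs; the assumption below is that the image was sent as base64 text and that the "+" characters were turned into spaces somewhere in transit (form and URL decoding commonly does this). The function name `decode_image_payload` and the sample bytes are hypothetical.

```python
import base64


def decode_image_payload(payload: str) -> bytes:
    """Decode base64 text whose '+' characters arrived as spaces.

    Assumption: a space is never valid inside the payload, so every
    space can safely be turned back into '+' before decoding.
    """
    repaired = payload.replace(" ", "+")
    # validate=True rejects any remaining character outside the base64 alphabet.
    return base64.b64decode(repaired, validate=True)


# Hypothetical sample: these bytes were chosen so the encoding contains '+'.
original = bytes([0xFB, 0xEF, 0xBE, 0x00, 0x01])
encoded = base64.b64encode(original).decode("ascii")

# Simulate the damage: every '+' becomes a space on the way to the server.
mangled = encoded.replace("+", " ")

restored = decode_image_payload(mangled)
print(restored == original)
```

Without the repair step, decoding the mangled text with `validate=True` would raise an error instead of silently producing a corrupted image.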
And also, as you can see, there's only one robotic arm right here. So when we came to integrate, we couldn't just be like, OK. I'm going to work on mine, and you're going to work on your setup, and then I'll push it up and I know it works. No. It was like we went straight to one set up and said, OK. We are going to do this line by line, make sure it works.
Pair programming. Share the keyboard. And then afterwards, it's like, OK. Who's going to get this commit? Put your name on this one and I'll put my name on this one.
That sounds like a lot of fun, actually.
You guys should have gotten just, like, two keyboards and done, like, real hardcore coding.
Yeah, it's like every time we tried to spell something and we'd both press the same thing and so it's like d-d-e-e. What?
So, yeah. Have you guys had a lot of experience with robotics or with web development in the past?
No. I came into this completely-- I told her the first day. I was like, you know, I've never touched hardware before so working on robots, it's going to be really exciting and we'll see how it goes.
And then, I've never worked on web development either but I thought it would be interesting to start, so I tried it.
Wow. It's really impressive that you guys were able to pick all this stuff up in just a couple of months. Well, let's actually check out how this guy works.
Yeah, for sure. OK. So once we launch the program, you can see on the screen, on the left, is an OpenCV window. And this shows the boundaries and it will do some basic facial detection. You can see as it detects a face, it will find what the center is, and that's the red dot, and the robot arm will move accordingly.
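The centering behavior described here can be sketched roughly as follows. This is not the project's code: the `Face` type, the dead zone, the step size, and the sign conventions are all assumptions made for illustration. It takes a face bounding box (as a detector would report it), finds its center (the red dot), and decides which way to nudge the pan and tilt joints.

```python
from dataclasses import dataclass


@dataclass
class Face:
    """Bounding box of a detected face, in pixels (hypothetical type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def face_center(face: Face) -> tuple[int, int]:
    """Return the center of the bounding box."""
    return (face.x + face.w // 2, face.y + face.h // 2)


def pan_tilt_step(face: Face, frame_w: int, frame_h: int,
                  dead_zone: int = 20, step: int = 1) -> tuple[int, int]:
    """Return (pan, tilt) nudges that move the face toward the frame center.

    Assumed conventions: positive pan means the face is right of center,
    positive tilt means it is below center. Inside the dead zone the arm
    stays still so it does not jitter around the target.
    """
    cx, cy = face_center(face)
    dx = cx - frame_w // 2
    dy = cy - frame_h // 2
    pan = 0 if abs(dx) <= dead_zone else (step if dx > 0 else -step)
    tilt = 0 if abs(dy) <= dead_zone else (step if dy > 0 else -step)
    return pan, tilt


# A face on the right side of a 640x480 frame, roughly centered vertically.
print(pan_tilt_step(Face(500, 200, 100, 100), 640, 480))
```

In a real setup the two nudges would be translated into joint commands for the arm; how that mapping works depends on the robot and is not shown.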
So I noticed that it works a little bit better if I have my glasses off.
Yeah. So OpenCV is, right now for the face detection, it's looking for the center of the face. It's looking between the eyebrows and the eye and so if you have glasses frames there, it can often interfere with the facial detection process. And then, on the right hand side we have our website. You can see that whenever you smile, it will take a picture, and it will show up on the screen, here.
It's quite generous on the smiles.
It is a little generous on the smiles.
We also have a marker control. And what this does is it detects the color orange. You can see orange in our setting up here.
And here's this orange Rubik's Cube.
Yeah. So what it's basically doing is it's looking for the largest orange object and trying to use that as a face, trying to get that into the center. And then we also have a manual control option where we're able to physically move the robot based off of the buttons. So if you have people of different heights, it works really well for that.
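As a rough illustration of "look for the largest orange object", here is a standard-library sketch that works on a tiny grid of RGB pixels. The demo itself uses OpenCV; this version swaps in a plain flood fill so it runs without extra packages. The hue, saturation, and value thresholds for "orange" are assumptions, as are the function names.

```python
import colorsys
from collections import deque


def is_orange(rgb):
    """Rough orange test in HSV space (threshold values are assumptions)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return 15 / 360 <= h <= 45 / 360 and s >= 0.5 and v >= 0.5


def largest_orange_center(image):
    """Return the (x, y) bounding-box center of the largest orange region.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Regions are 4-connected. Returns None if no pixel is orange.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not is_orange(image[y][x]):
                continue
            # Flood fill to collect one connected orange region.
            region = []
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                px, py = queue.popleft()
                region.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py),
                               (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and is_orange(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    xs = [p[0] for p in best]
    ys = [p[1] for p in best]
    return ((min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2)


ORANGE = (255, 128, 0)
GRAY = (50, 50, 50)

# 6x5 test image: one stray orange pixel and one 3x3 orange block.
img = [[GRAY] * 6 for _ in range(5)]
img[0][0] = ORANGE
for row in range(2, 5):
    for col in range(3, 6):
        img[row][col] = ORANGE

print(largest_orange_center(img))
```

The returned center can then be treated exactly like the center of a detected face when deciding how to move the arm.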
So using this kind of web interface, someone could, potentially, interface with this robot from anywhere. Right? You could, kind of, log into the website.
So as soon as you take a picture-- so we had a couple of pictures taken. It's uploading that to Amazon Rekognition, right now.
And this is the web page that Milly created, so she can talk more about it.
Yes. So after we take the image, we send that up to Express, and then from there we have the POST request send, up to Rekognition, the image and it'll send back the metadata saying like, person, human, long hair, short hair, things like that. And then we save that data into a face structure to save to the database, MongoDB, as I mentioned earlier.
And so from there-- right now, what this page is doing is it's dynamically querying data from the database to get all of the images based on the time stamp. And so on the very top you'll have the newest image and then on the very bottom you'll have the oldest image saved in the database.
So for each face structure there's the image, metadata, and the time stamp so it makes it very easy to identify.
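The "face structure" and the newest-first ordering could look something like the sketch below. The project stores these records in MongoDB; here a plain in-memory list stands in for the database, and the class name, field names, file names, and label values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FaceRecord:
    """One saved picture: the image, its metadata, and when it was taken."""
    image_path: str
    labels: dict          # label name -> confidence in percent
    timestamp: datetime


def newest_first(records):
    """Order records so the most recent picture is at the top of the page."""
    return sorted(records, key=lambda r: r.timestamp, reverse=True)


records = [
    FaceRecord("a.jpg", {"Human": 99.1}, datetime(2017, 7, 24, 10, 0, 0)),
    FaceRecord("c.jpg", {"Person": 98.7}, datetime(2017, 7, 24, 10, 5, 0)),
    FaceRecord("b.jpg", {"People": 97.3}, datetime(2017, 7, 24, 10, 2, 0)),
]

for record in newest_first(records):
    print(record.timestamp.isoformat(), record.image_path)
```

With a real database the same ordering would normally be requested in the query itself rather than sorted in application code.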
And then in order to actually interface with AWS Rekognition, you're using their SDK. Correct? OK.
So for that SDK, I'm using the AWS SDK and Node.js, where I actually had to have an extra file for security patches in order for it to recognize that, yes, my computer is a valid place to contact my "thing" up in the cloud. So I had to create a thing. I named it "Milly"-- after me-- [? -NUC. ?] And so, from there, we'd be able to track to see if it's sending responses, as well as create a Lambda skill so that any time it sends data, it processes straight into the-- updates the shadow and tells us whether we're getting the data or not.
Yeah. So this is internet-based, so if you have slower internet speeds, it takes a while to query all the-- first of all, it has to query all the images in our database and then it's also doing image processing, which is why slower internet connections lead to slower updates of the website.
Yes. As you can see, the dropdown list is still trying to query all the data.
I guess while we're waiting for that to finish uploading-- so AWS Rekognition is a deep learning service. Right? And it's based around some other AWS services. Right?
Yes. Yes. So Rekognition is created by Amazon. It is a deep learning service that they created for Prime Photos, which is another service that Amazon has, to uniquely identify collections of photos to personalize to each user. So we know Amazon has lots of users and when they say "photos," how are they able to correctly ID-- like, store the images up in the cloud and make sure it doesn't get confused with somebody else. So they're using Rekognition to compare faces, get facial data, and then using this data, they'll try to say, OK. So this person generally has these faces in it, so that it's to this person and then these people have these faces, and these are their friends, and such.
Right. Yeah. And I guess you could also use that service to search your photos for specific objects, or specific scenes, for instance. So, you know, if you're looking for, oh, I want to find a classroom scene for my photos, you could parse it out. And it's not like you have to tag it, specifically. It'll pull it out from its deep learning network.
So we can show a screenshot of what this website will look like-- if it comes up, we'll take a look as well-- but, basically, it will display your image on the left and you can see us at our office and it will show the metadata on the right. And so you can see, here, that it says "human," "people," and there's a white board in our background and so you can see that it displayed a "white board" option, as well. And it's also able to identify, you know, black hair, and that kind of thing with a fair degree of confidence. I think Milly said it was up to 80%.
Yes. So I have this display up from 50% confident and above that it will display all the metadata, but we can safely assume that anything 80% and above is considered accurate, in terms of human tagging.
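The two thresholds mentioned here (show everything at 50% and above, trust 80% and above) can be sketched as a small filter. Rekognition's label detection returns a list of labels that each carry a `Name` and a `Confidence` percentage; the sample values below are made up for illustration, and the function and constant names are hypothetical.

```python
DISPLAY_THRESHOLD = 50.0   # show labels at or above this confidence
TRUSTED_THRESHOLD = 80.0   # treat labels at or above this as accurate


def split_labels(labels):
    """Return (labels to display, names of labels considered accurate)."""
    shown = [label for label in labels
             if label["Confidence"] >= DISPLAY_THRESHOLD]
    trusted = [label["Name"] for label in shown
               if label["Confidence"] >= TRUSTED_THRESHOLD]
    return shown, trusted


# Made-up sample in the shape of a label detection response.
sample_labels = [
    {"Name": "Human", "Confidence": 99.2},
    {"Name": "People", "Confidence": 98.9},
    {"Name": "White Board", "Confidence": 63.4},
    {"Name": "LCD Screen", "Confidence": 41.0},
]

shown, trusted = split_labels(sample_labels)
print([label["Name"] for label in shown])
print(trusted)
```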
OK. Well, I mean, fair enough. Live demo. I guess sometimes the internet isn't always going to cooperate, but I guess that's also one of the things that we discussed earlier is that if you're doing local processing, it's going to be a lot less latency and not based around the network. But if you're using a cloud service, you can encounter latency issues if you have slow internet or if you start dropping packets for some reason.
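One simple way to compare local and cloud latency is to time both paths the same way and look at the median. The sketch below is only a measurement harness under stated assumptions: `fake_local_inference` and `fake_cloud_inference` are stand-ins (a short computation and a 20 ms sleep that imitates a network round trip), not real image processing or a real service call.

```python
import time
from statistics import median


def median_latency(fn, runs=5):
    """Call fn several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def fake_local_inference():
    """Stand-in for on-device processing: a small amount of computation."""
    sum(i * i for i in range(1000))


def fake_cloud_inference():
    """Stand-in for a cloud request: the sleep imitates network delay."""
    time.sleep(0.02)


local_s = median_latency(fake_local_inference)
cloud_s = median_latency(fake_cloud_inference)
print(f"local median: {local_s * 1000:.2f} ms")
print(f"cloud median: {cloud_s * 1000:.2f} ms")
```

Swapping the two stand-ins for the real local and cloud calls would give a like-for-like comparison, with the median smoothing out the occasional slow request.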
Fair enough. I mean, either way, this is a really, really cool project and I have a couple of more questions. So since you guys mentioned that you have never really worked with any of these technologies before, I was wondering if you had any sort of tips or things to avoid for developers who want to learn about ROS, or OpenCV, or interfacing with the cloud.
Definitely. So setup is definitely half the battle. For example, Isabel had to individually ID-- like take apart the entire robot and individually ID every single motor in order for it to work. And before, it just would, like, fly off the table and we're like, whoa! And for the website, I had no idea how to set it up so I was just like, OK. Uhh, Google! And Google is like, here, we have the Yo Generator for you. All right. Let's use it.
And then we also learned that it's totally OK to dive down into one path, five hours later find out, you know, maybe this isn't the right way. I know at the beginning, we both had to try different distributions of Linux, we had to try different distributions of ROS just to try and figure out what's the most compatible with what we want to do for this project. And then that happened a bunch of other times for different things, as well.
At least five times each time, each person.
We also learned that it's really helpful to utilize other libraries that are already out there. People have been doing a lot of great work and they made it open source and so this way we don't have to go ahead, like, reinvent the wheel and that kind of thing.
Exactly. OK. So, basically, you spent a lot of time at the start. You kind of front-loaded with getting everything set up, getting all of the documentation compiled, and then that allowed you to move forward and kind of start combining all of these elements and actually developing on different elements. And then also, like you said, if you're moving down a path that's just not working out, don't be afraid to just say, well, I guess I-- you know, it's not like you waste time, but it's that you learn the incorrect way of doing something.
So. Fair enough. If you were to go back and redo it, would you guys make any changes or do something differently?
Yes, definitely. So for the MEAN stack, rather than using Angular as the frontend in that stack, one of my friends, who's a full-stack developer, recommended I use React because it's easier to learn, it's used by Facebook, and the industry uses it pretty commonly. So I could have tried to use that instead of Angular since there was a higher learning curve with the syntax and all that.
And then the robot has five degrees of freedom. Right now, we're only using two of those. We're using the shoulder joint, for the left-right movement, and the elbow joint, for the up-down movement. It also has a gripper and it would be really awesome to kind of use that, as well, because it adds more functionality and we can add more features.
OK. Yeah. I think it would be really great to have the robot be able to pick up an object, for instance, and hand it to someone interacting with it at a trade show booth demo or something like that. And I guess more on that point-- what kind of applications do you guys see a system like this being used for?
So we are actually in the process of implementing a security feature. And so this is kind of a user authorization. So it takes in a database of known faces and those are the users that are allowed access to the website. And then the camera will take a picture of the person trying to access the website and if they are not recognized, then it will lock them out, or sound an alarm, that kind of thing.
Yeah. So it's kind of like those security movies where, like, high security clearance. All right. We got to track them by their face. Do we recognize them?
Yes. And then some other applications are for employers that have, like, stores and whatnot. They could use this for getting the customer demographics, as well as figuring out what they're wearing. Like, is there a trend to what they're wearing? Say if it's a clothing store, they all like this robot's favorite color, orange, and so they need to stock more orange clothes.
Or like they track what's in the shopping cart, just in order to figure out what they need to keep more in stock of. Also for smart home applications. You can have a robotic cameraman. So if you're cooking some sort of recipe and then you're in the kitchen wandering around, doing all these things, you can't exactly have a still camera to do that. Or we can just do it for this IoT show. Right?
Sure. Maybe season 2 will be shot entirely with our newly built robotic camera.
And then we also have other smart home applications-- would be a smart fan. So you go to the gym, you work out and it's really hot, and you kind of have to walk around to wherever the fan happens to be pointing. So with our robotic arm, that would detect faces and maybe have infrared technology and figure out, you know, you're the warmest person in the room so we should blow more air at you.
There you go.
That kind of thing. And it will help save on electricity and increase efficiency, as well.
By turning off when there's no people around.
Cool. Yeah. Well, it sounds like there's really a lot of things you can do with all these technologies and build upon the framework to kind of expand the capabilities. So do you guys have any plans for future development on this platform, specifically?
Yeah. Right now we're using OpenCV but Intel has a specialized version of OpenCV, called CV SDK, and so we're thinking of using that to help, kind of, speed up the image processing as well because CV SDK is optimized for Intel technology and Intel hardware that we're using.
Then, also, we wanted to add deep learning, locally, onto the gateway in order to, as I mentioned earlier, compare local processing versus cloud processing. So right now we're having Amazon take care of all the deep learning, but we also wanted to compare the speeds of if we did it locally. But for time's sake, we used the cloud, like Amazon, just because they already had a model already built, they had it trained.
I mean, because we'd have to gather our own data, and then we would also have to train that data, and it would be a much longer process. So it would be faster, it would be more precise, but it would require more compute power, and it would just take longer to get this set up.
Plus side is, though, don't have to depend on the internet connection for it to work.
That's true. It still looks like it's trying to load up. It's--
OK. And then, also, we have more information on this posted online. Right?
Yeah. So we wrote up a blog post, kind of detailing the hardware that we're using, the software that we're using, with a couple of links to those, as well. We're going to keep on updating that as we go on, document the step by step procedure on how we built this robot. You can find that in the links below to this video.
We will eventually be posting the code.
Great. OK, great. Well, thanks a lot for sharing this. This was really, really interesting. And before we sign off, I want to tell you guys about a couple more webinars that are coming up. So first of all, we have Developing for Visual Retail Using Intel Active Management Technology, which is happening in three days, on the 27th of July. And it's going to cover how Intel Active Management Technology, which is a component of Intel vPro, enables remote manageability for systems.
We're also going to be hosting An IoT Journey from Prototype to Startup, where Intel Innovator, Paul Langdon, is going to share his experiences in the process of implementing a minimum viable product and then refining that prototype into a full commercial IT solution. He's going to review the strategies, the pitfalls, and discuss the tools that he used in his IoT journey.
Well, thanks again for watching. My name is Martin Kronberg.
I'm Milly Xun.
And I'm Isabel Zhang.
And make sure to tune in next month for more IoT goodness, and don't forget to like this page on Facebook.