Tuesday, June 28, 2016

2016-06-28: Autonomy Incubator Intern Deegan Atha Teaches Machines to See

Intern Deegan Atha carefully lines up a ficus and a quadrotor in front of the webcam mounted to one of the widescreen monitors in the flight range.

"You can do it," he mutters to the camera as he gives it a final adjustment before stepping in line with the UAV and potted plant. After half a second, three rectangles pop up on the display, each boxing and labeling one of the three things in view: Tree, Drone, and Human. He smiles as Kastan and I gasp appreciatively. His algorithm works.

If you've followed the Autonomy Incubator (Ai) for any length of time, you know that we do a lot of research with computer vision and augmented reality. From SVO (Semi-direct Visual Odometry) to PTAM (Parallel Tracking And Mapping) to a host of other applications, we rely on our vehicles being able to navigate within their surroundings. But what if they could do better than that? What if we had vehicles that could not only detect objects around them, but recognize them, telling a "UAV" apart from a "person" or a "chair," for example? Deegan is back at the Ai this summer, answering these very questions.

A rising senior at Purdue University, Deegan returns to the Ai after starting this project as an intern last fall. He had never done much with object recognition before, but it quickly became his niche.

"When I got here, I didn't do much of [computer vision], but now I do some of this at Purdue," he said. His 3D object recognition algorithm has become so emblematic of his work that PIs Loc Tran and Ben Kelley refer to it by the acronym 3DEEGAN: 3D Efficient Environmental Generic Artificial Neural Network.

"It's a joke we came up with. Are you putting this on the blog?" Ben said.

Object classification isn't just a cool add-on to our existing computer vision algorithms; it has the potential to push our research ahead by light-years. Why? If a vehicle knows what an obstacle is (whether it's a tree or a person, for example), then it can make a decision about how to maneuver safely in the situation.

Think about it. A tree will definitely remain stationary, so a simple avoidance maneuver is a safe way to go. A human, though, introduces all kinds of possibilities, so the vehicle will have to track the person, decide whether to hover or move, and determine if just landing would be the safest thing to do. The same goes for another UAV, a pole, a car, an airplane, etc. The more obstacles an autonomous vehicle can recognize, the more safely it can operate.
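To make that reasoning concrete, here is a minimal sketch of how a recognized label might feed a maneuver decision. The label names, the grouping of "pole" with "tree" as a stationary object, and the function choose_response are all assumptions made up for illustration; none of this is the Ai's actual flight code.

```python
from enum import Enum


class Response(Enum):
    AVOID = "plan a simple avoidance maneuver"
    TRACK = "track the object, then decide whether to hover, move, or land"


# Hypothetical grouping: labels we assume will stay put.
STATIONARY_LABELS = {"tree", "pole"}


def choose_response(label: str) -> Response:
    """Pick a response from the label the classifier reports."""
    if label.lower() in STATIONARY_LABELS:
        return Response.AVOID
    # Anything that might move (or that we do not recognize) gets the cautious path.
    return Response.TRACK


for seen in ("Tree", "Human", "Drone"):
    print(f"{seen}: {choose_response(seen).value}")
```

Treating unknown labels the same way as moving objects is a deliberately conservative choice in this sketch.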

Deegan's 2D and 3D algorithms recognize Anicca, despite her cunning imitation of a robot.

In order to make a vehicle recognize objects, Deegan is training an onboard deep learning algorithm using convolutional neural networks. (And here are some links about those if you're interested.) The way he explains it actually makes a lot of sense, even to people like me, who flunked out of computer science.

Deep learning falls under the larger umbrella of artificial intelligence: it's a subset of machine learning loosely modeled on the way mammalian brains work (hence the "neural networks"). Deep learning means that the algorithms involved can learn for themselves which patterns matter, instead of the researcher sitting there and selecting features on different pictures for hours in order to train them. He just picks a set of labeled images of something and has the algorithm analyze and classify them. Most of the images come from online libraries, but some of them Deegan takes himself.

Kastan explores the 3D algorithm's display with one hand and films with the other.

"The main dataset is called ImageNet, which is a data set from Stanford and Princeton that has millions of images and tens of thousands of image classifications, plus bounding box information," he said. "Bounding box" information highlights where in the image the object to be trained is, allowing for more accurate training.

The convolutional neural network, then, is the kind of model Deegan has chosen for his deep learning algorithm. It's a way for the algorithm to analyze an image from the data set, pick out features, and then classify it. What makes convolutional neural networks ideal for Deegan's algorithm is their efficiency: they use layers of gridded filters of "neurons" that build upon each other to find individual "features" in the scene (a hand, for example, or a propeller). Only the final layer is "fully connected," meaning every one of its neurons is connected to every neuron in the layer before it; there, the algorithm takes all the features it sees and combines them into a classification for the object. Here's a related paper from a team at the University of Toronto if you want the gritty details.
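For the curious, the filter-then-classify idea can be sketched in a few lines of plain Python. This is a toy under loud assumptions: the 4x4 image, the hand-written edge filter, and the two sets of final-layer weights are invented for illustration, whereas a real network learns its filters and weights from training images.

```python
def conv2d(image, kernel):
    """Slide the kernel over every position where it fits and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def fully_connected(feature_map, weights):
    """Combine every feature value into one score per class."""
    flat = [v for row in feature_map for v in row]
    return [sum(w * x for w, x in zip(class_weights, flat))
            for class_weights in weights]


# Toy image: dark on the left, bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-written filter that responds to a dark-to-bright vertical edge.
edge_filter = [[-1, 1],
               [-1, 1]]

features = relu(conv2d(image, edge_filter))
# Made-up final-layer weights for two hypothetical classes.
scores = fully_connected(features, [[1] * 9, [-1] * 9])
print(features)
print(scores)
```

The feature map lights up only in the column where the edge sits, and the final layer turns that map into one score per class.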

An illustration of how deep learning algorithms classify images. (source)

"A convolutional neural network is basically just creating a hierarchy of features," he explained. "It would take a lot of memory and computation time if every layer was fully connected."

Plus, because convolutional neural networks scan images locally (piece by piece instead of all at once), the deep learning algorithm learns to recognize an object no matter what its position in the scene. So, even if the algorithm had only ever seen pictures of trees directly in the middle of the field of view, it would still be able to find and recognize a tree that was off to the left or down in a corner.
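Here is a small sketch of why position matters so little to a convolutional filter. The 6x6 image, the diagonal pattern, and the helper names are assumptions for illustration, and the sketch only demonstrates shifts in position.

```python
def conv2d(image, kernel):
    """Slide the kernel over every position where it fits and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def place(pattern, size, top, left):
    """Return a size x size image of zeros with the pattern pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


def peak(feature_map):
    """Return (strongest response, (row, column) where it occurs)."""
    return max(
        (value, (r, c))
        for r, row in enumerate(feature_map)
        for c, value in enumerate(row)
    )


pattern = [[1, 0],
           [0, 1]]          # a tiny stand-in for "the object"
detector = pattern          # a filter tuned to that same pattern

corner = conv2d(place(pattern, 6, 0, 0), detector)
elsewhere = conv2d(place(pattern, 6, 3, 2), detector)

print(peak(corner))      # same filter, object in the top-left corner
print(peak(elsewhere))   # same filter, object moved down and to the right
```

The same filter produces the same peak strength in both images; only the location of the peak changes, which is what lets later layers recognize the object wherever it sits.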

Basically, instead of trying to run one very computationally expensive, complete analysis of the image, convolutional neural networks run a bunch of smaller analyses that build upon each other to create an interpretation of the image. Make sense? Great! Here's a TED Talk Deegan recommends to understand neural networks, by Professor Fei-Fei Li of the Stanford Vision Lab.
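To put rough numbers on that cost difference, this back-of-the-envelope sketch counts the weights and biases in one layer built each way. The sizes (a 224x224 color image, 64 filters, 3x3 kernels) are assumptions picked for illustration, not the dimensions of Deegan's network.

```python
def fully_connected_params(n_inputs: int, n_outputs: int) -> int:
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """One small kernel per input-output channel pair, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# Assumed sizes, for illustration only.
HEIGHT, WIDTH, CHANNELS = 224, 224, 3
FILTERS, KERNEL = 64, 3

n_inputs = HEIGHT * WIDTH * CHANNELS    # every pixel value in the image
n_outputs = HEIGHT * WIDTH * FILTERS    # a same-sized output with 64 feature maps

dense = fully_connected_params(n_inputs, n_outputs)
conv = conv_params(KERNEL, CHANNELS, FILTERS)

print(f"fully connected: {dense:,} weights and biases")
print(f"convolutional:   {conv:,} weights and biases")
```

With these made-up sizes, the convolutional layer needs 1,792 numbers because it reuses the same small filters at every position, while the fully connected version needs hundreds of billions.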
