Thursday, August 30, 2018

2018-08-30: Jim Ecker and his GAN-Filled World

Jim Ecker is a member of the Data Science team at NASA Langley and a part of the ATTRACTOR project, working specifically in the Autonomy Incubator.  Jim received his Bachelor's degree in Computer Science from Florida Southern College and his Master's in Computer Science from Georgia Tech, specializing in machine learning and artificial intelligence.

Coming out of school, he was a software engineer for about eight years and then went to Los Alamos National Lab, where he worked in the supercomputing and intelligence and space research divisions.  From there, he began work at NASA Langley.

HINGE, which was recently outlined here, is partially in support of what Jim is doing.  The main project he is working on right now uses Generative Adversarial Networks (GANs), a machine learning algorithm that lets a computer generate data from an example data set, such as producing an image from a description.

Using a few different photo databases, including Flickr30K and COCO (Common Objects in Context), he has over 200,000 real images, each annotated with five different descriptions.  The compiled images include everything you can think of, from cars to boats to animals to people, but his main focus is simply people, to aid in the search and rescue research that takes place at the Ai.  He went through the COCO dataset and, as best he could, pulled out all of the images containing people, and he is currently curating data from Flickr30K.
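
For a rough idea of what that curation step looks like, here is a small sketch using the pycocotools library to pull out the COCO images that contain people, along with their captions.  The file paths are placeholders, not the project's actual setup:

# Minimal sketch: finding COCO images that contain people, using pycocotools.
# The annotation file paths are placeholders, not the project's actual layout.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")   # object instance annotations
caps = COCO("annotations/captions_train2017.json")    # caption annotations

person_cat = coco.getCatIds(catNms=["person"])        # category id for 'person'
person_img_ids = coco.getImgIds(catIds=person_cat)    # images with at least one person

for img in coco.loadImgs(person_img_ids[:3]):
    # Each COCO image comes with several human-written captions.
    ann_ids = caps.getAnnIds(imgIds=img["id"])
    captions = [a["caption"] for a in caps.loadAnns(ann_ids)]
    print(img["file_name"], captions)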

The more captioned images you have, the more data the neural network gets, and it eventually begins to better understand what the pieces of an image are.  For example, after seeing many different images whose captions contain the word "glasses," the neural network will eventually learn what the word means and be able to detect what glasses look like based on the pixels and the similarities between the different images.  If you ask it to generate, say, an image of 'a woman in a red cocktail dress,' the neural network will take what it knows from the other images it has seen and their descriptions to draw that.  "As it looks at more and more examples and learns more and more how to make that kind of thing, it starts to learn what different things are and make a better drawing," he explained.  "It's very weird what it's doing when you first think about it; it starts with what looks completely random, and it eventually learns how to draw the pieces."
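
To make that a little more concrete, here is a heavily simplified sketch of a text-conditioned generator in PyTorch.  This is not Jim's actual model; it just illustrates the basic move of turning a caption embedding plus random noise into an image, which a discriminator (omitted here) would then judge against real image/caption pairs:

# Simplified sketch of a text-conditioned GAN generator (illustrative only).
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    def __init__(self, text_dim=256, noise_dim=100, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(text_dim + noise_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, img_channels, 4, 2, 1), nn.Tanh(),  # 32x32 image
        )

    def forward(self, caption_embedding, noise):
        # Treat the concatenated vector as a 1x1 "image" and upsample it.
        z = torch.cat([caption_embedding, noise], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(z)

# Usage: one generated image for one (already embedded) caption.
gen = TextConditionedGenerator()
fake = gen(torch.randn(1, 256), torch.randn(1, 100))  # shape (1, 3, 32, 32)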

A woman in a cocktail dress.

The cocktail dress example is actually quite mind-blowing to us.  It connected 'cocktail' to a cocktail bar, and since bars typically have mirrors, it was able to mirror the woman's back in the top right of the image.

The descriptions of the images, and how in-depth they go, are very important.  You can have a caption that simply says "a woman with an umbrella," but how much does that really tell you? Not much.  A better description would include what she's wearing and what she's doing, like "a woman in a black jacket, a blue and white checkered shirt, white pants, and black shoes is carrying an umbrella while walking through the rain."  With more details, the neural network is able to learn more and draw more from the information.  "I need the descriptions to be as specific as possible," Jim said.

Following his explanation, he showed me a demo in which he described me and what I was wearing to see how it would come out.  Given the description "a woman wearing a black and white striped shirt with a black sweater," you can see what I looked like below!


A renaissance painting entitled: Payton Heyman

I'd say it looks just like me! Especially with the wind-swept ponytail.

Along with the generated image, it also produces handy visualizations of "how each word in the description is mapped to different parts of the generated image," as Jim put it.  "It gives each of these words some weight, saying here's the woman, here's the sweater, and so on.  These visualizations are key to providing explainability to an agent's environmental understanding."
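
As a rough sketch of what could sit behind a visualization like that, here is one common way to compute a per-word heat map: score each word's embedding against every spatial region of the image's feature map and normalize the scores.  The dimensions and random features here are made up for illustration:

# Rough sketch of per-word attention over image regions: each word embedding is
# scored against every spatial feature of the image, and the softmax-normalized
# scores form one heat map per word.
import torch
import torch.nn.functional as F

def word_region_attention(word_feats, region_feats):
    """word_feats: (num_words, dim); region_feats: (dim, H, W) image feature map."""
    dim, h, w = region_feats.shape
    regions = region_feats.view(dim, h * w)   # flatten the spatial grid
    scores = word_feats @ regions             # (num_words, H*W) similarity scores
    attn = F.softmax(scores, dim=-1)          # weight over regions for each word
    return attn.view(-1, h, w)                # one heat map per word

# Hypothetical example: 5 words attended over an 8x8 feature map.
maps = word_region_attention(torch.randn(5, 256), torch.randn(256, 8, 8))
print(maps.shape)  # torch.Size([5, 8, 8])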


The annotation tags highlight each characteristic.

He explained that the hope is for the system to get much better at accurately visualizing what the described person looks like.  It stores information much the way the human brain stores visual information, based on Dual-Coding Theory, and this is how it all ties into the search and rescue research.  "It encodes the features basically into memory storage in your mind.  The idea is to kind of try to replicate this so that when an object detector [like a drone] is looking around with a camera, every time it sees a person it can store the represented information and compare it to what it already knows."  When it finds someone who looks like the person they are looking for, it would realize, "oh, that's who they were talking about."  The idea is that once Jim trains the object detector enough, it would be able to recognize someone reliably, but to be that specific you would normally need thousands of pictures of that person and a long training run.  "Using synthesized visualizations from a GAN does this in an unsupervised manner, requiring much less data."
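
Here is a toy sketch of that store-and-compare step: keep an embedding of the person being searched for, then compare each detected person's embedding against it.  The encoder, dimensions, and threshold are placeholders, not the project's actual values:

# Toy sketch of the "store and compare" idea: embeddings of detected people are
# matched against a stored embedding of the person being searched for.
# The 256-dim embeddings and 0.8 threshold are placeholders.
import torch
import torch.nn.functional as F

def matches_target(detection_embedding, target_embedding, threshold=0.8):
    similarity = F.cosine_similarity(detection_embedding, target_embedding, dim=0)
    return similarity.item() >= threshold

target = torch.randn(256)  # embedding built from the synthesized visualization
for detection in [torch.randn(256) for _ in range(3)]:  # embeddings from the camera feed
    if matches_target(detection, target):
        print("Possible match: that's who they were talking about.")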

We love our monitors at the Autonomy Incubator.

"Other than all of this, I am also working on deep reinforcement learning," he said.  An example of this is how he is teaching an agent to play Super Mario BrosTM. Basically how this works is the agent looks at all of the pixels on the screen in order to decide what to do.  As it plays the game, it learns more and more of what actions do what and what to do in a situation.  Jim is able to pull up the source that shows what actions are going on at any given time during the game.  "It's kind of like a hexadecimal representation in and of the buttons; some of them might even be combinations of a NintendoTM controller."

As mentioned previously, the HINGE project is in support of his research.  It has given him and the rest of the team an idea of what type of data they need to feed the GAN in order to get the kind of data they need back.  They're able to see how best to talk to it and what kinds of descriptions help it visualize most accurately.  Jim's work is improving by the day, and we look forward to seeing how it progresses even more!
