Transcript of the AI for Disaster Response video.
DR. ROBERT MUNRO: Thank you. It's great to be here this afternoon and to talk about a topic which is really dear to my heart. How can we make sure that AI can help us respond to people when they're most in need? I'm Robert Munro. I'm the CTO of a company called Figure Eight. Until recently we were known as CrowdFlower. You might be familiar with that as the company name. Recently rebranded. We make the most widely used software for annotated training data and build complete AI systems that go from raw data to deployed models. So if your car parks itself, your music is based on recommendations, your fruit is scanned between farm and table, you're probably using AI that we ship.
But today I'm going to talk about one particular application, AI for disaster response because this was actually my path. My path in artificial intelligence was not a typical one. Out of undergraduate while I'd studied AI, I didn't really expect to have a career in it because there really weren't careers back at that time.
And so I went and worked as an engineer globally, I had this one experience in the mid-2000s. So I was working for the United Nations High Commission for Refugees. I was working in refugee camps in Liberia and in Sierra Leone where I was living at the time, and we were installing solar power systems at schools and clinics supporting refugee camps so that these schools and clinics could be more self-sufficient.
And it was while I was there at one of these very remote clinics in Liberia that I was standing there. We were already kind of late getting in, so I had to build this system, we had just two days, we had to move on to the next place. And someone came up to me in this village and they're like hey, we think some new refugees just came over the border from Cote d'Ivoire into the neighboring valley, but we don't know much about them. And of course, they came up to me because we were the ones there working for the UN Refugee Commission and installing solar power.
But frustratingly, we couldn't find out anything about them. It was just one valley over. We didn't know if there were 10 or 10,000 refugees there. And what was especially frustrating was that I had five bars of cell phone reception. I had perfect cell phone reception and no doubt some of the refugees did as well, and they were probably bouncing their cell phone signals off that same tower that I was.
But even if I could connect with them, they would have spoken one of a half a dozen local languages for which there certainly weren't machine translation systems and other AI that we took for granted even 15 years ago, like search engines and spam filtering, it wouldn't have worked in any of those languages as well. So even if we could connect, there wasn't much we could do.
Ultimately in this use case, we just had to report back to the capital Monrovia that maybe there are refugees there. We don't know how many. We weren't able to find out. So that really motivated me, because I thought, well, you know, I'm not really needed to be on a roof drilling in solar panels, a lot of people can do that. Someone employed locally can do that. But I had studied AI, so I thought, well, how can I help AI be adapted to more languages? And so that's what motivated me at that time to move to Silicon Valley, where I got a Ph.D. focused on natural language processing for health and disaster response at Stanford looking at how we can adapt to low resource languages in this context.
And while I did that I continued to work in disaster response. And so the first time we were able to deploy AI in a disaster response was in 2010. I'm sure a lot of you remember in 2010 a very large earthquake hit Haiti in January of that year. More than 100,000 people were killed immediately, and more than a million people were left homeless, and so working with a number of people in the international response community, we were able to set up a phone number, like a 911, or I guess I should say, I forget, triple one in the UK? The equivalent of 911? 999. All right. Everyone who's visiting, take note in case of emergency later.
So we were able to set up the equivalent of a 999 service in Haiti because most of their cell phone towers were still working. Calls weren't getting through, but text messages were. So on broadcast radio, we could advertise that people could call the service to report their needs or resources if they had any, and we could link them with international responders. But we had this problem in that the majority of text messages were sent in Haitian Creole, the one language that most people spoke there, and the majority of the international response community only spoke English as their common language.
So messages like you could see on the left, which were really important. So a hospital running out of supplies, someone undergoing childbirth, someone needing search and rescue, couldn't be understood by the people coming into the country. So it fell on me to find and manage people who spoke both English and Haitian Creole to do real-time translation, categorization, and mapping of these messages so that a plain text message sent in Haitian Creole could be an English report categorized with the exact longitude and latitude for the international response community.
So in 48 hours I was able to find and ultimately manage about 2,000 members of the Haitian diaspora who were able to join us from 49 countries worldwide and in real time be able to complete this translation and use their local knowledge in a way that just simply wouldn't be possible if you weren't from that region. We're also integrating with AI systems here. So taking a message in Haitian Creole and its English translation, we're able to feed that parallel data to machine translation engines at Microsoft and at Google, and they quickly released the first ever machine translation systems for Haitian Creole and English within a week of the disaster, based in part off the data that we were able to give to them for emergency messages. And as many of you who work in machine learning will know, the fact that these were somewhat messy social messages and they were about the topics of health and disaster response means that the machine translation systems were then also better in messages related to health and disaster response.
One of the big takeaways here was the importance of engaging the local community in this process. So just imagine that we'd already solved the problem of machine translation and you had a message that looked like this. Sacred Heart Hospital in the village of Okap is ready to receive patients. So looking at this map here, who can see Okap on this map? No one? I'm going to zoom in a little. It's up here. Anyone? Okap? No? I'm going to zoom in once more. See it now? No, right? It's difficult to see. So Okap is actually slang for Cap-Haitien. And once you're told that, it kind of makes sense. So Cap-Haitien with a C-A-P in the language becomes K-A-P, the same way a hard C and a K might alternate in English and German. O is a marker for a location, just by coincidence, as it can begin in Gaelic. So these follow linguistically consistent rules, but if you don't know what the slang word is, you could spend a long time looking for this. And Cap-Haitien is the second largest city after Port au Prince in Haiti. This isn't a small city. So your knowledge of the existence of the city might not be enough.
So working with these Haitians worldwide, they were able to use their linguistic knowledge, but also their local knowledge. Because if you're from there, you would actually know that Sacred Heart Hospital was about 15 kilometers south of the city itself in a smaller town called Milot. So again, something very important if you're evacuating people, in this case by helicopter, to a given location.
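The place-name problem described above can be illustrated with a minimal Python sketch of an alias lookup. Everything here is an assumption made for illustration: the `ALIASES` table, `normalize_key`, and `resolve_place` are hypothetical names, and only the Okap entry comes from the talk. The idea is simply that a name the table does not know is returned as unresolved, so it can be routed to someone with local knowledge instead of being guessed.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical alias table: local or slang names mapped to a canonical name.
# Only the "okap" entry is taken from the talk; a real table would be built
# from knowledge contributed by people from the region.
ALIASES: Dict[str, str] = {
    "okap": "Cap-Haitien",
}


def normalize_key(name: str) -> str:
    """Strip accents, surrounding whitespace, and case so lookups are forgiving."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold().strip()


def resolve_place(name: str, aliases: Dict[str, str] = ALIASES) -> Optional[str]:
    """Return the canonical name for a known alias, or None if unknown.

    None means "ask a person who knows the area", not "no such place".
    """
    return aliases.get(normalize_key(name))


print(resolve_place(" Okap "))      # known alias from the talk
print(resolve_place("Thomassin"))   # not in the table, needs a human
```

A lookup like this only covers names someone has already recorded, which is why the volunteers' local knowledge mattered for everything else.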
So to share one example of people collaborating to find this out. So here we have two people collaborating, Delilah in San Francisco, Apo in Montreal to try and geo-locate a message. So, on this online chat with people coordinating, Delilah saying I need Thomassin, Apo, please. So where is Thomassin, and Apo immediately replies, here is the longitude and latitude. It's in this area after Petion-Ville, Google Maps isn't there. And if you look at Google Maps at the time, you can see that yes, like this is just a bend in the road, none of the suburbs or roads are labeled there.
But Apo grew up there, so he can drop a pin exactly which generates a longitude and latitude, which means that the responders can go out and address this issue. In this case, it was a breech birth. It was nice. This was probably going to be a troubled birth regardless. And so for a small period of time following this disaster, they had some of the best physicians in the world able to respond to this particular medical emergency.
I think it's very interesting that Apo says, you know, because I know this place like my pocket, I know this place like the back of my hand, and Delilah says well, thank God you're here. And it's interesting to think about where here is. I mean, did here mean the online chat room? Is it San Francisco? Is it Montreal? Or is it with Haiti? It shows how people can collaborate globally in order to work together to solve some of our biggest problems.
So that was my path. Disaster response, Stanford University. I founded a few companies in the AI space in San Francisco. Immediately before joining Figure Eight, I was running product for natural language processing and translation at AWS, helping convince them to be multilingual in their first ever products, which I think itself helps a lot of people. And the reason this is important is that I wonder what, you know, what's your intuition? So how much of the world's conversations daily is English? On a given day, how many of the world's conversations are in the English language?
So the answer is 5%. Just 5% of the world's conversations are in English, and that's fairly consistent. But about 95% of AI only works in English or only works well in English. And if you speak a minority language, you're disproportionately more likely to be the victim of a man-made or a natural disaster. And also education between men and women will favor men for dominant languages. You get the same divisions across ethnicity.
In fact, race in parts of the Amazon is determined by your language more than your actual ethnicity. So this linguistic bias is also a gender and a racial bias that we have in our AI today. I think this is one of the biggest problems that we're facing, is what AI technologies are available for everybody in the world.
One really interesting and also linguistic use case that I've worked on, again, before I joined the company, but using Figure Eight's technology, is epidemic tracking. So this is the famous map from John Snow. Like, not that Jon Snow, but the 1800s John Snow, who discovered a cholera outbreak using geographic information mapping just down the road here in London.
And so disease outbreaks are still the largest killer in the world, and no organization is tracking them all. You might have seen movies where people have great big screens and it's a heat map and it flares up every time there is an outbreak. That doesn't exist anywhere. And the budgets for those movies probably exceed the budgets of any one organization actually tracking disease outbreaks globally.
And this is pretty scary, because in the last 75 years, we've only eliminated one human disease, smallpox, and the amount of air travel has increased greatly. And we definitely put a lot of resources into stopping terrorists getting onto flights, but a pathogen is more likely to sneak onto a flight undetected. It's certainly been responsible for many more of the world's deaths. And the reason that this is a linguistic problem is that 90% of the world's pathogens come from this thin band of the tropics. This thin band of the tropics has 90% of the world's ecological diversity, including things that can kill us.
By maybe coincidence, maybe not, the same thin band of the tropics has 90% of the world's linguistic diversity. So what that means is that the first time that somebody notices an outbreak, they're speaking about it in one of 6,000 different languages, and chances are that language is not English or Spanish or Mandarin. It's not a dominant language. We can actually go back in time and find reports of disease outbreaks weeks, months, sometimes decades before they're finally put in front of virologists and identified as being a new pathogen that we need to track. Every single transmission is a possibility that these could mutate to become more fatal, and so we want to get ahead of any outbreak as soon as possible.
So in the case of bird flu, we can find cables coming out of southern China weeks ahead of when this was identified as H5N1, as a new strain of the flu that hadn't been identified before. In the case of swine flu, we can find local newspaper reports in Mexico months before it was identified as a new strain of the flu, with telltale symptoms, like all the young people are sick in the village at the moment. So if you're a virologist or an epidemiologist, you're like, oh, right, this is obviously like a new strain of the flu. But in this paper, it was just remarked upon and missed. In the case of something like HIV, we can go back decades to find this kind of information.
So simply finding these reports as early as possible can help prevent epidemics. So epidemicIQ is an initiative that was trying to take millions of reports worldwide and find out which of these are relevant to disease outbreaks, so that the 15 to 20 per day which really are new disease outbreaks that we care about can be put in front of the right epidemiologists and virologists just for review. So those of you who speak Russian, Arabic, or what's the third language up there, Mandarin, you'll see that these are disease outbreaks categorized by the type of disease, the location that it's in, the number of people infected.
So we're able to use machine learning to filter for anything that might look like a disease outbreak, put this in front of crowdsourced workers, micro taskers, for review, and have them correct or reject the given machine learning analysis. Finally, we put that in front of the domain experts for review, with all of that information from the analysts going back to the machine learning models, so it's continually updated, adapting in about 15 different languages, including some here in Europe.
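The review loop described above (model filter, then crowd review, then expert review, with human judgments fed back as training data) can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the epidemicIQ implementation: the `Report` fields, the 0.5 threshold, and every function name are hypothetical, and the model score is assumed to come from some upstream classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Report:
    text: str
    language: str
    model_score: float                 # assumed: upstream model's outbreak score
    crowd_label: Optional[bool] = None
    expert_label: Optional[bool] = None


def triage(reports: List[Report], threshold: float = 0.5) -> List[Report]:
    """Stage 1: keep anything the model thinks might describe an outbreak."""
    return [r for r in reports if r.model_score >= threshold]


def crowd_review(candidates: List[Report],
                 judge: Callable[[Report], bool]) -> List[Report]:
    """Stage 2: crowd workers confirm or reject each model prediction."""
    for r in candidates:
        r.crowd_label = judge(r)
    return [r for r in candidates if r.crowd_label]


def expert_review(confirmed: List[Report],
                  judge: Callable[[Report], bool]) -> List[Report]:
    """Stage 3: domain experts see only what the crowd confirmed."""
    for r in confirmed:
        r.expert_label = judge(r)
    return [r for r in confirmed if r.expert_label]


def training_examples(reports: List[Report]) -> List[Tuple[str, str, bool]]:
    """Stage 4: every human judgment becomes a labeled example for retraining.

    An expert label, when present, takes precedence over the crowd label.
    """
    examples = []
    for r in reports:
        label = r.expert_label if r.expert_label is not None else r.crowd_label
        if label is not None:
            examples.append((r.text, r.language, label))
    return examples


# Placeholder reports and stand-in judges, purely for illustration.
reports = [Report("a", "de", 0.9), Report("b", "de", 0.2), Report("c", "ru", 0.7)]
candidates = triage(reports)
confirmed = crowd_review(candidates, lambda r: r.text != "c")
escalated = expert_review(confirmed, lambda r: True)
examples = training_examples(reports)
```

Rejected reports are kept as negative examples on purpose: a human saying "this is not an outbreak" is as useful to the model as a confirmation.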
So in 2011, there was an outbreak of E. coli in Germany that had a number of fatalities. And we're able to show using this kind of online tracking of newspaper articles and social media in German, that we can get ahead of the European CDC in identifying the outbreaks and where they occurred.
It's something I've used in smaller languages as well. So in partnership with UNICEF, using the same human in the loop AI process to adapt to a number of local languages in Nigeria. In this case, it was called the First 1,000 Days program. So from when a woman learns that she is pregnant, for the following 1,000 days through birth and beyond, tracking things like the number of vaccinations, the ongoing weight and changing weight of the child in order to help with maternal care. Again, having language independent AI that local analysts within Nigeria who spoke those languages could encode and then adapt to their given use cases.
And then we're just starting to see more use cases in computer vision in addition to natural language processing. So the company I was at at the time, we hosted aerial imagery analysis following Hurricane Sandy to identify what was not just a marshland but an actually flooded region, which FEMA used to help decide where to deploy their resources. Right through to some interesting use cases on our product today where people are using computer vision in sub-Saharan Africa to track elephants and people who are near elephants in order to identify areas where there might be poachers encroaching on the herds.
Something we're really proud to announce just a few weeks ago now is that we've made eight new open datasets available on our platform, all of which tackle either a social good problem or a particularly hard problem in machine learning. And two of them are particularly related to disaster response. So this is a map of some of the different dedicated workforces that we have on our platform, and as an extension, they speak a number of different languages.
So in this case, this is speakers of Swahili helping us, in partnership with the Red Cross, create Swahili recordings of a number of health and disaster response related messages in the Swahili language, with translations to English, so that online translation systems and speech recognition systems can become better across all these languages. So that's Swahili translation, speech, and transcription.
And then happy to announce that as of tomorrow we'll have a new dataset available which is a collection of text messages, including the ones in Haiti, across a number of disasters, encoded for a number of common disaster response topics. And again, it's an open data set, so we're really hoping that anyone in the machine learning community can experiment on this data set, tell us trends about what people communicate in disasters that we don't already know, and then also come up with machine learning solutions that can help us automate or semi-automate the disaster response process going forward.
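As one small illustration of the kind of first-pass trend analysis such a dataset invites, the Python sketch below counts how often each topic label appears. The record layout, the placeholder texts, and the label names are all assumptions made for illustration; they are not rows or categories from the actual dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical records: (message_text, topic_labels). Placeholder values only.
Record = Tuple[str, List[str]]

messages: List[Record] = [
    ("message 1", ["medical", "request"]),
    ("message 2", ["water"]),
    ("message 3", ["medical"]),
]


def topic_counts(records: Iterable[Record]) -> Counter:
    """Count how many messages carry each topic (a message counts once per topic)."""
    counts: Counter = Counter()
    for _text, topics in records:
        counts.update(set(topics))
    return counts


def topic_shares(records: List[Record]) -> Dict[str, float]:
    """Fraction of all messages that carry each topic."""
    total = len(records)
    if total == 0:
        return {}
    return {topic: n / total for topic, n in topic_counts(records).items()}


counts = topic_counts(messages)
shares = topic_shares(messages)
print(counts.most_common(1))
```

Because a message can carry several topics, the shares do not need to sum to one.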
Thank you all for your time. I think I burned all my question time right now. I'm getting a nod back there. But I will be available at the speaker room after this session, immediately following. All right. Thank you all.