Transcript of the talk "OSM at Facebook".
DRISHTIE PATEL: Hey, everyone. I guess I'll get started. Thanks, Eric, for the talk before. It's kind of a hard one to follow now that everyone's envisioning all the cool things that you could do. But I am Drishtie. And I'm going to be talking about what we've been doing with OSM at Facebook-- not Amazon Web Services, which is what it says in the program.
But basically, I'm here with Saurav, who's going to be joining us, as well, and talking about some of the new stuff we're thinking about as far as conflation. And so he'll deep dive. But just a quick recap, the reason Facebook cares about OSM is because we use OSM in a lot of our products-- pretty much the entire family of apps. And you can see it in different ways, so either in Check-Ins, Places, Pages, Messenger, Location, Instagram Places. So it's those little static and dynamic maps that you see within all of the Facebook apps.
So last year when we chatted, we were just rolling out OSM and exploring how all of this would work. And we were in about 22 countries. The team has worked really hard over the last few months. We were actually going to do this over a few years. And in a very Facebook way, someone came up and said, actually, we need it in six months. Could you do it?
And so we tried to do it much faster than we were planning to do. And now we are pretty much showing OSM across all of the world in all of our applications. So that's going out to over 2 billion people. So we streamlined our work. And so we would have two different places that we'd work on. And one of our focuses was validation-- making sure that the map is free of profanity and all of the bad stuff. We call this the remove bad stuff section of our team. And then some of you are familiar with the machine learning stuff that we've been doing, as well, which is adding the roads. So naturally, this would be the good stuff part of the team.
Part of our validation work was really just understanding what was in OSM, because I don't think anyone had a good idea of all the different profanity that's in there, all of the vandalism, everything that could go wrong. And when you're about to show your map to 2 billion people living in all over, you know, in all places of the world, it's really important that we don't end up being a headline somewhere or that there's profanity, because it does reflect, again, on the company.
So we started doing our own research. And part of this was curating profanity lists from everything that was online and open, talking to different integrity teams within Facebook, and just generally crowdsourcing from all the people that we knew, all of the bad words that they knew, and added them to-- added to a shared doc--
--which was really fun. So we ended up with a database of about 40,000 words-- 40,000 in 43 languages. And what we did with this was basically match this against the world PBF file. And we took a snapshot of it. And we were trying to get a sense of what was going on.
We ended up with a result of 84,000, which, as you can imagine, is really hard to manually go and map and QA each individual one. So we did some filtering by different rules and matching against POIs until we came up with a much more manageable list that we could go through manually. And this ended up at about 2,000, which we manually QA'ed for a couple of reasons. One-- a lot of curse words or bad things that show up are not so much that it's a bad word. But it's contextually a bad word.
So there's examples of the word hill in Russian, which showed up on many of our integrity words. But if it were in Russia, it would also be a hill, but in the wrong context it had different meanings. And so this is only possible if someone from that area actually looked at the map and was able to figure out what those differences were. Eventually what we were happy to find out is that only 139 of them were actually issues. And some of them had been there for a long time and probably just didn't render or were in places that were not looked at. And we went ahead and fixed those in OSM.
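A minimal sketch of the list-matching step described above, assuming features have already been read out of the PBF file into plain dictionaries. The function names, the placeholder word list, and the sample features are all hypothetical; the talk does not say how the matching was actually implemented. The sketch matches whole words only, which is one simple way to cut down on false positives before the manual, contextual QA.

```python
import re
from typing import Iterable

def build_matcher(words: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-word pattern for a word list."""
    # Longest entries first so multi-word phrases win over their prefixes.
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)

def flag_features(features, matcher):
    """Yield (feature id, tag key, matched word) for name-like tags that hit the list."""
    for feat in features:
        for key, value in feat["tags"].items():
            if key == "name" or key.startswith("name:"):
                match = matcher.search(value)
                if match:
                    yield feat["id"], key, match.group(0).lower()

# Placeholder data, standing in for a real word list and real OSM features.
blocklist = ["badword", "worse word"]
features = [
    {"id": 1, "tags": {"name": "Badword Cafe"}},
    {"id": 2, "tags": {"name": "Embadwording Street"}},  # substring only, not flagged
    {"id": 3, "tags": {"name:en": "The Worse Word Inn", "highway": "badword"}},
]
hits = list(flag_features(features, build_matcher(blocklist)))
```

Anything this flags would still go to a human reviewer who knows the language and the area, since a word that is offensive in one place can be an ordinary place name in another.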
The reason we're able to do some of that contextual and manual QA is because we have a pretty diverse team at Facebook, not only within our Maps team, but the larger team at Facebook. There's a lot of integrity initiatives that we can borrow language support from. So we have people come and sit with us for a couple of weeks. And we QA together and make sure that we're checking everything.
Just a quick visual representation of what that profanity looks like-- you could see it's very much skewed to North America and Europe. And that's because we have a bigger database for bad words in English. So we probably, as a community, need to get a better sense of what's going on everywhere else and think of words that could potentially go wrong or that are bad words in different places. So we can make a more extensive list.
We also checked our PBF file against OSMCha. And what we found is that less than 30% of them were actual issues. So we spent a ton of time probably manually QA'ing over 25,000 of these to eventually find that about close to 2,000 of them were actual issues that we went ahead and fixed in that snapshot. And if you guys heard Lucas's talk, one of the things that he mentioned was figuring out how we can all add our different QA systems into OSMCha. And this would help with some of this duplication of effort, right? So instead of a team spending a month going through these lists, if someone's done it before, it'd be cool if we could all put them in the same place.
Thankfully at Facebook, we have a ton of machine learning efforts. And so there's a lot of pipelines that are already doing different types of OCR. And so we tried that to just explore what would happen if we looked for profanity. And this is specifically geometry. And we use this for letters. And for the most part, a lot of this is obviously horizontal text within a 10 to 20 degree threshold. And so we quantized it and made sure we had it in different angles. So we could pick up different things. And we got a pretty good result.
But then again, there's billions of tiles. And so if you imagine at every zoom level, checking this would be very costly. So we QA'ed it at the most used tile levels, which is usually between zoom level 15 and 18. And just to give you an idea, at zoom level 15, you have 1 billion tiles. So this is a very expensive process to find nine issues. But the good thing is, there's only nine issues within this specific file. But the exploration helped us understand what we were dealing with and the fact-- and that what we ended up learning is that we don't-- instead of doing a lot of these separate checks for different things, it would be cool if we had one central process.
And we would check a ton of these different issues. And then also at the end of the day, you always need that manual QA. And the idea of a machine helping you look at 1 billion tiles is much easier than getting a team that could potentially look at 1 billion tiles. So that's been some of the work we've been focused on.
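The "1 billion tiles at zoom level 15" figure follows from the standard web map tiling scheme, where each zoom level splits every tile into four. A small sketch of that arithmetic is below; the function name is hypothetical, and the total counts every tile in the grid, including ocean tiles that a real pipeline might skip.

```python
def tiles_at_zoom(zoom: int) -> int:
    """Number of tiles covering the world at a zoom level (2**zoom by 2**zoom grid)."""
    return 4 ** zoom

# Zoom levels 15 through 18, the range mentioned in the talk.
counts = {zoom: tiles_at_zoom(zoom) for zoom in range(15, 19)}
total = sum(counts.values())

for zoom, count in counts.items():
    print(f"zoom {zoom}: {count:,} tiles")
print(f"total for zoom 15-18: {total:,} tiles")
```

Zoom 15 alone comes to 1,073,741,824 tiles, and the four levels together to a bit over 91 billion, which is why running OCR over rendered tiles is expensive.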
And then on the good side, a lot of you guys probably have seen this before. So I'm going to zoom through this really quick and get to the core stuff. We've still been doing the machine learning on roads. And we're finished with Thailand. And we've now been doing Indonesia. Each time that we move to a new area, we have to tweak the model a little bit and add more training data. This is what our training data looks like. We do this in-house at Facebook with our editing teams.
And this is just a quick example of the work we've been doing in Indonesia lately. This is the raw output of what pops out from the machine. And we basically apply a really high threshold so that we could find a centerline and avoid false positives. So I know a lot of people are always worried about the machine learning picking up things that are not there.
This is the process that basically helps. And you'll see that a lot of the roads start looking fragmented, because in the areas where it's not that confident, it actually drops out, which is why we have the second process that goes through. It will trim, connect, and merge all of these roads. And so it will use different algorithms to figure out the distance between the two and fill those gaps in.
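As a rough illustration of the gap-filling idea, here is a minimal sketch that proposes connectors between the loose ends of nearby road fragments. It assumes planar coordinates in some consistent unit and a hypothetical `max_gap` threshold; the actual trim, connect, and merge algorithms are not described in the talk and are certainly more involved than this.

```python
import math
from itertools import combinations

def bridge_gaps(segments, max_gap):
    """Return connector lines between endpoints of different segments closer than max_gap."""
    # Collect both endpoints of every segment, remembering which segment they belong to.
    endpoints = [(i, pt) for i, seg in enumerate(segments) for pt in (seg[0], seg[-1])]
    connectors = []
    for (i, a), (j, b) in combinations(endpoints, 2):
        # Only bridge different segments, and skip endpoints that already coincide.
        if i != j and 0 < math.dist(a, b) <= max_gap:
            connectors.append((a, b))
    return connectors

# Two fragments of one road with a small dropout, plus an unrelated far-away fragment.
segments = [
    [(0, 0), (10, 0)],
    [(12, 0), (20, 0)],
    [(50, 50), (60, 50)],
]
connectors = bridge_gaps(segments, max_gap=5)
```

The pairwise loop is quadratic in the number of endpoints, so a real implementation would want a spatial index, and would probably also check that the connector follows the direction of the roads it joins.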
Secondly, we'll go and remove island roads. This is not because those roads don't exist. But they don't visually connect into the overall network. And this is usually in very dense vegetation areas or rural areas, where there's a lot of green cover. It's seasonal. Things change all the time. You can't see some of those. And so instead of having these little pieces all over the place, if they're not a major network, we usually take them out.
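One way to picture the island-road step is as a connected-components problem: roads that share a node belong to the same network, and small networks get dropped. The sketch below assumes each road is a list of node ids and uses a hypothetical `min_roads` cutoff for what counts as a network worth keeping; the talk does not state the real criterion.

```python
from collections import defaultdict

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def drop_island_roads(roads, min_roads):
    """Keep only roads whose connected network contains at least min_roads roads."""
    parent = {road_id: road_id for road_id in roads}
    first_road_at_node = {}
    for road_id, nodes in roads.items():
        for node in nodes:
            if node in first_road_at_node:
                # Shared node: merge this road's network with the earlier road's network.
                parent[find(parent, road_id)] = find(parent, first_road_at_node[node])
            else:
                first_road_at_node[node] = road_id
    sizes = defaultdict(int)
    for road_id in roads:
        sizes[find(parent, road_id)] += 1
    return {rid: nodes for rid, nodes in roads.items() if sizes[find(parent, rid)] >= min_roads}

roads = {
    "a": [1, 2],
    "b": [2, 3],
    "c": [3, 4],
    "d": [10, 11],  # touches nothing else: an island
}
kept = drop_island_roads(roads, min_roads=2)
```

A length-based or area-based cutoff would work the same way; only the per-component statistic changes.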
Eventually, we get them into iD. This is actually four different tasks. You can kind of see the grid in the middle. And so by the time it gets to our editors, it's already lint tagged, connected with the OSM version, as well. And so everything can be easier to map. We use the Tasking Manager. We forked Tasking Manager 2. And so we still use the older version of it but have pretty much changed it to work with the Facebook workflow.
And so this helps us open tiles that are next to you to get visual context without affecting the person that's opening it to actually edit it. We have some interesting ways of randomly assigning tasks so that you're not working on tasks right next to each other, which avoids some of the issues of connectedness between the two. Lots of linting and we have three editors actually go through before it's published. So we have an editor, a validator, and a publisher. So it actually gets seen three times before it goes live into OSM.
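The talk does not say how tasks are assigned, so the following is only a guess at the general shape: pick a random available grid cell that does not touch any cell someone is already editing. The grid-cell representation and the `pick_task` name are assumptions made for this sketch.

```python
import random

def pick_task(available, in_progress, rng=random):
    """Pick a random available grid cell that does not touch any in-progress cell."""
    def touches(a, b):
        # Cells touch if they are the same cell or neighbors, including diagonals.
        return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1

    candidates = [
        task for task in sorted(available)
        if not any(touches(task, busy) for busy in in_progress)
    ]
    return rng.choice(candidates) if candidates else None

available = {(0, 1), (1, 1), (0, 3), (3, 3)}
in_progress = {(0, 0)}
choice = pick_task(available, in_progress, rng=random.Random(42))
```

Here `(0, 1)` and `(1, 1)` border the in-progress cell, so only `(0, 3)` or `(3, 3)` can be handed out. When nothing qualifies, the function returns `None` and the caller decides whether to wait or relax the rule.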
This is the iD editor, which we've also changed quite a bit. So we've taken a lot of the JOSM integrity checks and built them into iD. This just helps us to have a quicker workflow. So if we do changes to the tool, we don't have to have people re-download JOSM, make sure they have the correct plugins, make sure everyone's on the same kind of tool. So it helps us create a much easier workflow. And we can also track quality and create, like, much more user friendly interfaces for people that are newer to Maps.
We have a lot of hotkeys, color coding. We pretty much get requests every day. We probably have a three page list of all the requests that come from our editors. And then we do a voting process on whether everyone likes it and then pick different tools that we add. This tool is open sourced. But it's not running yet. In the next few months, we plan to actually serve it and run it ourselves. So people should be able to use it.
We've run the model on other countries. So this is an example of Mexico. And we've shared data with partners. And the model does pretty well. And one of our ideas for exploring this was just to see how it would run in different parts of the world, right? So we've been focused on Southeast Asia. But it turns out it did pretty well on Mexico. It also did pretty well on Uganda.
So something we've done that was new in Indonesia-- because there's a much stronger local community, so we're able to work with the local community to do some of the mapping. So essentially instead of us creating all of the roads and having a team in California edit them, we worked with them to have the tooling opened, did a lot of the machine learning, the post-processing, and created the tasks. But the local community is actually the one that ends up putting it into OpenStreetMap, which is fantastic.
So we spent the first three months with them just cleaning up Indonesia. And because there was a lot of different types of mapping, it was quite a huge chunk of work just to make sure everything was aligned. We didn't have overlapping buildings-- just overall lots of little things that helped make the map more usable and easy to display.
We then worked with the local team and did some in-person training. We have weekly calls, work with them on different tasks, and are able to get that through. And then the best part about the local community is the ground validation. So you can see in the example, we do a lot of work looking at satellite imagery. A lot of our decision making is based on satellite imagery. And so we keep our tagging very generic, because we don't want to add stuff that we are not sure about on the ground. But if they're on the ground, they can verify different things, like what kind of bridge. Is it usable? What is its capacity? So that's been a really great change for us.
Over on the crowdsourcing side, we also are able to use the Facebook platforms for crowdsourcing. So everyone adds a lot of businesses, a lot of their POIs. We have a really rich POI database. And people usually add multiple POIs on a road. So we know that they're adding their street address, exactly where their building is, what their business is. And so we can crowdsource back to that specific area and ask them to verify what the road names are. We ran a test in Indonesia for 10 days. And we were able to get a confident answer on 176,000 street names-- so lots of users always using Facebook. And they help us to get stuff that's valuable to OpenStreetMap.
So how much mapping is left? We always talk about what percentage and everyone focusing on different areas in the sense that there's a lot of work to do, because there really is. There's a lot of work, a lot of mapping that still needs to be done in areas that we're not focused on predominantly. With Facebook because of our connectivity projects and our mission to connect the world, we are focused in the areas that not everybody else is in, so thinking of Southeast Asia and Africa, Latin America. And you can see by this map, these are the most-- these are the least mapped areas.
So something that we've released at Facebook are the population density maps. And these are basically very similar to how we extract streets. They're extracting buildings. And we have a good sense of population. These are open data sets that are already published with CIESIN. And these are helpful, because they can help you target specific areas of mapping.
So for example, the Humanitarian-- OpenStreetMap and Red Cross do a ton of mapping. They were able to use the population density maps to isolate specific tasks that had buildings in them. So you're not wasting your time opening up tasks, closing them just to find there's no buildings. And they can move a lot faster and create more specific tasks, which then helps you crowdsource people to map specific areas. And then you can go on to the ground and do some very targeted field mapping. And in this case they were specifically mapping for the measles and rubella initiative, which is not a concern for most of us in North America. But in Africa and Asia, this is pretty serious.
We've also done a bit of analysis on understanding, like, what that population density looks like and then using this information to quantify some of the features that still need to be improved in OSM. So this is the population density for Mexico, for example. And you can see the disparity from low to high density areas. And this is just, again, satellite imagery. Sorry. I'll go a little slower.
And this is basically a map of analyzing the population density against specifically the road features. And you could do this for any feature, whether it was buildings or land use. And this gives us a proxy for how much of the world is mapped. And it gives us a sense of what our missed opportunity is. So what are the specific areas that have a lot of people and don't have enough data? And so you can get very targeted areas that we need to continue mapping.
And this is just an example. In Uganda specifically, like, this is what it looks like at 85% being mapped. This is 90%, 95%, and then eventually 99.5%. So there's still a lot we can do, and this will just help us target areas in a much more efficient way.
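To make the "how much is mapped" proxy concrete, here is a small sketch that weights road coverage by population. It assumes the area has been cut into grid cells, each with a population estimate and a flag for whether OSM has any road in it; the cell structure, field names, and numbers are invented for illustration and are not the analysis used for the maps shown.

```python
def population_weighted_coverage(cells):
    """Share of the population living in cells that already contain a mapped road."""
    total = sum(cell["population"] for cell in cells)
    if total == 0:
        return 0.0
    covered = sum(cell["population"] for cell in cells if cell["has_road"])
    return covered / total

def biggest_gaps(cells, top_n):
    """Populated cells with no roads, most populous first: candidates for mapping tasks."""
    unmapped = [c for c in cells if not c["has_road"] and c["population"] > 0]
    return sorted(unmapped, key=lambda c: c["population"], reverse=True)[:top_n]

# Invented sample cells.
cells = [
    {"id": "A", "population": 1000, "has_road": True},
    {"id": "B", "population": 500, "has_road": False},
    {"id": "C", "population": 300, "has_road": True},
    {"id": "D", "population": 200, "has_road": False},
]
coverage = population_weighted_coverage(cells)
gaps = [cell["id"] for cell in biggest_gaps(cells, top_n=1)]
```

With these sample cells, 1,300 of 2,000 people live in cells that have roads, and cell B is the largest gap. The same calculation works for buildings or land use by swapping the flag.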
So I'm going to hand it over to Saurav, who's going to tell us more about our project Mobius.
SAURAV MOHAPATRA: Hi. My name is Saurav, and I'm going to talk about continuous ingestion of the data. So Lucas made a great point in the previous presentation, where he referred to OSM as an eventually consistent system. Changes are continually being made, and because of the community's efforts, bad or harmful edits are getting fixed.
So this is kind of like a stream. And for all the 100% rollout of the OSM that we have done at Facebook, we are still running a snapshot that's close to a year old. Because the volume-- the major challenge is about freshness and correctness, about balancing these two axes. Like, the data needs to be as fresh as possible while being as close to 100% correct, a total recall system. And because of the vandalism and all of that, till we had a strategy, we sort of pinned the data snapshot that we use to one thing that we're confident in and applied hard patches on that, but we kind of stayed there.
So the challenges obviously are [? variates. ?] And the later you get, the volume of changes you have to reconcile grows. So to address all of this, we started thinking about solution approaches. And we looked at things like static diffing, patching, or stream diffing. And all of them had the same central assumption, that catching up to OSM is all or nothing. You take these changes, you apply them, you read somewhere.
But we started playing with that assumption. And one thing-- we played a "what if" scenario with, what if it was possible to reformulate the changes mathematically so that we could pick and choose? Basically, this problem is very similar to source code management, continuous integration kind of thing. And we tried to see if we could apply principles of cherry picking with geometrical consistency built into this process.
So we came up with this thing called logical changesets. And simply, the formulation is, if we have a snapshot, which is our downstream snapshot, and OSM had-- we have a static diff between the two, can we define a function over this diff which gives us a set of reclustered change groupings which are idempotent, so we could just throw them out at reviewers and they could apply them, and they could just-- reapplying a change has the same effect.
Commutative. So there is no order enforcement, and each of these is independent. So we could focus on prioritized areas. So for example, we did the roads in Thailand. So we could have a concentrated effort to just catch up in Thailand while maintaining the geometrical consistency of this decision that, if you apply this change group, your geometry is consistent. If you choose not to apply it, your geometry is still guaranteed to be consistent.
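The two properties named here, that reapplying a change has the same effect and that order does not matter, can be shown with a toy model. In this sketch a snapshot is a dictionary of element id to element state, and a change group records the final state of each element it touches (or `None` for a deletion). This representation is an assumption for illustration, not the Mobius data model.

```python
def apply_group(snapshot, group):
    """Apply one change group and return a new snapshot; the input is left untouched."""
    result = dict(snapshot)
    for element_id, new_state in group.items():
        if new_state is None:
            result.pop(element_id, None)    # deletion; harmless if already gone
        else:
            result[element_id] = new_state  # create or modify: set the final state
    return result

snapshot = {"n1": {"lat": 0.0, "lon": 0.0}, "n2": {"lat": 1.0, "lon": 1.0}}
group_a = {"n1": {"lat": 0.5, "lon": 0.0}, "n3": {"lat": 2.0, "lon": 2.0}}
group_b = {"n2": None}

once = apply_group(snapshot, group_a)
twice = apply_group(once, group_a)
a_then_b = apply_group(apply_group(snapshot, group_a), group_b)
b_then_a = apply_group(apply_group(snapshot, group_b), group_a)
```

Because each group stores final states rather than deltas, applying it twice changes nothing, and because the two groups touch disjoint elements, either order gives the same snapshot. Groups that touched the same element would not commute in this model, which is one reason the clustering has to keep related changes together.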
And so this is the computational view of that. So eventually, when you calculate a static diff between two OSM snapshots, what you are collecting is a flat-- if you flatten that list out, you get a lot of geometry CRUD operations. Some node, way, or relation either got created, deleted, or modified.
So we take these, and we run the algorithm on it, and we recluster that into logical changesets, like logical groupings, while maintaining some of the metadata. And then, the review process now becomes-- we jokingly call it like Tinder for changesets. It becomes a swipe left, swipe right kind of thing. Because you can visualize what the world looked like before this group of changes was there, and after these changes are applied.
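A naive version of the reclustering can be sketched as a graph problem: treat every changed element as a vertex, link an element to the elements it references (a way to its nodes, for instance), and call each connected component one logical changeset. The operation format and the `cluster_operations` name are hypothetical; the talk only says the first prototype used a naive graph clustering algorithm.

```python
from collections import defaultdict

def cluster_operations(ops):
    """Group CRUD operations whose elements are linked through references."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for op in ops:
        find(op["element"])  # register the element even if it has no references
        for ref in op.get("refs", ()):
            union(op["element"], ref)

    groups = defaultdict(list)
    for op in ops:
        groups[find(op["element"])].append(op)
    return list(groups.values())

# A flattened diff: a new way with its two new nodes, plus an unrelated tag edit.
ops = [
    {"action": "create", "element": "n1", "refs": []},
    {"action": "create", "element": "n2", "refs": []},
    {"action": "create", "element": "w1", "refs": ["n1", "n2"]},
    {"action": "modify", "element": "n9", "refs": []},
]
groups = cluster_operations(ops)
group_sizes = sorted(len(group) for group in groups)
```

The new way and its nodes end up in one group that has to be accepted or rejected together, while the lone tag edit becomes its own single-feature group that a reviewer can swipe through quickly.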
So this one doesn't show it, but if you look at the OSM data, there is geometric data, there is semantic data, and there is cosmetic data. Cosmetic data is like place names, labels.
So there are similar pipelines and workflows for approving those. But the net result of that is now, we can actually take the changeset that's been approved, break it down back into the individual geometric CRUD operations, and apply that to the changeset. So this still keeps the planet file geometrically consistent, but parts of it are moved forward.
So in the ideal case, 100% of your changesets got approved. You move absolutely to it. And we named it Mobius because of that, because there is no beginning, no end. You're kind of going in a loop, and you are accepting as much as is possible, or you want to.
So this works because we are changing the key assumption into, it is actually possible to create logical changesets out of this. But our needs as a company-- Facebook's needs are, right now, rendering. So our needs aren't as widespread as companies that do navigation or do other things on OSM. The reason we are sharing this is we want to have a larger discussion around this and see if this idea scales up to doing this for the entire OSM data set and all its characteristics.
And we did our first prototype around four, five months back for the area around Boston, to see if this idea would actually work. And with our initial naive graph clustering algorithm, we were able to take the one year's worth of data at 4,000 changesets. We were able to compress it down to around 1,900, 2,000 changesets.
But the beautiful part of this was 48% of the changesets had a single-feature change. We were able to isolate somebody changed a node, or changed a tag on something.
So those things are no-brainers. A human just swipes through it. So the idea is, if we can create a workflow for a human visually where you can just accept or reject diffs quickly, then catching up to a lot of the changes for a long period of time becomes quicker.
And as we go ahead with this, our plan is to integrate-- Drishtie talked about the profanity detection and the OCR-based integrity checks, and the standard heuristics-based geometry checks, and all of that. So we plan to build it into the process so that when the logical changesets are generated, we can auto-reject a lot of the suspicious changes. And the others, which are in between, we could flag with a complex [INAUDIBLE], so that when that goes up to a reviewer in the UI, this thing has a red, yellow, or green thing. Green says that it looks OK to me. So this is kind of like a linter for the changes, which shows you how complex or how suspicious the change is.
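As a sketch of what such a linter could look like, the function below auto-rejects a logical changeset if any integrity check failed and otherwise colors it by size. The thresholds, the use of operation count as the complexity measure, and the check names are all assumptions; the talk describes this as a plan, not a finished design.

```python
def triage(num_ops, failed_checks, green_max=1, yellow_max=20):
    """Return 'reject', 'green', 'yellow', or 'red' for one logical changeset."""
    if failed_checks:
        return "reject"   # e.g. a profanity hit or a failed geometry check
    if num_ops <= green_max:
        return "green"    # single-feature change: quick to approve
    if num_ops <= yellow_max:
        return "yellow"   # moderate change: worth a closer look
    return "red"          # large or complex change: needs careful review

labels = [
    triage(1, []),
    triage(5, []),
    triage(100, []),
    triage(1, ["profanity"]),
]
```

A reviewer would then see the color next to each changeset before deciding whether to accept it.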
And next, obviously, we have to scale this. Right now, I'm glad to say that we have been running this for the whole world. And I will share the results probably next time. And we have to refine the clustering.
And this is where I want a discussion with people in this room and the community, is to-- our needs are simple, so we could get by with a naive thing that we can slowly make better. But we would love to talk about, what does it look like for us to create logical changesets which take the entire spectrum of OSM data characteristics into account?
The team is here. Would you guys put your hands up? So find any of us. You can talk to us about this. We love to talk about this. I personally love to talk about anything. And [? Drishtie? ?]
OK. So I get to play this video. So this is not related to my stuff, but this is related to-- yeah, this is the Thailand road completion project.
DRISHTIE PATEL: Awesome. So that was just the update. Last year, when we came by, I think we were halfway done. So now we've pretty much hit Bangkok, and mapped out all of Thailand, and are pretty far along the way in Indonesia as well. But I don't know if we have question time.
I can't see anyone. But yeah, if there's any questions, I guess I can take them. Or no questions. That works-- OK, go ahead.
AUDIENCE: What is the ambition about the program [INAUDIBLE]. Are you going to roll out in another country, or [INAUDIBLE] or [INAUDIBLE]?
DRISHTIE PATEL: That's a great question. We've been working in Indonesia with the local community, and we've had a lot of requests for doing this for other countries. It's not scalable in the way that we're doing it because we've had a team of people working on Thailand. We QA our work so much.
So we're actually in the process of building a service for this, and so that we could run the models on the whole world. And we just actually signed a DG deal to get the imagery for the whole world so we could start doing this. And then eventually, we will just be opening it up with the tools so that people have the right way to get the data in if they choose to. Yeah.
All right. Perfect. Thanks, guys.