Natural Language Processing
Despite the onslaught of new media sources that include images (both moving and still) and audio streams, analyzing the information contained in text remains a challenge. Twitter, Facebook status updates, blog posts, e-mail, and similar channels all still rely heavily on text to communicate information between users. This introduces a challenge for geographers - we'd like to know which places are talked about in these conversations. So, how can we extract locations and map them?
Natural Language Processing (NLP) is a burgeoning area of inquiry that has made considerable progress on training systems to extract, recognize, and place into context what we try to communicate through text. More specifically, named entity recognition (NER) can be used to extract location mentions from text sources. As the video lecture here from Stanford professors Dan Jurafsky and Chris Manning shows, NER is now a relatively mature technology, but substantial challenges remain in recognizing and reducing ambiguity and interpretation problems in natural language.
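To get a feel for the ambiguity problem, here is a deliberately naive baseline in Python. It is not NER: it simply matches words against a list of place names. The place list, the function name, and the example sentences are all made up for illustration.

```python
import re

# Hypothetical, tiny list of place names (illustration only).
PLACE_NAMES = {"paris", "washington", "reading"}


def naive_place_mentions(text):
    """Return every word in `text` that matches a known place name.

    Matching is case-insensitive and ignores context entirely,
    which is exactly why this approach produces false positives.
    """
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() in PLACE_NAMES]


sentences = [
    "Flooding reported in Paris this morning.",
    "Paris Hilton tweeted about the storm.",
    "I was reading about Washington crossing the Delaware.",
]

for sentence in sentences:
    print(naive_place_mentions(sentence), "<-", sentence)
```

Only the first sentence mentions a place from the list, yet the lookup flags words in all three: a person's first name, a verb, and a surname. Telling these cases apart requires the surrounding context, which is what NER systems try to learn.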
Extracting and Geolocating Places in Text
Once you have extracted location entities from text, you then need to geolocate them. This is also a non-trivial step, even with contemporary methods. Many places have alternate spellings or informal names (LA = Los Angeles, Happy Valley = State College area), and duplicate names abound for nearly every place you can imagine. Fortunately, a great gazetteer of named places around the world already exists in the form of GeoNames. GeoNames also provides a geocoding service built on its gazetteer. For projects like SensePlace2 at Penn State, we first rely on customized NER software to extract location names, and then we run those names through GeoNames to try to attach coordinates to them. Because each step is far from perfect, we are now working on refining both sides of that process to deliver better accuracy in location extraction from text. In the screenshot below, you can see our GeoTxt project, which has a simple web interface but is primarily designed to be used at the API level by other programs that need better quality location extraction from text sources. You can try GeoTxt yourself now (just paste in some text that includes placenames).
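The lookup side of this process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeoNames API and not the method used in SensePlace2 or GeoTxt: the mini-gazetteer, the alias table, the function names, and the "prefer the most populous match" rule are all hypothetical, and the coordinates and populations are rough values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    region: str
    lat: float
    lon: float
    population: int


# Hypothetical mini-gazetteer; coordinates and populations are approximate.
GAZETTEER = [
    Place("Los Angeles", "California, US", 34.05, -118.24, 3_900_000),
    Place("State College", "Pennsylvania, US", 40.79, -77.86, 40_000),
    Place("Springfield", "Illinois, US", 39.80, -89.64, 114_000),
    Place("Springfield", "Massachusetts, US", 42.10, -72.59, 155_000),
    Place("Springfield", "Missouri, US", 37.21, -93.29, 169_000),
]

# Informal names mapped to gazetteer names (assumed for illustration).
ALIASES = {"la": "Los Angeles", "happy valley": "State College"}


def candidates(mention):
    """Return all gazetteer entries matching a mention, most populous first."""
    key = mention.strip().lower()
    canonical = ALIASES.get(key, mention.strip()).lower()
    matches = [p for p in GAZETTEER if p.name.lower() == canonical]
    return sorted(matches, key=lambda p: p.population, reverse=True)


def geolocate(mention):
    """Pick the top-ranked candidate, or None if nothing matches."""
    ranked = candidates(mention)
    return ranked[0] if ranked else None


for mention in ["LA", "Happy Valley", "Springfield", "Atlantis"]:
    print(mention, "->", geolocate(mention), f"({len(candidates(mention))} candidates)")
```

Both problems described above show up even at this scale. "Springfield" returns three candidates, and ranking by population is only a guess that will often be wrong for a given message. The alias table is also ambiguous in practice, since other places share the name "Happy Valley". Better disambiguation usually needs context from the rest of the text.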
Deliverables for this week's emerging theme
- Post a comment that describes how you think text analysis tools might integrate with GIS for emergency management. What are the big challenges in understanding (and mapping) locations from text media sources that we should focus on solving?
- Then, I'd like you to offer additional insight, critique, a counter-example, or something else constructive in response to one of your colleagues' posts.
- Brownie points for enriching your posts with links to other technology demos, pictures, blog posts, and similar material you've found, so that we may all benefit.