GEOG 479
Cyber-Geography in Geospatial Intelligence

Growth of the Technology Environment and the Growth of Information



Knowledge management practitioners often divide information into two groups - implicit and explicit. Implicit information is thought of as existing only in the mind, while explicit information is what is recorded in some fashion. The adoption of writing allowed us to think of information as a set of discrete objects not limited to what was contained in the human brain. Socrates warned his student Plato that writing was a danger to the means of teaching in ancient Greece, which was dependent on rote memorization. Ironically, if Plato had not written about Socrates, we might not even know about him. The advent of the Information Age allows us to see information everywhere - in locations as diverse as strands of DNA and the firing of the brain’s neurons, regardless of what those firings might mean. Now that we have such contrivances as Google and Wikipedia, we start to think of “information without end,” ubiquitous and easily accessed (this is actually a stated goal of Google, but more on this later).

How information is related speaks to the questions of where it is created, where it is stored and disseminated, and how the geospatial components contained therein should be visualized. But then, what exactly is information? Claude Shannon, a mathematician at Bell Labs, founded the field of information theory with his 1948 paper, “A Mathematical Theory of Communication.” In Shannon's view, everything contains information encoded in some fashion as “bits,” and information is a measure of quantity, not meaning. Shannon decided that, as the fundamental unit of information, a bit would represent the amount of uncertainty that exists in the flipping of a fair coin: 1 or 0. Using this mathematics, a mathematician or engineer could quantify not just the number of symbols, but also the relative probabilities of each symbol’s occurrence. Information, as defined by Shannon, became a measure of surprise - of uncertainty with an estimated probability. These are abstractions; a message is no longer a tangible piece of material like a piece of paper. Shannon’s goal was to explain the relationships involved in sending an intelligible message over a noisy transmission line. His theory has gone far beyond that and eventually permeated such divergent fields as computer science and molecular biology. The world as it exists today promises endless possibility given the connectedness we see between the different sets of explicit knowledge.
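As an illustration of Shannon's measure, the following minimal Python sketch computes the entropy of a probability distribution in bits. The function names and the example probabilities are assumptions chosen for illustration; they do not come from Shannon's paper or from this lesson.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Return the entropy, in bits, of a discrete probability distribution."""
    # Outcomes with probability 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def message_entropy(message):
    """Estimate bits per symbol from the symbol frequencies in a message."""
    counts = Counter(message)
    total = len(message)
    return shannon_entropy(count / total for count in counts.values())


# A fair coin flip carries exactly one bit of uncertainty.
fair_coin_bits = shannon_entropy([0.5, 0.5])

# A heavily biased coin is less surprising, so it carries less than one bit.
biased_coin_bits = shannon_entropy([0.9, 0.1])

# Four equally likely symbols need two bits per symbol.
four_symbol_bits = message_entropy("ABCD")

print(fair_coin_bits, round(biased_coin_bits, 3), four_symbol_bits)
```

The fair coin comes out at one bit, while the biased coin comes out at roughly 0.47 bits: the more predictable the source, the less information each symbol carries, regardless of what the symbols mean.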

One field seemed to get left out, and it is the most fundamental of all for our discussion: physics. Following World War II, the great area of science was the splitting of the atom and the harnessing of nuclear energy. Communications research was viewed as a field for electrical engineers. Particle physicists had quarks; they didn’t seem to need “bits.” The man who argued the opposing view was Rolf Landauer. Landauer escaped Germany in 1938 and eventually earned a PhD in physics from Harvard. His landmark paper, “Information is Physical,” was written while he was on staff at IBM. In it, he pointed out that bits are not abstract after all (or more precisely, not merely abstract). His point is based on the concept that information can’t exist without some physical component to contain it, whether it’s a mark chiseled on a stone tablet, a hole punched in a card, or a subatomic particle with either spin up or down. According to Landauer, information is “tied to the laws of physics and the parts available to us exist in our real physical universe.” Landauer and his colleague Charles H. Bennett joined the fields of information science and physics in what they called the thermodynamics of computation.
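One widely cited result from this line of work, usually called Landauer's principle, states that erasing one bit must dissipate at least k_B T ln 2 of heat, where k_B is Boltzmann's constant and T is the absolute temperature of the surroundings. The formula is not stated in this lesson; the short Python sketch below is offered only to give a sense of scale, and the function name and the 300 K room temperature are assumptions made for the example.

```python
import math

# Boltzmann's constant in joules per kelvin (exact value in the SI system).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin, bits=1):
    """Minimum heat, in joules, dissipated when erasing `bits` bits."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be a positive absolute temperature")
    return bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


# Assumed room temperature of 300 K (hypothetical example value).
per_bit = landauer_limit_joules(300.0)

# Erasing one gigabyte, taken here as 8e9 bits.
per_gigabyte = landauer_limit_joules(300.0, bits=8e9)

print(f"{per_bit:.3e} J per bit, {per_gigabyte:.3e} J per gigabyte")
```

At room temperature the bound is on the order of 10^-21 joules per bit, far below what practical hardware dissipates, yet it is not zero. That nonzero floor is what ties the bit to physics.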

Landauer showed that most logical operations exhibit no energy cost at all. When a bit flips from 0 to 1, or 1 to 0, energy is conserved, and the information is preserved. Bennett found that the one element of computing that must require dissipation of heat is erasure of the bit. When an electronic computer clears a capacitor, it releases energy. For a bit to be lost, heat must be dissipated. Information might still seem a sort of abstraction: bits are binary choices, and coin flips, yes/no, 1/0, and on/off would instinctively seem to be insubstantial. The question then becomes how they can be as fundamental to physics as the building blocks of matter and energy. Earth, air, fire, and water are, in the end, all made of energy, but the different forms they take are determined by information. To do anything requires energy. To specify what is done requires information, and most information currently stored in databases contains a geospatial component. Understanding this is key. Most of the dialogue about “cyber” has been handed over from the practitioners in the intel community to the IT community. Neither community is necessarily well versed in thinking geospatially, which is in a sense why some of the dialogue about "cyber" makes so little sense and why leadership continues to struggle with it.

Once we begin to examine both static information and dynamic flows of information and begin to view it geospatially, it makes more sense.

After analyzing more than 10 million e-mails from Yahoo! mail, a team of researchers noticed a compelling fact: e-mails seem to flow much more frequently between countries with economic and cultural similarities. Among the factors that matter are GDP, trade, language, prior colonial relations, and four academic cultural metrics - power distance, individualism, masculinity, and uncertainty avoidance. The resulting paper, “The Mesh of Civilizations and International Email Flows,” was written by researchers at Stanford, Cornell, Yahoo!, and the Qatar Computing Research Institute. Countries with measurable real-life ties such as a shared border, a significant number of international flights, or a serious binational trade relationship tend to e-mail each other more. There are some discrepancies as well: for example, countries within the EU tend to e-mail each other less than the research model predicted. The real conclusion comes toward the end, when the paper posits the results as possible evidence for Samuel Huntington’s controversial “Clash of Civilizations” theory. From "The Mesh of Civilizations and International Email Flows":

In this respect we cautiously assign a level of validity to Huntington’s contentions, with a few caveats. The first issue was already mentioned – overlap between civilizations and other factors contributing to countries’ level of association. Huntington’s thesis is clearly reflected in the graph presented…. The second limitation concerns the fact that we investigated a communication network. There is no necessary “clash” between countries that do not communicate, and Huntington’s thesis was concerned primarily with ethnic conflict.

Huntington’s ideas were controversial from the start and have been criticized frequently by social scientists, for example in Stephen M. Walt's critique "Building Up New Bogeymen," published in Foreign Policy, and Edward Said's response "The Clash of Ignorance," published in The Nation. What the ICT research done by Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, and Michael Macy in their paper, "The Mesh of Civilizations and International Email Flows," seems to show, though, is that there is a greater tendency for individuals to communicate within broad groups that seem to align with what Huntington identified as the different “civilizations” between which, he posited, conflict would arise in the 21st century. It also seems to project a mapping of a dynamic flow of communication across the planet. Other mappings of the relationships between different views of information include the “Map of Science” from Los Alamos National Labs in Figure 14 (a view of how scientific journals are connected to one another), the database of databases as seen in Figure 15, and finally a representation of the source articles from Wikipedia in Figure 16. All are attempts to show various distributions of knowledge or communication flows, but none is “complete” in the sense that it fully describes more than a simple one-dimensional view of a knowledge-based relationship. The challenge increases when we start to add a temporal component, as in the case of software such as “Recorded Future.” More on this later, but first, we must outline some of the aspects of data ownership, the limits of geolocating by IP address, and the impact on privacy.

Graph showing results of a study of over 10 million emails. See caption for full description.
Figure 13. Results of a study of over 10 million e-mails, showing where they originated and where they were sent. There appear to be natural patterns and distributions that are similar to what Samuel Huntington described as “civilizations.” While they are not natural groupings for sources of conflict, they do seem to group naturally along lines similar to what he predicted.

While it may seem that we’re living in a borderless world where ideas, goods, and people flow freely from nation to nation, “We’re not even close,” says Pankaj Ghemawat, a professor of strategic management at the IESE Business School in Spain. With great data (and a good survey), he argues that there’s a gap between perception and reality in a world that’s maybe not so interconnected at all. He posits that our world isn’t flat; it’s at best semi-globalized, with limited interactions between countries and economies, which would seem to corroborate the study above.

Listen to the Experts (Optional Talk)

This is an optional talk filmed in Edinburgh in June 2012 in which Pankaj Ghemawat explains his position. His latest book, “World 3.0,” describes another kind of networked economy - the cross-border "geography" of Facebook and Twitter followers. The takeaway from his talk and his book is that while there is a huge amount of connectivity present in the world currently, there is room for a great deal more (over 2/3 of the planet), with all the unpredictability that accompanies it (17:03).

Pankaj Ghemawat: Actually, the world isn't flat



PANKAJ GHEMAWAT: I'm here to talk to you about how globalized we are, how globalized we aren't, and why it's important to actually be accurate in making those kinds of assessments. And the leading point of view on this, whether measured by number of books sold, mentions in media, or surveys that I've run with groups ranging from my students to delegates to the World Trade Organization, is this view that national borders really don't matter very much anymore. Cross-border integration is close to complete, and we live in one world. 

And what's interesting about this view is, again, it's a view that's held by pro-globalizers like Tom Friedman, from whose book this quote is obviously excerpted. But it's also held by anti-globalizers who see this giant globalization tsunami that's about to wreck all our lives, if it hasn't already done so. 

The other thing I would add is that this is not a new view. I'm a little bit of an amateur historian, so I've spent some time going back trying to see the first mention of this kind of thing. And the best earliest quote that I could find was one from David Livingstone writing in the 1850s about how the railroad, the steamship, and the telegraph were integrating East Africa perfectly with the rest of the world. 

Now, clearly, David Livingstone was a little bit ahead of his time. But it does seem useful to ask ourselves, just how global are we, before we think about where we go from here. 

So the best way I've found of trying to get people to take seriously the idea that the world may not be flat, may not even be close to flat, is with some data. So one of the things I've been doing over the last few years is really compiling data on things that could either happen within national borders or across national borders. And I've looked at the cross-border component as a percentage of the total. 

I'm not going to present all the data that I have here today. But let me just give you a few data points. I'm going to talk a little bit about one kind of information flow, one kind of flow of people, one kind of flow of capital, and of course trade in products and services. 

So let's start off with plain old telephone service. Of all the voice calling minutes in the world last year, what percentage do you think were accounted for by cross-border phone calls? Pick a percentage in your own mind. Answer turns out to be 2%. If you include internet telephony, you might be able to push this number up to 6% or 7%, but it's nowhere near what people tend to estimate. 

Or let's turn to people moving across borders. One particular thing we might look at in terms of long-term flows of people is, what percentage of the world's population is accounted for by first generation immigrants? Again, please pick a percentage. Turns out to be a little bit higher. It's actually about 3%. 

Or think of investment. Take all the real investment that went on in the world in 2010. What percentage of that was accounted for by foreign direct investment? Not quite 10%. 

And then finally, the one statistic that I suspect many of the people in this room have seen, the export to GDP ratio. If you look at the official statistics, they typically indicate a little bit above 30%. 

However, there is a big problem with the official statistics in that if, for instance, a Japanese components supplier ships something to China to be put into an iPod, and then the iPod gets shipped to the US, that component ends up getting counted multiple times. 

So nobody knows how bad this bias with the official statistics actually is. So I thought I would ask the person who's spearheading the effort to generate data on this, Pascal Lamy, the director of the World Trade Organization, what his best guess would be of exports as a percentage of GDP without the double and triple counting. And it's actually probably a bit under 20% rather than the 30% plus numbers that we're talking about. 

So it's very clear that if you look at these numbers or all the other numbers that I talk about in my book, World 3.0, that we're very, very far from the no-border effect benchmark which would imply internationalization levels of the order of 85%, 90%, 95%. 

So clearly, apocalyptically-minded authors have overstated the case. But it's not just the apocalyptics, as I think of them, who are prone to this kind of overstatement. I've also spent some time surveying audiences in different parts of the world on what they actually guess these numbers to be. 

Let me share with you the results of a survey that Harvard Business Review was kind enough to run of its readership as to what people's guesses along these dimensions actually were. So a couple of observations stand out for me from this slide. 

First of all, there is a suggestion of some error. 


Second, these are pretty large errors. For four quantities whose average value is less than 10%, you have people guessing three, four times that level. Even though I'm an economist, I find that a pretty large error. 

And third, this is not just confined to the readers of the Harvard Business Review. I've run several dozen such surveys in different parts of the world, and in all cases except one, where a group actually underestimated the trade to GDP ratio, people have this tendency towards overestimation. 

And so I thought it important to give a name to this. And that's what I refer to as Globaloney, the difference between the dark blue bars and the light gray bars, especially because I suspect some of you may still be a little bit skeptical of the claims. I think it's important to just spend a little bit of time thinking about why we might be prone to Globaloney. 

A couple of different reasons come to mind. First of all, there's a real dearth of data in the debate. Let me give you an example. When I first published some of these data a few years ago in a magazine called Foreign Policy, one of the people who wrote in, not entirely in agreement, was Tom Friedman. And since my article was titled, "Why the World Isn't Flat," that wasn't too surprising. 

What was very surprising to me was Tom's critique, which was, Ghemawat's data are narrow. And this caused me to scratch my head because, as I went back through his several hundred page book, I couldn't find a single figure, chart, table, reference, or footnote. 

So my point is, I haven't presented a lot of data here to convince you that I'm right. But I would urge you to go away and look for your own data to try and actually assess whether some of these hand-me-down insights that we've been bombarded with actually are correct. So dearth of data in the debate is one reason. 

A second reason has to do with peer pressure. I remember I decided to write my "Why the World Isn't Flat" article because I was being interviewed on TV in Mumbai, and the interviewer's first question to me was, Professor Ghemawat, why do you still believe that the world is round? 

And I started laughing because I hadn't come across that formulation before. And as I was laughing, I was thinking, I really need a more coherent response, especially on national TV. I'd better write something about this. 

But what I can't quite capture for you was the pity and disbelief with which the interviewer asked her question. The perspective was, here is this poor professor. He's clearly been in a cave for the last 20,000 years. He really has no idea as to what's actually going on in the world. 

So try this out with your friends and acquaintances if you like. You'll find that it's very cool to talk about the world being one, et cetera. If you raise questions about that formulation, you really are considered a bit of an antique. 

And then the final reason which I mention, especially to a TED audience, with some trepidation, has to do with what I call techno trances. If you listen to techno music for long periods of time, it does things to your brainwave activity. Something similar seems to happen with exaggerated conceptions of how technology is going to overpower, in the very immediate run, all cultural barriers, all political barriers, all geographic barriers. 

Because at this point I know you aren't allowed to ask me questions, but when I get to this point in my lecture with my students, hands go up. And people ask me, yeah, but what about Facebook? And I got this question often enough that I thought I'd better do some research on Facebook. Because in some sense, it's the ideal kind of technology to think about. Theoretically, makes it as easy to form friendships halfway around the world as opposed to right next door. 

What percentage of people's friends on Facebook are actually located in countries other than where the people we're analyzing are based? The answer is probably somewhere between 10% to 15%. Not negligible, so we don't live in an entirely local or national world. But very, very far from the 95% level that you would expect. 

And the reason's very simple. We don't-- or I hope we don't-- form friendships at random on Facebook. The technology is overlaid on a pre-existing matrix of relationships that we have, and those relationships are what the technology doesn't quite displace. Those relationships are why we get far fewer than 95% of our friends being located in countries other than where we are. 

So does all this matter? Or is Globaloney just a harmless way of getting people to pay more attention to globalization-related issues? I want to suggest that, actually, Globaloney can be very harmful to your health. 

First of all, recognizing that the glass is only 10% to 20% full is critical to seeing that there might be potential for additional gains from additional integration. Whereas if we thought we were already there, there would be no particular point to pushing harder. 

It's a little bit like, we wouldn't be having a conference on radical openness if we already thought we were totally open to all the kinds of influences that are being talked about at this conference. So being accurate about how limited globalization levels are is critical to even being able to notice that there might be room for something more, something that would contribute further to global welfare. 

Which brings me to my second point, avoiding overstatement is also very helpful because it reduces, and in some cases even reverses, some of the fears that people have about globalization. So I actually spend most of my World 3.0 book working through a litany of market failures and fears that people have that they worry globalization is going to exacerbate. 

I'm obviously not going to be able to do that for you today. So let me just present to you two headlines as an illustration of what I have in mind. 

Think of France and the current debate about immigration. When you ask people in France what percentage of the French population is immigrants, the answer is about 24%. That's their guess. Maybe realizing that the number is just 8% might help cool some of the super heated rhetoric that we see around the immigration issue. 

Or to take an even more striking example, when the Chicago Council on Foreign Relations did a survey of Americans asking them to guess what percentage of the federal budget went to foreign aid, the guess was 30%, which is slightly in excess of the actual level of US governmental commitments to federal aid. 

The reassuring thing about this particular survey was, when it was pointed out to people how far their estimates were from the actual data, some of them-- not all of them-- seemed to become more willing to consider increases in foreign aid. 

So foreign aid is actually a great way of wrapping up here because if you think about it, what I've been talking about today is this notion, very uncontroversial amongst economists, that most things are very home biased. Foreign aid is the most-- aid to poor people is about the most home biased thing you can find. 

If you look at the OECD countries and how much they spend per domestic poor person and compare it with how much they spend per poor person in poor countries, the ratio-- Branko Milanovic at the World Bank did the calculations-- turns out to be about 30,000 to 1. 

Now, of course, some of us, if we truly are cosmopolitan, would like to see that ratio being brought down to 1 is to 1. I'd like to make the suggestion that we don't need to aim for that to make substantial progress from where we are. If we simply brought that ratio down to 15,000 to 1, we would be meeting those aid targets that were agreed at the Rio Summit 20 years ago that the summit that ended last week made no further progress on. 

So in summary, while radical openness is great, given how closed we are, even incremental openness could make things dramatically better. Thank you very much. 


Credit: TED

Other types of knowledge have been “mapped” as well. Maps of science have been created from citation data to visualize the structure and relationships between various fields of scientific activity. With most scientific publications now accessed online, the scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains.

Over the course of 2007 and 2008, the authors from Los Alamos National Labs collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators, and institutional consortia. The resulting reference dataset covers a large part of worldwide use of scholarly web portals in 2006, and spans the humanities, social sciences, and natural sciences. A journal clickstream model, i.e., a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Art & Architecture Thesaurus. The resulting model in Figure 14 was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.
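To make the idea of a first-order Markov chain over clickstreams concrete, here is a minimal Python sketch. It assumes each user session has already been reduced to an ordered list of journal names; the session data, journal names, and function name are hypothetical and are not taken from the Los Alamos study, whose actual processing steps are not described here.

```python
from collections import Counter, defaultdict


def build_transition_model(sessions):
    """Estimate first-order transition probabilities between journals.

    Each session is an ordered list of journals visited by one user.
    Only consecutive pairs are counted, which is what makes the model
    first-order: the next journal depends only on the current one.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1

    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {
            journal: count / total for journal, count in followers.items()
        }
    return model


# Hypothetical clickstream sessions (illustrative data only).
sessions = [
    ["Geography A", "Sociology B", "Geography A"],
    ["Geography A", "Sociology B", "Physics C"],
    ["Physics C", "Geography A", "Physics C"],
]

model = build_transition_model(sessions)
for journal, followers in sorted(model.items()):
    print(journal, "->", followers)
```

Each entry in the resulting dictionary can be read as a weighted edge between two journals. A journal network like the one in Figure 14 could be drawn from such weights, although the published map presumably involved additional filtering and layout steps.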

Map of Science as compiled by researchers. See caption for full description.
Figure 14. The Map of Science as compiled by researchers at Los Alamos National Labs. This is a visualization of nearly 1 billion user interactions (clickstreams) recorded by scholarly web portals and collected over 2007 and 2008, displayed according to connections by discipline. A further discussion of the methodology and categorization is available at PLOS ONE.
Credit: Los Alamos National Labs.
Linking Open Data cloud diagram. See caption for full description.
Figure 15. The database of databases compiled in the linked data project, whose goal is to have all databases everywhere linked and accessible in real time[1]. This image shows datasets that have been published in Linked Data format by contributors to the Linking Open Data (LOD) community project and other individuals and organizations. It is based on metadata collected and curated by contributors to the CKAN directory. After going to the webpage, clicking the image will take you to an image map, where each dataset is a hyperlink to its homepage.
Credit: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.

The site in Figure 15 is running a powerful piece of open-source data cataloging software called CKAN, written and maintained by the Open Knowledge Foundation. Each 'dataset' record on CKAN contains a description of the data and other useful information, such as what formats it is available in, who owns it and whether it is freely available, and what subject areas the data is about. Other users can improve or add to this information (CKAN keeps a fully versioned history). Most of the data indexed at the Data Hub is openly licensed, meaning anyone is free to use or reuse it however they like. Perhaps someone will take that nice data set of a city's public art that you found, and add it to a tourist map - or even make a neat app for your phone that'll help you find artworks when you visit the city.
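The kind of dataset record described above can be sketched as a small Python example. The sample record is invented for illustration. The field names (`title`, `notes`, `resources`, `format`, `license_title`, `isopen`, `tags`) are assumed to match what CKAN's Action API commonly returns for a dataset and should be checked against the CKAN documentation. The `summarize_dataset` helper is hypothetical and not part of CKAN.

```python
def summarize_dataset(record):
    """Pull the descriptive fields out of a CKAN-style dataset record."""
    # Collect the distinct file formats offered across all resources.
    formats = sorted(
        {
            resource["format"].upper()
            for resource in record.get("resources", [])
            if resource.get("format")
        }
    )
    return {
        "title": record.get("title") or record.get("name"),
        "description": record.get("notes", ""),
        "formats": formats,
        "license": record.get("license_title", "unknown"),
        "openly_licensed": bool(record.get("isopen", False)),
        "subjects": [tag["name"] for tag in record.get("tags", [])],
    }


# Invented sample record, shaped like a CKAN dataset (assumption).
sample_record = {
    "name": "city-public-art",
    "title": "City Public Art Locations",
    "notes": "Locations and descriptions of public artworks.",
    "license_title": "Creative Commons Attribution",
    "isopen": True,
    "resources": [
        {"format": "csv", "url": "https://example.org/art.csv"},
        {"format": "GeoJSON", "url": "https://example.org/art.geojson"},
    ],
    "tags": [{"name": "art"}, {"name": "tourism"}],
}

summary = summarize_dataset(sample_record)
print(summary)
```

A summary like this answers the questions the catalog is meant to answer at a glance: what the data is, what formats it comes in, and whether it can be reused freely.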

Number of geotagged Wikipedia articles per country. See caption for full description.
Figure 16. Items geotagged in Wikipedia in all languages. Comparing this map with Dr. Barnett’s map, one could conclude not only that information is concentrated in the Functioning Core, but also that information seems to beget information. The New York Times Idea of the Day in December 2009 pointed out that more Wikipedia articles are written about fictional places like Middle Earth than about many countries in Africa, the Americas, and Asia. Behold the “terra incognita” of the Internet.