The Unknowns: A Manager's Guide to Open Source
PAUL RAMSEY: I'd like to start this-- start this morning with a digression, actually, off to the side. One of my favorite pieces of poetry was composed not by a Beat poet in Greenwich Village or by a 19th century romantic, but by one Donald Rumsfeld, then Secretary of Defense, from the Pentagon press briefing room on February 12, 2002. Hart Seely later formed the secretary's words into a poem, which was published in Harper's Magazine in June 2003 as "The Unknown."
"As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."
I found the same sentiments expressed less elegantly, but more forcefully, in diagram form, showing the things we know, the stuff we don't know, and the vast expanse of stuff that we don't know we don't know. And it's scary that the largest category by far is the one that we definitionally cannot comprehend, the stuff we don't know we don't know. Of course, this is an epistemological diagram of all knowledge. So, we can constrain it a bit by noting that, for practical purposes, we're really only concerned with the stuff we should know. But unfortunately, the stuff we should know still falls in all three categories.
And as IT managers, there is, amongst the stuff we should know, the stuff we should know about open source software. And I hope, at this point, we can all agree that open source is worth knowing about. The largest, most heavily trafficked companies on the internet have built their infrastructure on open source for practical, hardheaded business reasons. And open source is no longer just a technology fast follower. Leading-edge work in fields like software development tools, document search, big data analysis, and NoSQL storage is being spearheaded by open source projects.
And that's not because open source is just some kind of fad or just something the cool kids currently consider trendy. It's because open source is a bona fide trend. It's a trend that's been building for over 30 years. And it's changing not only the way we build software, but how we collaborate in building knowledge in general.
So I bring two perspectives to this topic, the perspective of a developer of open source software-- a hacker, a programmer, in particular on the PostGIS spatial database, which I've been involved in for 10 years-- but I also bring the perspective of a manager. I ran a small consulting company, Refractions Research, for 10 years. My goal today is to modify your internal epistemological diagrams a little bit, maybe to something more like this, and not because I expect you to all rush back home and implement open source, but because when you know more about open source, it becomes an option.
So to understand why open source is an option, it helps to have some background. So let's start with a little history. Once upon a time-- once upon a time, there was a young man with wild ideas about freedom who took on the established order of things, appeared to lose, but, in the end, changed the world forever, though perhaps not in ways you would approve of-- actually not that young man, although there is a striking resemblance.
In 1980, Richard Stallman was a programmer at the MIT Artificial Intelligence Lab. So some of the best minds in artificial intelligence worked together there. They shared ideas and implementations of those ideas in code. It was, to hear Stallman tell it, a golden age, a brief golden age of collaboration and intellectual ferment. And then one day-- and don't all horror stories start this way? One day, the lab got a new printer. It was a Xerox 9700. That's the actual printer. Unlike the printer it was replacing, the new printer came with a binary-only printer driver, OK? The source code was not included this time.
Now, Stallman had modified the previous driver to send a message to users when the printer jammed. With the new binary driver, he couldn't do that. The situation was inconvenient. It was a pain.
Now, why couldn't Xerox just share their code? Everyone would be happier. Now most people would have shrugged, right-- eh, pff. But for Stallman, it was a galvanizing moment. So over the last five years, working in the AI lab, he'd gotten used to sharing code and ideas with other programmers. But now the atmosphere in computing was changing. And it wasn't just the printer driver.
A private corporation had started recruiting his colleagues in the lab. And once they were hired, they weren't allowed to exchange code with him anymore. Completely unrelated nerd trivia-- the company who was doing the hiring, Symbolics, was the very first company to register a dot-com domain name in 1985. So, the old computers in the lab and the software that ran on them were becoming obsolete. The new computers being purchased by the lab included operating systems that were locked down. You had to sign non-disclosures just to use them.
So, it was the death of the old collaborative community. And Stallman worried that the first step in using a computer was now to promise not to help your neighbor by accepting a license agreement. So as a highly talented, idealistic computer programmer, Stallman wanted his work to serve a larger purpose. The financial promise of working in the proprietary industry wasn't enough. The sterile intellectual amusement of continuing his work alone in academia didn't appeal. So, facing the death of his old intellectual community, Stallman asked himself, was there a program or programs that I could write so as to make community possible once again?
You can't use a computer without an operating system. So Stallman decided that, first, he needed to write an operating system. It had to be portable to many computer platforms. It should be compatible with the popular new Unix operating system to make it easy for people to run their existing programs on it. And most importantly, it should be free.
Now, Stallman had a very particular definition of "free." He meant that it should be free to run the software, you should be free to modify the software, and you should be free to share the software. You should also be free to share any modifications you made to the software. Stallman wasn't talking about free as in price. He was talking about free as in freedom.
Like, in a Latinate language like French, Spanish, or Italian, it's more obvious, right? He wasn't talking about logiciel gratuit, he was talking about logiciel libre, or software libre, or il software libero. He's talking about liberty, right? He's talking about liberated software. The key addition is liberty.
So in 1984, rather than join a computing industry he considered morally bankrupt, Stallman basically decided to start a new one from scratch. It was an audacious plan. Stallman called his new system GNU, which stands, recursively, for GNU's Not Unix. Do you see the recursion? GNU's not Unix. What's GNU? GNU's not Unix. What's GNU? GNU's not Unix.
Let me take just a very minor diversion here to add some extra flavor. In order to ensure GNU remained free and did not get subsumed into a proprietary system in the future, Stallman released his work using a scheme he called copyleft. Generally speaking, intellectual works-- books, movies, songs, computer programs-- they're either under copyright or public domain. The author either retains full control over the work-- all rights are reserved-- or no control-- no rights are retained. Copyleft, and open licenses in general, use the copyright system to selectively grant permission and exert control over software through licensing.
Authors retain copyright, but they grant liberal usage rights via a license. So the copyleft license grants permission to all recipients of the code to use, modify, and redistribute the work in any way they wish with one exception. The license requires that any redistribution of the work or derived products include the source code and be subject to the same license. The legal language-- very complex. But the principles are hardly foreign. Share and share alike. Do unto others as you would have them do unto you.
So back on the highway-- in 1984, Stallman quits his job at MIT, and he starts working on GNU full-time-- no visible means of support. This is a labor of love. But where to start? OK, from a blank canvas, you want a completely free software ecosystem. What do you do first? If you wanted to build a 100%, all-handcrafted house, you would start by handcrafting your tools. And Stallman did the same thing with GNU versions of software development tools.
He starts by writing a text editor, GNU Emacs, so he can write his free system using only free tools. The Emacs editor proves so popular, and internet access is still so rare, that he's able to earn a small living-- but the best programmers don't need Emacs-- a small living selling tape copies of the code, distributed under copyleft, of course. And then he writes a compiler, GCC. And you can still find GCC in every Linux distribution, and also on Mac OS X.
Stallman lives like a monk. He works like a demon. He attracts some followers and helpers who formalize the project in a foundation. By 1990, they have most of the components of an operating system. Most importantly, they have the full programming tool chain. They've got shells, compilers, debuggers, editors, core libraries, and so on-- all the things you need to write complex software. What they don't have is a Unix kernel, the piece of software that talks directly to the hardware. At this point, all their free tools are still being run on proprietary Unix.
OK, in 1991, a Finnish computer scientist named Linus Torvalds buys a new computer, an Intel 386. He's got access to Unix systems at the university as a student. And he wants to run Unix on his 386 at home. This is not possible. The good implementations for the 386 cost more than the computer itself. The cheap implementation Minix is quite limited. So, Linus writes his own kernel.
He uses Stallman's GNU tools to write and compile it. And in August of 1991, he posts the following on an internet discussion list. Excuse my Finnish accent.
PAUL RAMSEY (FINNISH ACCENT): "Hello, everyone out there using Minix."
PAUL RAMSEY: "I'm doing a free operating system-- just a hobby, won't be big and professional like GNU-- for 386 AT clones. This has been brewing since April and is starting to get ready. I'd like any feedback on things people like or dislike in Minix, as my OS resembles it somewhat-- same physical layout of the file system-- due to practical reasons-- among other things.
I've currently ported Bash and GCC. Things seem to work. This implies I'll get something useful within a few months, and I'd like to know what features most people would want. Any suggestions are welcome, but I won't promise I'll implement them-- smiley face-- Linux."
Underneath the technical language, note the subtextual bits, the humility, right-- it's just a hobby-- the interest in other people's ideas. What do you like? What do you dislike about Minix? The posting is an invitation, right? Does anyone else want to come out and play? They do. Within 15 minutes, he has a reply-- "Tell us more! Does it need an MMU? How much of it is in C?"
Within 24 hours, he has replies from Finland, Austria, Maryland, England. In a month, the code's on a public FTP server. Within four months, it's so popular that an FAQ document has been written to handle the common questions. Linus Torvalds tapped a seam of enthusiasm just dying to express itself. People who loved computers and computing just wanted to play together. And through the medium of the internet, using only the simplest tools-- diff, patch, FTP, and email-- he built a community of thousands of contributors. And together, they built a usable operating system.
Now, something important changed between the time that Stallman started the GNU project and Torvalds released Linux. The values of collaboration were the same, but the opportunity to exercise those values was far greater via the internet. When Stallman started GNU in 1984, there were a thousand hosts on the internet. When Torvalds started Linux in 1991, there were over 400,000. And the pool of potential collaborators was in the midst of a huge expansion.
Permit me one more short digression on the digression to talk about Star Wars. And in particular, let's look at a website called Star Wars Uncut. Star Wars Uncut has taken the original movie and chopped it into 473 15-second scenes.
Each scene is then separately claimed and reenacted by site members. And then they upload their scene, their 15-second scene. And the result looks like this. They've actually finished the project now. So they have the whole 90 minutes in 15-second chunks. And you can watch Star Wars Uncut, the full edit. For Star Wars aficionados, it's the most hilarious 90 minutes you'll ever spend. Because you just can't predict what's going to come next. Because it's an incredible mix.
This is actually my favorite part right here-- oop, upstairs. The costumes are fabulous. The stop-motion ranges from LEGO to plasticine. Millennium Falcon, you know-- really, whatever you can imagine, right?
So, it seems pretty frivolous, right? But break it down, right? How is this frivolous collaboration possible? And why is it only happening now, right? Why didn't it happen 10 years ago? There are just as many Star Wars nerds, maybe more, 10 years ago as there are now.
First, the activity requires easy access to video recording and editing tools. And until recently, video editors and cameras were very expensive. And it requires enough bandwidth to download and upload video. And until recently, people didn't have that kind of bandwidth in their homes. And finally, it requires Star Wars geeks.
But generalizing then, to build the large, collaborative project, you need tools freely or at least very cheaply available. You need sufficient connectivity between participants. You combine that basic infrastructure with community collaboration and love for the subject matter, and magic happens. There's many, many more examples of this kind of group collaboration. The academics call these collaborations "commons-based peer production."
And open source software in general, and the Linux project in particular, are some of the earliest examples of internet-mediated, commons-based peer production. You may have heard of Wikipedia and OpenStreetMap, right-- millions of people collaborating over the internet to build rich, valuable collections of knowledge. This is not a fad. This is the new normal, OK, complex structures of knowledge built by distributed communities using free tools, held together by a shared interest, an emotional interest or a financial interest, in the product being created.
The proof's already there, right, in our software, our encyclopedia, in our maps. The development of Linux fits the commons-based peer production pattern. The free access to tools was provided by the GNU project components. The medium of communication was just email. The work they were sending around was source code, snippets of text. And why do open source programmers do it? What's the core motivation? It isn't money. Fundamentally, they code because they love it.
It's the same reason Star Wars geeks reshoot 35-year-old films, why food geeks post restaurant reviews, why car geeks rebuild '68 Camaros, OK? It's an avocation. At least it starts that way. But open source software has a wider utility than restaurant reviews and vintage muscle cars. So as projects have expanded, they have, at each stage, become more and more integrated into the wider community.
Linux is a good example. Start with Linus, the early group of enthusiasts in 1991. These are individuals. They're working in their spare time. They're doing it for love. By 1992, you get distributors. They're packaging up Linux kernel with collections of GNU and other tools to form full working operating systems. First, they do it for love, helping other Linux lovers. But soon they're covering their cost and time selling CD-ROMs for $50, $40. So programmers are earning livings with small Linux businesses within a couple of years of the project's start.
1994, Digital Equipment Corporation sends Linus a free Alpha workstation in the hopes that he'll port Linux to the Alpha chip. He does. Simultaneously, David Miller ports Linux to the Sun processor. So Linux is now competing with "real" Unix on corporate big iron. Over the next couple of years, the makers of these machines, they started to hire Linux programmers of their own.
In 1995, Red Hat Linux is formed. That's a company which will eventually grow to an $8 billion Linux support enterprise. 1996, Los Alamos National Labs builds the first Linux cluster for simulating atomic shockwaves. By 1998, the explosion of the internet into general public use is underpinned by thousands of commodity servers running Linux as their operating system. Microsoft is drafting strategy memos about how to counter Linux. And Linus Torvalds is featured on the front page of Forbes magazine, OK?
Linux is no longer a hobbyist activity. It's deeply embedded in the economy at multiple levels. This is in 1998, just seven years after that first newsgroup post.
Fast-forward to the present. The NSA employs Linux programmers to make their systems secure. NASA employs Linux programmers to run it on space mission hardware. Google employs Linux programmers to optimize their massive compute clusters. Oracle employs Linux programmers to support Oracle Optimized Linux. IBM employs Linux programmers to make sure it runs on their systems and mainframes. Microsoft employs Linux programmers to add kernel support for Windows virtualization, and so on, and so on, and so on.
So, here's the question I get asked a lot-- how do you make a living writing free software? Referring back to the previous slide, hopefully it would be obvious. I make my living the same way my dentist, my barber, and my plumber make their livings. I sell my very specialized expert services in open source spatial database programming to people who need those services. And in a globalized, internet-connected world, there are plenty of people who need them.
So, I could talk to you for another half hour about different ways open source projects are deriving support from the general economy. But unfortunately, then, I wouldn't have enough time to talk about you-- yes, you, right? Should you use open source? Here are five good managerial reasons to consider open source for your enterprise-- cloud readiness-- also known as scaling, also known as rapid deployment-- license liability-- or actually, lack of same-- flexibility and its kissing cousin, heterogeneity, staff development and recruitment, and most importantly, market power.
So first of all, technical superiority-- did I forget to mention this one? There are open source advocates who will claim, straight up, no [INAUDIBLE] that open source software is just technically superior to proprietary software. They will say that the open development model results in code with fewer bugs per thousand lines, higher levels of modularity, better security due to wider peer review, faster release cycles, better performance. I think that's actually often true. But it's unfair to present the list without also adding that, in general, open source projects have a narrower base of features. Though larger projects like Linux, or Postgres, or Hadoop, or Eclipse are often fully competitive on features too.
Like Linux, for example, they've concentrated an incredibly large number of very high-quality technical contributors into one code base. There's more people than any one company could ever employ. But many open source projects, and certainly those in the geospatial realm, operate with, at most, a few dozen contributors. They aren't out of the league of corporate development teams, although appearances can be deceiving. And big companies keep their development processes secret.
But sometimes the veil falls off, and it did recently. I was told the number of developers working on SQL Server Spatial was actually fewer than the number working on PostGIS-- bit of a surprise. If you're interested in the topic of technical superiority, David Wheeler has a 2007 paper, "Why Open Source Software? Look at the Numbers." It brings together all the research into one very, very, very long page. It's well worth reading.
So moving on, number 1, reason number 1, cloud readiness-- also known as scaling, also known as rapid deployment-- and it looks like I'm squeezing three topics into one, but I'm not. These three benefits are all aspects of the same open source attribute, the $0 capital cost of deployment. Always on the trailing edge of the leading edge, Microsoft has been advertising, "to the cloud." More and more computing tasks are going to be delegated to cloud computers hosted in big data centers somewhere on the internet. And more users will expect direct access to data through web services, which means more mobile devices are going to consume those services with every passing year.
And all that new user demand adds up to potentially unconstrained load on services and growth curves that quickly transition from horizontal to vertical. Scaling services is important, right? And it's getting even more important. Now let me take a quick detour.
I'm a principal developer of the PostGIS open source spatial database. And one of the things I've noticed over the years about PostGIS is the most enthusiastic adopters have been startup companies. Startups love it. GlobeXplorer based their satellite image service on PostGIS. They chose it over Oracle and Informix. Redfin started their real estate information site in MySQL and moved it to PostGIS for performance reasons. MapMyFitness-- local company-- they developed their mobile fitness mapping application on top of PostGIS.
And the Google Analytics for the PostGIS website show that California is the state with the highest interest. And inside California, it's San Francisco and the Silicon Valley that have the highest interest. The reason startups love open source software is because it removes a critical constraint to their growth.
So the cost of computing hardware falls dramatically year after year. The cost of proprietary software does not follow the same curve. So if your software is licensed per CPU or core, the primary driver of scaling cost is the software cost. So the math can be brutal even before you start scaling horizontally.
So, this Dell T710-- dual quad-core CPUs, 36 gigabytes of memory, 2 terabytes of RAID 10 storage-- will set you back $6,953. OK, now you've got a lovely server. Let's put Oracle Enterprise on our fancy new server. We got 8 cores. Multiply that by a 0.5 "processor core factor" times the per-processor price of Oracle Enterprise, add in the Spatial because we are GIS guys-- and remember, you need Enterprise to run Spatial-- and the grand total is a cool $260,000, or as Larry Ellison calls it, a quarter-- of a million dollars.
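The arithmetic is easy to reproduce. Here's a minimal sketch of the per-core license math; the per-processor prices and the 0.5 core factor are illustrative assumptions consistent with the talk's totals, not a current quote:

```python
# Per-core license math for the talk's example server.
# All prices are illustrative assumptions, not current list prices.

cores = 8             # dual quad-core CPUs
core_factor = 0.5     # vendor's "processor core factor" for this chip
licensed_processors = cores * core_factor   # 4.0 licensable processors

enterprise_per_processor = 47_500   # assumed Enterprise price per processor
spatial_per_processor = 17_500      # assumed Spatial price per processor
hardware = 6_953                    # the Dell T710 from the talk

software = licensed_processors * (enterprise_per_processor + spatial_per_processor)
print(f"hardware: ${hardware:,}")        # hardware: $6,953
print(f"software: ${software:,.0f}")     # software: $260,000
```

With these assumed figures, the software license costs over 37 times the server it runs on.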
Just contemplate the numbers for a moment-- hmm, hmm. The exact same unpleasant math applies to GIS map serving, right? And it gets worse and worse the more you scale up. Now let's compare scaling with open source and proprietary map servers.
At initial rollout, the load is small, so we buy one server for $5,000 and one copy of the software for $30,000. And to be fair, or perhaps unfair, let's assume that staff is already fully trained in the proprietary software but requires an immense amount of expensive training or learning time to adopt the open source. So there's our total for the first server, $35,000. Great.
Now, great news-- the citizens love the new map service. Maybe someone built a cool iPhone app around it. And suddenly, the load on the machine quadruples. What does it cost? Add three more servers. Add three more licenses. You don't need more training. The software is the same. And the more you scale, the worse the totals on the left become.
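To see how the proprietary column runs away, here's a sketch of that scaling comparison. The $5,000 server and $30,000 license are the talk's round numbers; the one-time open source training cost is an assumption, chosen to make the two options start out equal:

```python
# Hypothetical scaling cost comparison. Server and license prices are the
# talk's round numbers; the one-time open source training cost is assumed.

SERVER_COST = 5_000     # per commodity server
LICENSE_COST = 30_000   # per proprietary license (one per server)
OSS_TRAINING = 30_000   # assumed one-time staff learning cost

def proprietary_total(servers: int) -> int:
    # Every new server needs a new license.
    return servers * (SERVER_COST + LICENSE_COST)

def open_source_total(servers: int) -> int:
    # Training is paid once; after that, new servers cost only hardware.
    return OSS_TRAINING + servers * SERVER_COST

for n in (1, 4, 16):
    print(n, proprietary_total(n), open_source_total(n))
# 1 35000 35000
# 4 140000 50000
# 16 560000 110000
```

The two options cost the same at one server; at sixteen servers, the proprietary stack costs five times as much under these assumptions.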
Now it's possible you're already so highly evolved that you run your public services in the cloud. So there are no capital costs for servers. But the math in the cloud remains just as unpleasant. Per instance, proprietary software licensing dwarfs the per-instance hardware cost. And the only difference is the hardware costs in the cloud are spread out more evenly over time instead of being concentrated in big, capital-intensive bursts.
The final reason startups love open source is they don't have to ask permission to fire up those new services. So they can respond to crises and opportunities very, very quickly. Any software that requires a license or a license manager can potentially slow a deployment for days or weeks. And if you're suddenly enjoying a surge in traffic, the last thing you want to offer your new customers is a slow customer experience.
Number 2, license liability-- lack of. So before I was a programmer, I was a manager. Figure out that career arc. And I used to run a consulting company. And at our peak, we had 30 staff. It was a small company. But still, that meant 30 workstations under 30 desks running 30 licensed copies of Windows and 30 licensed copies of Microsoft Office. At least that was the theory.
In practice, the company had grown very quickly over two years. And software, particularly application software, had been installed wherever it was needed whenever it was needed. So when we finally got around to counting up the difference between what we were using and what we owned, it was a bit shocking. We had 10 licenses of Windows, five for Office. Coming into compliance would have cost almost $20,000. Not coming into compliance would risk hundreds of thousands of dollars in fines. We were one disgruntled employee away from a big cash crunch.
And so we examined what we wanted our software to do. Our developers needed Java development environments. Our BAs needed document processing. Our managers needed some word processing. Everyone needed email. So we switched to open source. Everyone got Open Office for word processing. Email and web browsing was with Firefox and Thunderbird. Some developers switched to Linux as their operating system. And we bought a few extra Windows licenses to come into compliance for the rest. It was all surprisingly easy.
Now two things to keep in mind-- first, if we'd been more disciplined in the first place about using open source, we wouldn't have built up the license liability we did. On the server side, we'd always been a Linux and open source shop, so we never built up a problem there. Second, once we got the open source discipline, our potential future liability problems were reduced. There were just a lot fewer licenses remaining to keep track of.
So, we replaced our office automation side without much trouble. What about the GIS side, right? Some tools were just too specialized and ingrained in our workflow to be replaced. So we just worked to manage our license load. We put FME licenses on a shared system with remote desktop. Other things had just gotten out of hand. ArcView 3 is just really, really easy to copy, isn't it? How many of you still have illegal copies of ArcView 3 floating around your offices or your homes, right? If I listen very carefully, I can hear a license compliance manager's teeth grinding somewhere in the back, right? It's OK. Jack's already rich.
Our story ended with removing all the unlicensed ArcView copies and using QGIS instead. OK, here's a few screenshots of QGIS. It looks eerily familiar, right? Doesn't it? Same UI, it's got a basic scripting language, some simple printing capability. It fills a need.
But the core point here is not that proprietary software is replaceable-- though it often is-- it's that proprietary software adds a layer of legal liability that needs to be managed. And that takes time and effort because software gets copied a lot. Like, why wouldn't it? You can make a perfect copy with two keystrokes-- Control-C, Control-V, Control-C, Control-V. And if the software is proprietary, each of those keystrokes digs a compliance hole for your organization-- click, click, click-- like deeper, and deeper, and deeper. And you don't realize how deep that compliance hole is until you fall into it.
Number 3-- flexibility and heterogeneity-- now this is a bit of a geeky argument, OK? But bear with me. First, flexible components are easy to connect to each other and to adapt. You can use flexible components from multiple vendors to build a heterogeneous system. A heterogeneous system incorporates components from multiple sources.
Now flexibility is great, but usually, you have to trade some ease of use to get it. Which toolbox would you rather work with, the hex tool-- it's convenient and easy, fits in the palm of your hand, three sizes-- or the socket set-- modular, extensible, 64 sizes, metric and imperial? One's easy. One's flexible. So here's a practical example of the values of flexibility and heterogeneity. The British Columbia government built their web mapping infrastructure using ArcIMS for internal web servers and the web applications, and using Map Server for external WMS services.
Both map servers pull their data from a central ArcSDE instance. So they have a flexible tool in Map Server and a heterogeneous infrastructure using both ArcIMS and Map Server. A few years ago, the infrastructure team applied a minor, minor-- oof-- teeny, weeny sliver of a service patch to the Oracle database that hosted ArcSDE. And to their surprise, the minor patch locked up SDE, and they couldn't restart it, which meant that their web services that depended on SDE were also down.
The WMS services were brought back up in three days after a long process of loading raw data into a temporary PostGIS database; because Map Server could read from PostGIS just as easily as from ArcSDE, this was no problem. The ArcIMS services remained offline for the duration of the outage, which was 28 long days, until a patch to ArcSDE was made available.
As a general proposition, proprietary product lines talk well to other systems from the same vendor and less well to systems from other vendors. Competitive advantage dictates this arrangement, right? But it puts the interests of the customer, in interoperability, below the interests of the vendor in promoting lock-in. As a general proposition, open source products talk well to all other systems.
The reason why is less obvious, right? But it has to do with the practical motivations of the developers. Once a project moves past the for-fun stage, the developers are working on it because it is a workplace tool. They need it to do something. And the something they need it to do is usually within the context of other software.
So as a developer, if you like Map Server's GML support, but you work in an environment where most of the data resides in ArcSDE, a reasonable thing to do is to write code to connect the two. That's exactly how Map Server got SDE support. A guy in Iowa needed to talk to SDE, and he added the support. Each of these practical interconnections increases the overall value of the product, bringing in more developers who bring in their own unique interconnection requirements.
Indulge me in a short digression. This is the boreal forest around Prince George, British Columbia, where I grew up. In the mature forest outside of the creek valleys, over 80% of the trees are pine and spruce. In the late '90s, an infestation of mountain pine beetle began in Wells Gray Park in the northwestern corner of British Columbia. The local infestation turned into an epidemic over the next few years. The epidemic has been uncontrolled for a decade now. And it's only forecast to abate by the middle of this decade when the population of mature lodgepole pine has been completely digested.
Here's a graph of the number of hectares affected over time. The pine beetle has been so successful, for lack of a better word, not just because climate change has reduced the number of cold winters that kill beetles, but also because of its good luck in finding a huge, homogeneous area of mature boreal forest ready to consume. This is the product of 50 years of successful forest firefighting-- just a little digression on the digression.
Computer worms are pieces of code that self-replicate, kind of like beetles. They start from a host, they scan for other vulnerable hosts, and then they copy their children to the new host, where the process continues. This is the infection timeline for the Code Red worm, which, in 2001, spread through a vulnerability in the Microsoft IIS web server. Familiar, yes? OK, let's get back on the road.
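That scan-and-replicate loop can be sketched as a toy simulation. Everything here is an illustrative assumption (host count, scan rate, number of rounds), not Code Red's actual parameters; the point is just the S-shaped growth curve the infection timeline shows.

```python
import random

def simulate_worm(hosts=10000, scans_per_host=20, rounds=12, seed=42):
    """Toy worm-spread model: each round, every infected host randomly
    probes other hosts; probing an uninfected host infects it.
    Returns the infected count after each round."""
    random.seed(seed)
    infected = {0}  # patient zero
    history = [len(infected)]
    for _ in range(rounds):
        newly = set()
        for _ in range(len(infected) * scans_per_host):
            target = random.randrange(hosts)
            if target not in infected:
                newly.add(target)
        infected |= newly
        history.append(len(infected))
    return history

print(simulate_worm())
```

The counts grow slowly, then explosively, then flatten as the pool of vulnerable hosts is exhausted, which is exactly the dynamic shared by the pine beetle epidemic and the Code Red outbreak.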
Homogeneous systems and single-vendor strategies are usually convenient, but there is a trade-off. They lack flexibility, which can make it hard to adapt them for unexpected purposes, and they represent reliability risk and increased vulnerability to population catastrophes, issues that are capable of shutting down your whole infrastructure in one go.
Number 4-- staff development and recruitment-- so one of the most gratifying things I've heard over my career of teaching about open source software is this-- that talk you gave last year totally changed my life. They're saying this about a software talk. This is a totally absurd thing to hear about a software talk. And yet, I've actually been told this several times.
The people saying it are technology staff in GIS departments. And the reason they say it is because adopting open source gave them a whole new toolbox to solve problems. The exhilaration of knowing what's in that box and the freedom to use that knowledge to make cool things without external constraints like licenses on what they could make was deeply empowering for them. These are very special people. They're the kind of people you want to hire.
I recently came across a diagram which explains it all in one page. Take the personality traits of intelligence, obsession, and social ineptitude. People with intelligence and obsession are geeks. Inept, smart people are dweebs. The inept obsessives are dorks. And those with all three traits in the middle are the nerds.
So as GIS managers, building out new systems, pushing the envelope, you probably want smart folks with a mapping technology obsession, geo-geeks ideally. But you can settle for geo-nerds. So how do you get these geeks and nerds working for you? Offer something interesting. Remember, these are technology obsessives. So permit me a short digression to talk about LISP.
Paul Graham is a Silicon Valley entrepreneur and a major-league nerd who tells this story about building an e-commerce engine that he eventually sold to Yahoo in the late '90s. For personal, technical reasons, they chose to write their engine in LISP, which was a rare choice, right, since most mainstream use of LISP had disappeared by the late '90s. But using LISP had this odd side effect, which was, when they advertised job openings, they got these amazing resumes, rock star candidates. And when they interviewed them, they all mentioned their interest in LISP.
Now, by the '90s, LISP was mostly used in academic settings. But it retained a prominent role as a customization language in-- wait for it-- Emacs, Richard Stallman's text editor for super-programmers. So, the super-nerd programmers who obsessed over Emacs LISP macros were intrigued by the chance to do web development in LISP.
The city of Northglenn, Colorado, they wrote a report about their experience with open source. And they cited some of the motivations I already talked about. But in the section "unobvious motivations for adoption," there is this quote-- "Contrast an open-source implementation position with a 'defined skill set' job where the first diagnostic action is to reboot the server, and the second is to call the vendor and wait in a telephone hold queue. It's easy to understand why open-source jobs are prized."
Finally, market power-- so I chose not to give a deeply technical talk today. So I haven't really run through the panoply of open source GIS software that's available to you. But let me do that quickly, just for effect, OK? So for databases, you have PostGIS, MySQL, Ingres, and SpatiaLite. For map and feature servers, you've got GeoServer, Map Server, Mapnik, TinyOWS, SharpMap, and others. For tile caching, you've got TileCache, GeoWebCache, TileStache, and others. For web interfaces, you've got OpenLayers, OpenScales, GeoExt, Polymaps, and others. On the desktop, you've got gvSIG, uDig, QGIS, Open Jump, MapWindow, and others. And underneath it all are libraries like GEOS, and GDAL, and OGR, PROJ.4, JTS, GeoTools, which can be leveraged with a scripting language like Python, Perl, Ruby, Groovy, ASP.NET, and others-- whoo, right? It sounds complex, right?
So, I give this talk, which, five years ago, was a quick, 20-minute conference talk. And it's now metastasized into this 90-minute marathon where I cover all these options in detail. And afterwards, exhausted people come up to me. They say, ugh, open source offers too many choices. It's easier with just one vendor-- which is odd, right? Because we deal with lots of choice in all the other markets we navigate every day. There's lots of kinds of cars. There's lots of kinds of blue jeans, lots of kinds of coffee.
And we've got a good idea what a market with just one vendor looks like, right? We actually have laws against it. Proprietary software has a dirty little secret. And it is a secret that lives in plain sight. Even in otherwise competitive markets, the effect of proprietary licensing is to create an instant de facto monopoly.
How many companies provide support for your proprietary software? One. How many companies provide upgrades? One. How many companies provide enhancements? One. And proprietary companies guard their aftermarket monopoly zealously. Like, in 2010, Oracle sued SAP for providing service patches and support for Oracle to SAP customers. And there's a good reason Oracle sued. The profit margin on Oracle Support is 800%.
It's all about market power. Open source vests the market power in the software user, not the vendor. As a manager, you probably don't care about tinkering with the internals of your software source code. But you should care about holding on to your market power as a customer.
Bob Young, the founder of Red Hat Linux, asks this question of customers-- would you buy a car with its hood welded shut? No, right? So ask the follow-up question. What do you know about modern internal combustion engines? And the answer for most of us is not much. We demand the ability to open the hood of our cars because it gives us, the customer, control over the product we have bought, and it takes it away from the vendor.
We can take the car back to the dealer. If he does a good job, doesn't overcharge us, and adds the features we want, well, we might keep taking it back to that dealer. But if he overcharges us, if he won't fix the problem we're having, or he refuses to install that musical horn we always wanted, well, there's 10,000 other car repair companies that would be happy to have our business. Making an enterprise commitment to a single vendor puts you permanently into the worst negotiating position possible. You go into every negotiation with no alternative position, no other store to storm off to. The only leverage you have left is the threat to buy nothing at all, right, which isn't much of a threat.
Speaking of market power, does anyone see the resemblance between these images-- huh, huh, huh, huh, huh, huh. I can't bring that segue back-- not a good detour, perhaps. So to maintain market power, provide our staff with [INAUDIBLE] opportunity to build heterogeneous systems, to lower license liability, to be able to scale, for all these reasons, it makes sense to have open source as an option. But how to start? We're going to need a Simpson's hand for this one.
OK, so to get started, you should first experiment with open source on pilot projects. Second, integrate open source into your existing operations slowly and incrementally. Evaluate the open source alternatives for yourself. Don't take other people's word for capabilities. And expand the capabilities of open source to do what you need done. So, we're running out of time. I'll take these first two at once, experiment and integrate.
Way back in 2007, Pierce County, Washington State was a 100% ESRI and Microsoft shop with a limited budget for new software acquisition. It did have some talented technical staff and a GIS manager, Linda Gerull, who was interested in new ideas. In the fall of that year, she learned that the International Open Source GIS Conference was being held just to the north in Victoria. And she took the opportunity to send several of their staff. Keep an eye out for alternatives, she told them.
When they came back, they had lots and lots of alternatives. They were very excited. But they couldn't just tear down their infrastructure and start again, right? They had to maintain service continuity. So, the team started experimenting by duplicating some of the existing services that were built using old MapObjects technology and that were slated to be replaced anyways.
Some of them were just really simple services with very minimal user interface like this critical areas query form. It just takes in a parcel number and address, and it returns a simple report on environmental factors based on a query of 18 layers. MapObjects was unstable. ArcIMS was too slow. But open source-- PostGIS in this case-- was just right. The form didn't change at all, just the back end.
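In PostGIS terms, a form like that boils down to a spatial containment test of the parcel's location against each environmental layer (roughly an ST_Intersects per layer in SQL). As a self-contained illustration, here is a stdlib-only Python sketch, with two hypothetical layers standing in for the county's 18; the layer names and geometries are invented for the example.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is point pt inside polygon poly
    (a list of (x, y) vertices)?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a ray extending to the right of pt.
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

# Hypothetical environmental layers, each a list of polygons.
layers = {
    "wetland": [[(0, 0), (4, 0), (4, 4), (0, 4)]],
    "floodplain": [[(10, 10), (14, 10), (14, 14), (10, 14)]],
}

def critical_areas_report(parcel_point):
    """Report which layers the parcel's point falls inside."""
    return {name: any(point_in_polygon(parcel_point, poly) for poly in polys)
            for name, polys in layers.items()}

print(critical_areas_report((2, 2)))  # → {'wetland': True, 'floodplain': False}
```

The production version just pushes this same test into the database, where a spatial index makes it fast across all 18 layers at once, which is why swapping the back end to PostGIS didn't require touching the form.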
And as their confidence in the tools grew, they looked at migrating core bits of their infrastructure. So more recently, they replaced their SQL Server database with PostGIS and PostgreSQL. And the key here is that they're continuing to run ArcSDE on top. And this allows them to use their existing data management tools like ArcGIS, but to use a pure open source web services stack directly against PostGIS.
So the changes are incremental, and they're exploratory. Pierce County still runs ESRI software. They still have ArcGIS desktops. But the number of options they have for deploying new systems is much higher. And the number of licenses they require is going down, not up. So their budget flexibility is increasing. At the same time, the staff has enjoyed learning the new technology.
And this is the conclusion slide from a presentation that Pierce County's Jared Erickson gave at the Washington State GIS Conference. Note the conclusions. Open source and ESRI can work together. Open source provides a diverse range of options. Now Pierce County experimented with a limited number of open source components. They experimented with PostGIS and PostgreSQL, GeoServer and OpenLayers. But as we saw earlier, there's lots of choices in every product category.
So, how do you choose the product that's right for you? What are the criteria of evaluation? David Wheeler, who I mentioned previously in the context of open source versus proprietary, also has a very complete document on how to evaluate open source software. And much of the evaluation is the same as with proprietary COTS, Commercial Off-The-Shelf, software. You look at whether the software does what you want and whether it works with the software you already have. But some key things are different.
In evaluating longevity, for instance, you don't necessarily look at the revenues or the customer count of a company, right? That's not there. But you look, instead, at the history and at the activity of the community around the software. And most importantly, you have to do the evaluations yourself. There won't be sales reps. There's not going to be snappy PowerPoint presentations, except for this one. There's not going to be sales material or brochures. You're probably going to have to download, install, and test the software yourself.
But this is a good thing, right? This is a feature, not a bug. Trying the software yourself will give you a reality-based understanding of its features, capabilities, and weaknesses, rather than a marketing-based understanding, OK? I always get taken in by the glossy brochure.
But suppose you evaluate, and the software that meets most of your criteria is missing a key feature. What then, OK? Open source often has a lower feature count than proprietary software. But unlike proprietary, adding features that you need is not impossible. You might want special-purpose features for your project, like this list of really obscure functionality. A vendor would never add these because too few customers want them. These are all examples of features that I added on small services contracts for customers from Australia, the USA, Spain, and Germany. None of these cost more than $4,000. All of them were added to the community source code, and they're still present in releases today.
In contrast, if you want a feature added to Oracle Spatial, you have to become the Department of Defense. And it would help to lean on Larry Ellison while you're yachting, right? Come on, Larry, add it. If you want a feature added to PostGIS, you just give me $2,000 to $10,000 and wait two to six weeks. It's a big difference in process.
This is a new way of thinking about enterprise software, right-- not just taking it as is, but pushing an open source project in the direction you need it to go. And it's hard to adopt a new way of thinking about software. But it's part of getting started in open source. You've got to experiment. You've got to integrate with your existing software. You've got to evaluate it yourself with your hands on. And you've got to expand the software the way you want to go and expand your thinking about how you approach using software.
The benefits will be organizations that have greater scalability, lower license liability, greater flexibility, more staff empowerment, and more market power. Because open source isn't going away, right? It's not a fad, it's the new normal. Sharing code makes sense for individuals. It makes sense for organizations. The only people it doesn't make sense for is the existing legacy software vendors, which is too bad for them, right, because open source culture, it's just business as usual on the internet-- collections of people and organizations with shared interests, financial and social, using open tools, open licensing to build the software and the knowledge they need.
My name is Paul Ramsey, and I work for OpenGeo, the open source geospatial company. I'll be here for the rest of the event. I look forward to talking with you all. Thanks for listening to me today.