The programmer as journalist: a Q&A with Adrian Holovaty

[The universe of journalists who program is, well, pretty small. Which is why I welcome the chance to talk with Adrian Holovaty, an award-winning journalist/programmer whose work, both for WashingtonPost.com and for his own sites, expands this profession’s capabilities. Adrian graciously agreed to answer a few of my questions via e-mail for OJR. — Robert]

OJR: I think one can safely assume that everyone in the news business understands how one “does journalism” through writing or photography. But how does one “do journalism” through computer programming?

Holovaty: The way I see it, there are three basic tasks that journalists do:

1. Gathering information. This involves talking to sources, examining documents, taking photographs, etc. It’s reporting.

2. Distilling information. This involves applying editorial judgment to decide what parts of the gathered information are important and relevant.

3. Presenting information. This involves shaping the distilled information into a format that is accessible to the readership. Some examples: writing style (inverted pyramid, etc.), photo color-correction, newspaper page design.

“Doing journalism through computer programming” is just a different way of accomplishing these goals. Namely, the technique favors automation wherever possible.

For example, it’s possible to automate that first step, the gathering of information. That’s how my chicagocrime.org site works. Each weekday, my computer program goes to the Chicago Police Department’s website and gathers all crimes reported in Chicago. Similarly, the U.S. Congress votes database I helped put together at washingtonpost.com works the same way: Several times a day, an automated program checks several government websites for roll-call votes. If it finds any, it gathers the data and saves it into a database.
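As a rough illustration of that gathering step, here is a minimal Python sketch. It is not the code behind chicagocrime.org or the votes database. The sample HTML, the `crimes` table and the `gather` function are all hypothetical, and a real scraper would download the page rather than read an inline string.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<table>
<tr><td>2006-06-01</td><td>THEFT</td><td>60614</td></tr>
<tr><td>2006-06-01</td><td>BURGLARY</td><td>60622</td></tr>
</table>
"""


class RowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def gather(html, conn):
    """Parse the page and save each row; duplicate rows are ignored."""
    parser = RowParser()
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crimes "
        "(date TEXT, type TEXT, zip TEXT, UNIQUE(date, type, zip))"
    )
    conn.executemany("INSERT OR IGNORE INTO crimes VALUES (?, ?, ?)", parser.rows)
    conn.commit()
    return len(parser.rows)


conn = sqlite3.connect(":memory:")
count = gather(SAMPLE_HTML, conn)
```

The `UNIQUE` constraint combined with `INSERT OR IGNORE` means the script can run on a schedule, as described above, without storing the same report twice.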

The second step, distilling information, can also be automated. Just as an editor can apply editorial judgment to decide which facts in a news story are most important, a programmer-journalist (we really do need a better name than that!) decides which *queries* should be made of data. For instance, on chicagocrime.org I decided it would be useful if site users could browse by crime type, ZIP code and city ward. On the votes database site, we decided it would be useful to browse a list of all the votes that happen late at night and a list of members of Congress who’ve missed the most votes. Once we made that decision of which information to display, it was just a matter of writing the programming code that automated it.
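To make the idea of editorial judgment expressed as queries concrete, here is a small hedged sketch using Python's built-in `sqlite3` module. The table layout, the sample rows and the 10 p.m. cutoff for "late at night" are assumptions made for illustration, not the schema or rules used at washingtonpost.com.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, invented for this example.
conn.executescript("""
CREATE TABLE votes (id INTEGER PRIMARY KEY, held_at TEXT);
CREATE TABLE ballots (vote_id INTEGER, member TEXT, position TEXT);
INSERT INTO votes VALUES (1, '2006-05-02 14:30:00'), (2, '2006-05-02 23:45:00');
INSERT INTO ballots VALUES
  (1, 'Rep. A', 'Yea'), (1, 'Rep. B', 'Not Voting'),
  (2, 'Rep. A', 'Nay'), (2, 'Rep. B', 'Not Voting');
""")

# Editorial decision 1: which votes happened late at night (assumed: 10 p.m. or later)?
late_night = conn.execute(
    "SELECT id FROM votes "
    "WHERE CAST(strftime('%H', held_at) AS INTEGER) >= 22 ORDER BY id"
).fetchall()

# Editorial decision 2: which members have missed the most votes?
most_missed = conn.execute(
    "SELECT member, COUNT(*) AS missed FROM ballots "
    "WHERE position = 'Not Voting' GROUP BY member ORDER BY missed DESC"
).fetchall()
```

Each query encodes one decision about what readers should be able to see; once written, it runs against whatever data has been gathered.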

In the “journalism through computer programming” realm, the third step, presentation, is also automated. This is particularly complex, because in creating websites, it’s necessary to account for all possible permutations of data. For example, on chicagocrime.org I had to account for missing data: How should the site display crimes whose data has changed? What should happen in the case where a crime’s longitude/latitude coordinates aren’t available? What should happen when a crime’s time is listed as “Not available”?
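A presentation routine that tolerates incomplete records might look something like the following Python sketch. The field names and fallback wording are hypothetical and are not taken from chicagocrime.org.

```python
def describe_crime(crime):
    """Render one crime record as a line of text, tolerating missing fields."""
    # Treat an absent time and the literal "Not available" the same way.
    when = crime.get("time")
    if not when or when == "Not available":
        when = "time unknown"

    # Only show coordinates when both halves of the pair are present.
    lat, lng = crime.get("latitude"), crime.get("longitude")
    if lat is None or lng is None:
        where = "location not mapped"
    else:
        where = f"({lat:.4f}, {lng:.4f})"

    return f"{crime['type']}, {when}, {where}"


complete = describe_crime(
    {"type": "THEFT", "time": "9:15 p.m.", "latitude": 41.8781, "longitude": -87.6298}
)
partial = describe_crime({"type": "THEFT", "time": "Not available"})
```

The point is that every display path has an explicit answer for the record that is missing something, rather than assuming the data is always complete.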

Also, I should point out that the two example sites I’ve given are entirely automated, but often it’s not possible to automate an entire project. In most cases, information gathering is done by humans rather than computers, and the computer programming comes into play in automating the distillation and display of the data.

A good example of this is washingtonpost.com’s Faces of the Fallen site, which lists all known U.S. service members who have died in Iraq and Afghanistan. That information is collected by the Post’s fantastic newsroom research team, not by automated scripts. The “journalism via computer programming” in this case is in the setup of the website itself: Once our researchers collect and verify information, it gets displayed on the website and is made browsable and searchable by a variety of different parameters such as age, home town and military branch. That — the display — is the part that’s automated.

OJR: What is the value to a journalist in understanding programming, or even learning how to do it?

Holovaty: The main value in understanding programming is the advantage of knowing what’s possible, in terms of both data analysis and data presentation. It helps one think of journalism beyond the plain (and kind of boring) format of the news story.

Programming comes in handy in all sorts of other areas, too, including gathering information. Now that quite a few governments and organizations are publishing data on their own websites, it’s a valuable skill to be able to automate the retrieval of that data and compile it into a format that makes it easy to research and aggregate.

OJR: What should journalism schools be doing to prepare future journalists to work in a mash-up publishing universe?

Holovaty: J-schools need to get way more technical. A graduate of a journalism school should be a master of collecting data — whether the old-fashioned way (by talking to humans) or through automated means.

The closest thing journalism schools currently have (to my knowledge) is computer-assisted reporting classes. Those classes should be required, in my opinion, and even better would be for j-schools to partner with computer-science departments so that journalism students would get some experience coding.

OJR: What types of information are newsrooms collecting right now, but most under-utilizing on their websites?

Holovaty: Much of the information that journalists collect, day to day, is structured. Information such as crime reports, obituaries and event listings always follows a certain pattern, which can be richly exploited by databases.

The majority of newspapers take the time to *collect* this information — which is the hard part — but they dramatically reduce its value by NOT storing it in structured formats. Instead, they distill it into big blobs of text for publication in their print editions, and then they shovel those big blobs of text onto their websites. At this point, all structure is lost: Crime reports can’t be sorted or searched intelligently, and event listings can’t be viewed in any sort of user-friendly way.

The very act of distilling information into a news story — which is essentially a big blob of text — removes any sort of structure. Information is exponentially more valuable if it’s structured.

So I urge news companies to retain as much structure in their information as possible. These days, it’s easier and cheaper than ever to set up a database server. Just do it.
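The difference between structured records and a blob of text can be shown in a few lines of Python. This is only a sketch: the `Event` fields and the sample listings are invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    name: str
    venue: str
    day: date


# Hypothetical event listings, kept as structured records.
events = [
    Event("Open mic", "Main Street Cafe", date(2006, 6, 9)),
    Event("Jazz trio", "Riverside Bar", date(2006, 6, 3)),
    Event("Poetry slam", "Main Street Cafe", date(2006, 6, 5)),
]

# Structured data can be sorted and filtered in any way the site needs.
by_date = sorted(events, key=lambda e: e.day)
at_cafe = [e.name for e in by_date if e.venue == "Main Street Cafe"]

# The print-style blob is easy to generate from the records...
blob = " ".join(f"{e.name} at {e.venue} on {e.day.isoformat()}." for e in by_date)
# ...but recovering the records from the blob would require error-prone parsing.
```

Going from records to text is a one-line operation, while going from text back to records is not, which is why storing the structured form first preserves the most value.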

A few specific examples? Any sorts of public records are structured, really. Crime reports are an obvious one. Fire-station reports, local school data, transportation data. There’s a ton of this stuff.

Beyond the obvious examples, journalists should step back and consider more abstract concepts in terms of structured information. For example, just a couple of weeks ago at washingtonpost.com we databased the “key races” across the country in the 2006 elections, as determined by our editors: http://projects.washingtonpost.com/elections/keyraces/ . Each race has a name, a state/district, a number of candidates — it’s very structured, if you think about it that way. And because we’ve databased it, we’ve automated much of the tedium of updating the site, because the site runs itself, grabbing information from our database.

This sort of automation and exploitation of structured information is where I think (and hope) journalism is going.

OJR: What ought news organizations do to encourage tech innovation from their staffs?

Holovaty: Hire programmers! It all starts with the people, really. If you want innovation, hire people who are capable of it. Hire people who know what’s possible.

And once you hire the programmers, give them an environment in which they can be creative. Treat them as bona fide members of the journalism team — not as IT robots who just do what you tell them to do.

OJR: Do you think most news managers are afraid of technology? If so, how do tech-savvy journalists overcome that?

Holovaty: I’ve met both types of managers — those that are scared and those that aren’t. (For the news managers who *are* afraid of technology, you can’t blame ’em. It’s only natural. Technology is completely changing their industry, whose rules haven’t changed drastically in a long time.)

It seems the best way to overcome the fear is to emphasize that technology can be used to further the goals of journalism. It’s reasonable for managers to be afraid of things they don’t understand, but if you boil down the specific technology to the specific journalism problems it solves, I suspect managers would be more understanding.

OJR: What is the most innovative project you’ve worked on? What was so interesting about it?

Holovaty: The projects that are most interesting to me involve reverse-engineering and altering Internet applications to do things they weren’t supposed to do, for the benefit of users. For example, a year ago I tinkered with putting CTA (Chicago Transit Authority) subway maps on Google Maps (http://holovaty.com/blog/archive/2005/04/19/0216/). It no longer works, but it was really cool. Also, I enjoyed creating the “All-Music Guide Fixer” Firefox extension, which, when installed, alters the display and functionality of allmusic.com (http://holovaty.com/blog/archive/2004/07/19/2210). This idea — site-specific user customizations of websites — eventually became the Greasemonkey Firefox extension.

In journalism, I’d have to say the most innovative project I’ve been lucky enough to work on was lawrence.com, the local entertainment site for Lawrence, Kansas. So much automated subtlety is happening behind the scenes of that site. For example, in the event calendar, an event that takes place at a bar will automatically pull out the drink specials for the day of the event. Similarly, if an event features a local band, the system automatically pulls out sound clips and creates an “If you go, you might hear these songs” sidebar. Lawrence.com has a ton of little innovations that go way beyond what most other entertainment sites do, even though the site has had these little innovations for more than three years.

OJR: What interesting projects are you working on now?

Holovaty: I’m heavily involved in the development of Django, an open-source Web framework for the Python programming language. In layperson’s speak, it’s free software that makes Web development fast and easy. We created it when I worked in Lawrence, and we open-sourced it in July 2005. It’s gotten a ton of attention, and people all over the world are using it and improving it. I’m cowriting a book about it at the moment, as well.

Aside from that, I’ve been collecting various public-record data in Chicago in preparation for the launch of my “sequel” to chicagocrime.org. Can’t say much more about this project at the moment, but I’m very excited to launch in the coming weeks!

OJR: Other than the stuff you’re working on, what technology that you’ve looked at recently has grabbed your attention?

Holovaty: Generally I get excited by new APIs that various websites are launching. The Flickr APIs are a classic example: They let any programmer query the Flickr photo database via programs.

OJR: Journalism’s always been a competitive business. But what technical initiatives should news organizations be cooperating on? What opportunities, if any, is the industry missing when companies don’t work together?

Holovaty: I think news organizations should cooperate on removing mandatory Web-site registration walls, which are severely reader-unfriendly. It’s embarrassing to be associated with an industry that treats its customers with such disdain.

OJR: What online news projects have you seen recently, if any, that you thought were especially well done? (Not counting the Washington Post and other sites you’ve worked on….)

Holovaty: Off the top of my head —

* Just the other day I saw the great weather/hurricane tracking app at http://www.ibiseye.com.

* I’m consistently impressed by the stuff coming out of mySociety.

* Faneuil Media does some great work.

OJR: What tech sites do you check to keep up with the latest in mash-ups, programming and Web development?

Holovaty: Every day I check delicious popular a couple of times. That’s a good indicator of what people are talking about and the new things happening on the Web.

About Robert Niles

Robert Niles is the former editor of OJR, and no longer associated with the site. You may find him now at http://www.sensibletalk.com.

Comments

  1. This is a very useful interview. What always impresses me about Adrian and other “programmer journalists” is that they don’t seem afraid to learn and are so enthusiastically curious about new tools and ideas.

    These attributes seem to be less evident in lots of “mainstream media” people that I meet. They know it all before you even start talking to them. Everyone in journalism, especially in j-education, could do well to be a bit more willing to explore the area where technology, thinking and communication meet.

  2. For those of us who are journalists, who perhaps have some background in spreadsheets and databases and basic HTML but none in programming, how would you recommend we begin to teach ourselves how to do this stuff?

  3. Lex: Check out the free book “How to Think Like a Computer Scientist: Learning with Python”.

    PDF: http://www.ibiblio.org/obp/thinkCSpy/dist/thinkCSpy.pdf
    Web: http://www.ibiblio.org/obp/thinkCSpy/

    From there, the book “Text Processing with Python” would be a good place to go.

    Good luck!

  4. I’ve found ColdFusion a painless way for non-programmers to code up data-driven, dynamic webpages. It’s a tag-based mark-up system (like HTML), rather than a script-based system, like PHP, making it easier for a total newbie to digest. It’s also integrated with DreamWeaver, a plus for folks used to that tool.

    (Disclosure: OJR’s CMS is written in CF.)

    The downsides are that CF is not open-source, and it’s a bear on a server, so you need to find a host that can support it well. It’s also a Web publishing tool, not a programming language, so you won’t use it for computer-assisted reporting (CAR) analysis or stand-alone applications.

    But I like CF as a way for a person with nothing more than HTML and Access experience to get oneself to the “a ha!” moment where you see how data and presentation intersect online. From there, I think it is easier to comprehend more sophisticated programming languages, but there are people who would argue with me on that. The opposing view is that a tag-based solution like CF is an intellectual dead-end, and one that might get in the way of people understanding the unique syntax of scripting and programming.

  5. Lex, the way that I got into it was that in doing spreadsheet and database stuff I kept running into the same situation – namely, I kept having to do repetitive tasks. So I started looking at scripts that took certain tasks and automated them – downloading web pages, pattern matching – and then tried to apply them to what I was working on.

    That involves a lot of trial and error, but eventually you get better at it. And then I started branching out, reading stuff like what Adrian suggested. To me, it was a lot easier to wrap my head around a concrete task and then learn the steps to accomplish it.

  6. Tom Grubisich says:

    This hugely valuable Q & A pulls together the important things that are happening in Web journalism. What’s clear from what Adrian says — in response to Robert’s on-the-spot questions — is that we’re finally seeing developers/programmers and journalists (card carrying and the grassroots variety) coming together instead of being walled off by what used to be their different, sometimes adversarial cultures. When you go to sites like http://www.calacanis.com, you see this convergence happening — journalists adapting to and affirming technology, and, what’s most encouraging, developers/programmers using blogs to reach an audience well beyond what used to be their hermetically sealed world.

  7. Thanks all for the suggestions. I’ve downloaded a copy of Crimson Editor, and if a reasonable goal is to know enough to be dangerous, then I hope to become dangerous very soon.

    That said, Derek, can you please be a little more specific about how you did this? Are there (stupid-question alert) sites from which you can download free scripts?

  8. Sure thing, Lex. Here’s an example from my site: I got tired of visiting the FEC’s press release page all the time, so I wanted to roll my own RSS feed from it. I used a Python script originally by Sam Ruby and adapted it to my needs. You can see the development process for the script.

  9. This I’ve gotta try.

    Thanks!

  10. I suggest it’s much more likely that IT folks who are interested in community based news/media centers will become grassroots journalists instead of the other way around. I base this on being involved with many IT projects in the past 40 years where well meaning users wanted to help “program” their applications.

    As an example of the former see http://prattnews.com where most of the founders have an IT background.