Building the data desk: lessons from the L.A. Times

In early 2007, when the Los Angeles Times launched its Homicide Report blog, an effort to chronicle every homicide in Los Angeles County, it was clear that there were important geographic and demographic dimensions to the information that a blog format wouldn’t fully capture. What we needed was a ChicagoCrime.org-style map that would let users focus on areas of interest to them, with filters that would enable them to “play” with the data and explore trends and patterns for themselves. Problem was, the web staff (of which I was a part) lacked the tools and the expertise to build such a thing, so the blog launched without a map. (Sound familiar?)

It took several months to secure the tech resources and a couple more months to create wireframes and spec out requirements for what would become the Homicide Map, with the help of a couple of talented developers and a project manager on part-time loan from the website’s IT department. We were fortunate, of course: We actually had access to this kind of expertise, and since then we’ve hired a couple of dedicated editorial developers. I’m aware that others might not have it so good.

Last week, Robert Niles argued that news organizations should be in the business of creating “killer apps.” Put another way, there is a need to develop tools that hew to the content rather than the other way around. But creating the functionality Robert describes takes a closer connection between news thinking and tech thinking than is possible within news organizations’ traditional structures and skill sets.

In this post, I’ll try to squeeze some wisdom out of the lessons we learned in the process of assembling the Times’ Data Desk, a cross-functional team of journalists responsible for collecting, analyzing and presenting data online and in print. (Note: I left the Times earlier this month to work on some independent projects. I am writing this piece with the blessing of my former bosses there.)

Here, then, are 10 pieces of advice for those of you building or looking to build a data team in your newsroom:

  1. Find the believers: You’ll likely discover enthusiasts and experts in places you didn’t expect. In our case, teaming up with the Times’ computer-assisted reporting staff, led by Doug Smith, was a no-brainer. Doug was publishing data to the web before the website had anybody devoted to interactive projects. But besides Doug’s group, we found eager partners on the paper’s graphics staff, where, for example, GIS expert Tom Lauder had already been playing with Flash and web-based mapping tools for a while. A number of reporters were collecting data for their stories and wondering what else could be done with it. We also found people on the tech side with a good news sense who intuitively understood what we were trying to do.
  2. Get buy-in from above: For small projects, you might be able to collaborate informally with your fellow believers, but for big initiatives, you need the commitment of top editors who control the newsroom departments whose resources you’ll draw on. At the Times, a series of meetings among senior editors to chart a strategic vision for the paper gave us an opportunity to float the data desk idea. This led to plans to devote some reporting resources to gathering data and to move members of the data team into a shared space near the editorial library (see #8).
  3. Set some priorities: Your group may come from a variety of departments, but if their priorities are in alignment, disparate reporting structures might not be such a big issue. We engaged in “priority alignment” by inviting stakeholders from all the relevant departments (and their bosses) to a series of meetings with the goal of drafting a data strategy memo and setting some project priorities. (We arrived at these projects democratically by taping a big list on the wall and letting people vote by checkmark; ideas with the most checks made the cut.) Priorities will change, of course, but having some concrete goals to guide you will help.
  4. Go off the reservation: No matter how good your IT department is, their priorities are unlikely to be in sync with yours. They’re thinking big-picture product roadmaps with lots of moving pieces. Good luck fitting your database of dog names (oh yes, we did one of those) into their pipeline. Early on, database producer Ben Welsh set up a Django box at projects.latimes.com, where many of the Times’ interactive projects live. There are other great solutions besides Django, including Ruby on Rails (the framework that powers the Times’ articles and topics pages and many of the great data projects produced by The New York Times) and PHP (an inline scripting language so simple even I managed to learn it). Some people (including the L.A. Times, occasionally) are using Caspio to create and host data apps, sans programming. I am not a fan, for reasons Derek Willis sums up much better than I could, but if you have no other options, it’s better than sitting on your hands.
  5. Templatize: Don’t build it unless you can reuse it. The goal of all this is to be able to roll out projects rapidly (see #6), so you need templates, code snippets, Flash components, widgets, etc., that you can get at, customize and turn around quickly. Interactive graphics producer Sean Connelley was able to use the same county-level California map umpteen times as the basis for various election visualizations in Flash.
  6. Do breaking news: Your priority list may be full of long-term projects like school profiles and test scores, but often it’s the quick-turnaround stuff that has the biggest immediate effect. This is where a close relationship with your newsgathering staff is crucial. At the Times, assistant metro editor Megan Garvey has been overseeing the metro staff’s contributions to data projects for a few months now. When a Metrolink commuter train collided with a freight train on Sept. 12, Megan began mobilizing reporters to collect key information on the victims while Ben adapted an earlier Django project (templatizing in action!) to create a database of fatalities, complete with reader comments. Metro staffers updated the database via Django’s easy-to-use admin interface. (We’ve also used Google Spreadsheets for drama-free collaborative data entry.) … Update 11/29/2008: I was remiss in not pointing out Ben’s earlier post on this topic.
  7. Develop new skills: Disclaimer: I know neither Django nor Flash, so I’m kind of a hypocrite here. I’m a lucky hypocrite, though, because I got to work with guys who dream in ActionScript and Python. If you don’t have access to a Sean or a Ben — and I realize few newsrooms have the budget to hire tech gurus right now — then train and nurture your enthusiasts. IRE runs occasional Django boot camps, and there are a number of good online tutorials, including Jeff Croft’s explanation of Django for non-programmers. Here’s a nice primer on data visualization with Flash.
  8. Cohabitate (but marriage is optional): This may be less of an issue in smaller newsrooms, but in large organizations, collaboration can suffer when teams are split among several floors (or cities). The constituent parts of the Times’ Data Desk — print and web graphics, the computer-assisted reporting team and the interactive projects team — have only been in the same place for a couple months, but the benefits to innovation and efficiency are already clear. For one thing, being in brainstorming distance of all the people you might want to bounce ideas off of is ideal, especially in breaking news situations. Also, once we had everybody in the same place, our onetime goal of unifying the reporting structure became less important. The interactive folks still report to latimes.com managing editor Daniel Gaines, and the computer-assisted reporting people continue to report to metro editor David Lauter. The graphics folks still report to their respective bosses. Yes, there are the occasional communication breakdowns and mixed messages. But there is broad agreement on the major priorities and regular conversation on needs and goals.
  9. Integrate: Don’t let your projects dangle out there with a big ugly search box as their only point of entry. Weave them into the fabric of your site. We were inspired by the efforts of a number of newspapers, in particular the Indianapolis Star and its Gannett siblings, to make data projects a central goal of their newsgathering operations. But we wanted to do more than publish data for data’s sake. We wanted it to have context and depth, and we didn’t want to relegate data projects to a “Data Central”-type page, something Matt Waite (of PolitiFact fame) memorably dubbed the “data ghetto.” (I would link to Waite’s thoughtful post, but his site unfortunately reports that it “took a dirt nap recently.” Update: It’s back, and here’s the post.) I should note that the Times recently did fashion a data projects index of its own, but only as a secondary way in. The most important routes into data projects are still through related Times content and search engines.
  10. Give back: Understand that database and visualization projects demand substantial resources at a time when they’re in very short supply. Not everyone in your newsroom will see the benefit. Make clear the value your work brings to the organization by looking for ways to pipe the best parts (interesting slices of data, say, or novel visualizations) into your print or broadcast product. For example, some of the election visualizations the data team produced were adapted for print use, and another was used on the air by a partner TV station.

When I shared this post with Meredith Artley, latimes.com’s executive editor and my former boss, she pointed to the formation about a year ago of the interactive projects team within the web staff (Ben, Sean and me; Meredith dubbed us the “cool kids,” a name that stuck):

“For me, the big step was creating the cool kids team — actually forming a unit with a mandate to experiment and collaborate with everyone in the building with the sole intention of creating innovative, interactive projects.”

And maybe that should have been my first piece of advice: Before you can build a data team, you need one or more techie-journalists dedicated full-time to executing online the great ideas they’ll dream up.

What else did I miss? If you’ve been through this process (or are going through it, or are about to), I hope you’ll take a minute to share your insights.

L.A. Times launches sharable electoral vote map

Which campaign will get to 270 in November, and how will they do it? The L.A. Times has built an interactive map that allows readers to create and test their own electoral vote scenarios, and then embed those scenarios in their own sites.

Sample electoral vote scenario (not my prediction; just an uneducated guess for demonstration purposes):

This is the creation of Sean Connelley, our Flash guru, based on our 2004 electoral vote tracker. The cool addition this time around is the sharing functionality.

We’re hoping to improve on this as the campaign heats up, perhaps adding demographic info and data on past elections by state. Would love to hear suggestions.
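
For the curious, here is a rough sketch in Python of the two ideas behind a map like this: tallying a scenario against the 270-vote threshold, and packing a scenario into a string that could travel in an embed link. To be clear, this is not how Sean's Flash map works under the hood. The function names, the "A"/"B" candidate labels and the query-string format are all my own assumptions for illustration, and the table covers only a handful of states.

```python
from urllib.parse import parse_qsl, urlencode

# Electoral votes for a few large states (2004/2008 apportionment).
# Illustrative subset only; a real tool would cover all 50 states plus D.C.
ELECTORAL_VOTES = {"CA": 55, "TX": 34, "NY": 31, "FL": 27, "PA": 21, "OH": 20}
VOTES_TO_WIN = 270


def tally(scenario):
    """Sum electoral votes per candidate for a {state: candidate} scenario."""
    totals = {}
    for state, candidate in scenario.items():
        totals[candidate] = totals.get(candidate, 0) + ELECTORAL_VOTES[state]
    return totals


def winner(totals):
    """Return the candidate at or above 270 votes, or None if nobody is there yet."""
    for candidate, votes in totals.items():
        if votes >= VOTES_TO_WIN:
            return candidate
    return None


def encode_scenario(scenario):
    """Pack a scenario into a query string (hypothetical sharing format)."""
    return urlencode(sorted(scenario.items()))


def decode_scenario(query):
    """Rebuild a scenario from a string produced by encode_scenario."""
    return dict(parse_qsl(query))


if __name__ == "__main__":
    # "A" and "B" are placeholder candidate labels, not predictions.
    scenario = {"CA": "A", "NY": "A", "TX": "B", "FL": "B"}
    print(tally(scenario))
    print(winner(tally(scenario)))
    print(encode_scenario(scenario))
```

The point of the encode/decode pair is that a reader's scenario needs no server-side storage: the whole thing rides along in the link or embed code.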

LATimes.com launches online database of California's war dead

Thought I’d share with OJR readers a project I’ve been working on: Last week the Los Angeles Times launched a database of California’s military dead in the wars in Iraq and Afghanistan. This story does a nice job of introducing the database:

Across the nation, more than 4,600 have died while in service to the country. Of the California dead, the median age was 23. Their deaths left 205 widows and three widowers, and 300 children who will grow up without their fathers, two without their mothers. Thirty-eight of the 492 were engaged.

About 67% were in the Army, Army National Guard or Army Reserve; 27% in the Marine Corps or Marine Corps Reserve. The Air Force accounted for 2%, the Navy and Navy Reserve for 4%. Two percent of those killed were women.

At least 59 were immigrants.