Holovaty's EveryBlock unlocks neighborhood news data

Noted journalist/programmer/Web guru Adrian Holovaty just launched his latest project, the Knight News Challenge-funded EveryBlock. As the site’s name implies, it strives to provide information about every block of the three cities it covers: New York, Chicago and San Francisco. The data might include crime reports, civic inspections and filings, even geotagged Flickr photos.

Too many news professionals get bogged down by traditional notions of “journalism,” that what newsrooms publish must be multi-sourced narrative “stories,” from five to 500 inches long, following J-school-approved norms for reporting and narrative structure.

Feh. When I worked at a small daily, many of us on staff suspected that the most popular features in the paper were the obits, police blotter and log of ambulance runs. And you know what? When I moved up, to bigger cities and bigger dailies, I missed being able to check the paper to see where that police cruiser or ambulance I heard yesterday was going.

Readers love information. Whether that’s a police blotter, local bulletin board, school lunch schedule or gripping story in the local paper — they don’t care about the format. Readers just want it to be accurate, relevant and complete. Without anything misleading or extraneous, either.

That’s why I love watching people like Holovaty, whom I’ve interviewed before on OJR. The public has voted with its mouse clicks that it wants more information from the rest of the world than it is finding in the same stale stories in its shrinking local papers. Holovaty’s creations offer the promise of a reinvigorated news industry, driven by journalists who can wield code, statistics and data every bit as effectively as words and grammar. I e-mailed Holovaty and asked him about EveryBlock.

OJR: What’s EveryBlock providing that the average Web reader could not get before?

Holovaty: First, fundamentally, we offer a way to browse news at the block level, with a news page for every block — hence the name EveryBlock. We’ve done a fair amount of due diligence and are pretty confident this hasn’t been done before — and in three of the densest cities in America, at that.

Second, we’re providing some information that didn’t previously exist online. Two examples are film locations in Chicago and restaurant inspections in San Francisco. The former is provided to us by the Chicago Film Office, and the latter is provided to us by the San Francisco Department of Public Health, which has its own website but doesn’t include some of the data we publish.

Third, we make it easy to browse information that already existed online but was buried in deep government sites, either in “deep Web” search forms or non-Web-friendly formats such as PDFs. Two examples are landmark building permits in New York City and crime reports in New York City, but there are many other examples across our three city sites. This has been an interest of mine for a number of years, and it’s a dream come true to have the opportunity to do it at this scale.

Fourth, we’re detecting geography in narratives — “blobs,” so to speak — and making it easy for people to find relevant news articles and government documents that refer to specific places near them. Some examples are New York City news articles, San Francisco zoning agenda items and Chicago city press releases. Another (geeky) way to phrase this is that we’re harvesting geographic metadata from unstructured text.
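To make "harvesting geographic metadata from unstructured text" concrete, here is a minimal Python sketch. It is an illustration only, not EveryBlock's geoparser (the interview mentions trained algorithms, which this is not): the regular expression, the function names and the sample sentences are all assumptions, and the pattern only recognizes a handful of abbreviated street types.

```python
import re

# Hypothetical pattern for address-like spans such as "1234 N Clark St".
ADDRESS_RE = re.compile(
    r"\b(\d{1,5})\s+"               # street number
    r"((?:[NSEW]\.?\s+)?"           # optional compass direction
    r"(?:[A-Z][a-z]+\s+){1,3}"      # one to three capitalized words
    r"(?:St|Ave|Blvd|Rd|Dr)\b)"     # abbreviated street type
)


def block_of(number: int) -> str:
    """Round a street number down to its hundred block, e.g. 1234 -> '1200 block'."""
    return f"{number // 100 * 100} block"


def find_blocks(text: str) -> list[str]:
    """Return a 'NNNN block of Street' string for each address-like span in text."""
    results = []
    for match in ADDRESS_RE.finditer(text):
        number = int(match.group(1))
        street = match.group(2)
        results.append(f"{block_of(number)} of {street}")
    return results


# Made-up sentence standing in for a news article or press release.
sample = "Police responded to a break-in at 1234 N Clark St. on Tuesday night."
print(find_blocks(sample))
```

Each extracted block could then serve as the key that attaches the article to a block page. A hand-written pattern like this misses many real addresses.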

Fifth, we’re providing some light trending and aggregate reports for *each* type of information on our site. For example, see the Chicago crime data.

OJR: Describe the work that went into creating EveryBlock.

Holovaty: The work that has gone into creating EveryBlock has been quite diverse, which makes the job interesting and exciting. On the “human” side of things, we’ve established many relationships with government officials and other partners who are responsible for local data. On the user-interface side, we’ve worked to design a gorgeous, easy to use site and an architecture that accommodates a wide variety of disparate types of information. On the map side, we’ve made our own maps, deciding against Google’s or Yahoo’s map offerings for a number of reasons; that took a sophisticated combination of design, coding and data chops. At the technical level, we’ve developed an array of technology just to get all of this data into an elegant, unified system. It’s beautiful. And we’ve even done a fair amount of manual labor, from hand-drawing neighborhood boundaries to hand-tagging newspaper articles to train our geoparsing algorithms.

OJR: I suspect that when many people, inside and out of the news industry, hear the word “journalism,” they think of a specific, narrative format for providing information. But sites such as EveryBlock provide information outside the traditional newspaper narrative form. Do you think that people in the news industry need to modify or expand their conception of “journalism” in order to account for the new and different ways that people can access and present information online?

Holovaty: People can define “journalism” however they’d like. At EveryBlock, what we’re interested in exploring is what sort of frequently updated information consumers want at the block level, and how they’d like to receive it. Whether this is called “journalism” or not is strictly academic. (I think it’s hard to argue against calling it “news,” though.)

I think people in the news industry should indeed modify their conception of what information they publish, and how they publish it. But should they modify their conception of “journalism”? Leave that to the people who have the time and inclination to debate semantics.

OJR: What has kept, or is still keeping, newspapers from having functionality like EveryBlock’s on their websites?

Holovaty: Unfortunately, there’s a lot. In the general case (and “general” means this excludes the newspapers out there who are doing great things online) —

* A lack of technical competence
* A culture so obsessed with daily deadlines that little thought/resources are put into paradigm changes
* A culture that disdains technology and science, particularly math, and, worse, actually takes pride in that
* Red tape
* Legacy systems
* Legacy attitudes
* People who ask “Is this journalism?” 😉

OJR: How long can you keep the site running on your Knight funding? What happens after that runs out?

Holovaty: That remains to be seen! Knight has awarded a two-year grant, and we’re just over six months into it, so… ask again in about 18 months. 🙂

OJR: Who are you trying to reach with EveryBlock? How are you promoting the site?

Holovaty: We’re trying to reach residents of the EveryBlock cities. If you live in Chicago, New York or San Francisco, we hope to make your block’s page on EveryBlock into something that you’d find useful time and time again.

Something tells me you won’t be seeing EveryBlock billboards on the expressway, or EveryBlock ads on subway cars. That’s just not our style. We’ve e-mailed friends and family, and the rest has sort of happened through word of mouth, blogs and media coverage. This approach worked well for chicagocrime.org, which has (anecdotally) gained pretty good awareness over the past two and a half years here in Chicago, with zero traditional marketing on my part.

OJR: What’s the timeframe, and procedure, for expanding to other cities?

Holovaty: This is an interesting challenge that we knew going into the project: to some extent, the technology is scalable (i.e., replicable in a way that makes subsequent cities easier to launch), but at the same time, every city’s data is different. We still don’t know how that breaks down, resource-wise, but it’ll be fun to find out.

OJR: If a local publisher wants EveryBlock technology on his/her website, what should they do? Are you working with partners? Should they try to build this themselves?

Holovaty: We’re obligated under the terms of our Knight grant to release the site’s code under an open-source license at the end of the grant period. The idea there is to experiment and do some good for the news industry — that’s one of the core missions of the Knight News Challenge program (www.newschallenge.org), which is the contest I entered to receive this grant.

In the meantime, though, we’re very interested in working with partners — media companies, governments, bloggers and any other local-news publishers — in our EveryBlock cities. (Folks can contact us at feedback at everyblock.com.)

Personally, I wouldn’t recommend that news organizations attempt to build this themselves, but it’s obvious that I’m biased — both by having implemented EveryBlock and having worked at a number of news organizations.

OJR: What information, related to EveryBlock or not, do you really wish you could get in front of Web readers, but that you haven’t figured out either how to get your hands on or how to present effectively? (In other words, what’s the next challenge?) What’s it going to take to get that information out there?

Holovaty: That’s another good question. Regarding data on the current EveryBlock cities, I’d say we’re only at 10 percent of where we could be. We’re almost ready to add a couple of data sets that didn’t make it in time for launch, and we’re continually adding news sources and blogs to crawl.

One type of information that we purposefully haven’t included on EveryBlock is “static” information — the location of the nearest subway station, or the census demographics for your block. There’s been a small amount of user interest in this, but there’s a core difference in that type of information, namely that it’s not time-sensitive, and it would take some thinking to figure out how that fits in with our current “news feed for your block” paradigm. We’ll see what happens. It’s one of the many interesting problems we look forward to tackling.

The programmer as journalist: a Q&A with Adrian Holovaty

[The universe of journalists who program is, well, pretty small. Which is why I welcome the chance to talk with Adrian Holovaty, an award-winning journalist/programmer whose work, both for WashingtonPost.com and for his own sites, expands this profession’s capabilities. Adrian graciously agreed to answer a few of my questions via e-mail for OJR. — Robert]

OJR: I think one can safely assume that everyone in the news business understands how one “does journalism” through writing or photography. But how does one “do journalism” through computer programming?

Holovaty: The way I see it, there are three basic tasks that journalists do:

1. Gathering information. This involves talking to sources, examining documents, taking photographs, etc. It’s reporting.

2. Distilling information. This involves applying editorial judgment to decide what parts of the gathered information are important and relevant.

3. Presenting information. This involves shaping the distilled information into a format that is accessible to the readership. Some examples: writing style (inverted pyramid, etc.), photo color-correction, newspaper page design.

“Doing journalism through computer programming” is just a different way of accomplishing these goals. Namely, the technique favors automation wherever possible.

For example, it’s possible to automate that first step, the gathering of information. That’s how my chicagocrime.org site works. Each weekday, my computer program goes to the Chicago Police Department’s website and gathers all crimes reported in Chicago. Similarly, the U.S. Congress votes database I helped put together at washingtonpost.com works the same way: Several times a day, an automated program checks several government websites for roll-call votes. If it finds any, it gathers the data and saves it into a database.
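The "check a source, then save whatever is new into a database" loop described above might look roughly like the following Python sketch. It is not the code behind chicagocrime.org or the votes database: the table layout, the CSV format and the sample rows are invented for illustration, and the download step is replaced by a string so the example runs without a network connection.

```python
import csv
import io
import sqlite3


def save_reports(conn: sqlite3.Connection, csv_text: str) -> int:
    """Store rows from csv_text that are not already saved; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crimes ("
        "case_id TEXT PRIMARY KEY, crime_type TEXT, block TEXT, reported TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM crimes").fetchone()[0]
    rows = [
        (r["case_id"], r["crime_type"], r["block"], r["reported"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # The primary key makes repeat runs harmless: known case IDs are skipped.
    conn.executemany("INSERT OR IGNORE INTO crimes VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM crimes").fetchone()[0]
    return after - before


# Made-up data standing in for a downloaded page; a real job would fetch it on a schedule.
SAMPLE = """case_id,crime_type,block,reported
A1,THEFT,1200 N CLARK ST,2006-05-01
A2,BURGLARY,300 W MADISON ST,2006-05-01
"""

conn = sqlite3.connect(":memory:")
print(save_reports(conn, SAMPLE))  # first run stores both rows
print(save_reports(conn, SAMPLE))  # second run finds nothing new
```

Because the case ID is the primary key, the job can run several times a day and only reports that have not been seen before are added.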

The second step, distilling information, can also be automated. Just as an editor can apply editorial judgment to decide which facts in a news story are most important, a programmer-journalist (we really do need a better name than that!) decides which *queries* should be made of data. For instance, on chicagocrime.org I decided it would be useful if site users could browse by crime type, ZIP code and city ward. On the votes database site, we decided it would be useful to browse a list of all the votes that happen late at night and a list of members of Congress who’ve missed the most votes. Once we made that decision of which information to display, it was just a matter of writing the programming code that automated it.
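As a rough illustration of deciding "which queries should be made of data," the Python sketch below counts stored crimes by whichever field an editor has chosen to expose. The table, the column names and the four sample rows are assumptions made up for this example, not the actual chicagocrime.org schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crimes (crime_type TEXT, zip_code TEXT, ward INTEGER)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?, ?)",
    [
        ("THEFT", "60614", 43),
        ("THEFT", "60614", 43),
        ("BURGLARY", "60607", 27),
        ("THEFT", "60607", 27),
    ],
)

# The editorial decision: the only ways readers may browse the data.
BROWSABLE = {"crime_type", "zip_code", "ward"}


def counts_by(conn: sqlite3.Connection, column: str) -> list:
    """Count crimes grouped by one approved column, most frequent first."""
    if column not in BROWSABLE:
        raise ValueError(f"cannot browse by {column!r}")
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM crimes "
        f"GROUP BY {column} ORDER BY n DESC, {column}"
    )
    return conn.execute(query).fetchall()


print(counts_by(conn, "crime_type"))  # [('THEFT', 3), ('BURGLARY', 1)]
print(counts_by(conn, "ward"))
```

The editorial judgment lives in the `BROWSABLE` set. Once that choice is made, one query covers every view.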

In the “journalism through computer programming” realm, the third step, presentation, is also automated. This is particularly complex, because in creating websites, it’s necessary to account for all possible permutations of data. For example, on chicagocrime.org I had to account for missing data: How should the site display crimes whose data has changed? What should happen in the case where a crime’s longitude/latitude coordinates aren’t available? What should happen when a crime’s time is listed as “Not available”?
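One hedged way to picture "accounting for all possible permutations of data" is a display function that degrades gracefully when fields are missing. The Python sketch below is illustrative only: the `Crime` record, its field names, the wording of the fallbacks and the sample values are assumptions, not how chicagocrime.org actually renders its pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Crime:
    crime_type: str
    block: str
    time: Optional[str] = None          # None or "Not available" when unknown
    latitude: Optional[float] = None    # None when the source gave no coordinates
    longitude: Optional[float] = None


def render(crime: Crime) -> str:
    """Build one display line, falling back to plain wording for missing fields."""
    if crime.time is None or crime.time == "Not available":
        when = "time unknown"
    else:
        when = crime.time
    if crime.latitude is None or crime.longitude is None:
        where = f"{crime.block} (not shown on map)"
    else:
        where = f"{crime.block} ({crime.latitude:.4f}, {crime.longitude:.4f})"
    return f"{crime.crime_type}: {where}, {when}"


# Made-up records: one complete, one missing both its time and its coordinates.
print(render(Crime("THEFT", "1200 N CLARK ST", "9:15 p.m.", 41.9040, -87.6315)))
print(render(Crime("BURGLARY", "300 W MADISON ST", "Not available")))
```

Every optional field needs a fallback decided in advance, because an automated page has no editor to catch a blank at publication time.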

Also, I should point out that the two example sites I’ve given are entirely automated, but often it’s not possible to automate an entire project. In most cases, information gathering is done by humans rather than computers, and the computer programming comes into play in automating the distillation and display of the data.

A good example of this is washingtonpost.com’s Faces of the Fallen site, which lists all known U.S. service members who have died in Iraq and Afghanistan. That information is collected by the Post’s fantastic newsroom research team, not by automated scripts. The “journalism via computer programming” in this case is in the setup of the website itself: Once our researchers collect and verify information, it gets displayed on the website and is made browsable and searchable by a variety of different parameters such as age, home town and military branch. That — the display — is the part that’s automated.

OJR: What is the value to a journalist in understanding programming, or even learning how to do it?

Holovaty: The main value in understanding programming is the advantage of knowing what’s possible, in terms of both data analysis and data presentation. It helps one think of journalism beyond the plain (and kind of boring) format of the news story.

Programming comes in handy in all sorts of other areas, too, including gathering information. Now that quite a few governments and organizations are publishing data on their own websites, it’s a valuable skill to be able to automate the retrieval of that data and compile it into a format that makes it easy to research and aggregate.

OJR: What should journalism schools be doing to prepare future journalists to work in a mash-up publishing universe?

Holovaty: J-schools need to get way more technical. A graduate of a journalism school should be a master of collecting data — whether the old-fashioned way (by talking to humans) or through automated means.

The closest thing journalism schools currently have (to my knowledge) is computer-assisted reporting classes. Those classes should be required, in my opinion, and even better would be for j-schools to partner with computer-science departments so that journalism students would get some experience coding.

OJR: What types of information are newsrooms collecting right now, but most under-utilizing on their websites?

Holovaty: Much of the information that journalists collect, day to day, is structured. Information such as crime reports, obituaries and event listings always follows a certain pattern, which can be richly exploited by databases.

The majority of newspapers take the time to *collect* this information — which is the hard part — but they dramatically reduce its value by NOT storing it in structured formats. Instead, they distill it into big blobs of text for publication in their print editions, and then they shovel those big blobs of text onto their websites. At this point, all structure is lost: Crime reports can’t be sorted or searched intelligently, and event listings can’t be viewed in any sort of user-friendly way.

The very act of distilling information into a news story — which is essentially a big blob of text — removes any sort of structure. Information is exponentially more valuable if it’s structured.

So I urge news companies to retain as much structure in their information as possible. These days, it’s easier and cheaper than ever to set up a database server. Just do it.
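The difference between structured records and a "big blob of text" can be sketched in a few lines of Python. Everything here is hypothetical: the `Event` fields, the venues and the dates are made up to illustrate the point, not taken from any newsroom system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    title: str
    venue: str
    day: date
    price: float


# Made-up listings for illustration only.
events = [
    Event("Jazz night", "Blue Room", date(2006, 6, 3), 5.0),
    Event("Book reading", "City Library", date(2006, 6, 1), 0.0),
    Event("Rock show", "Blue Room", date(2006, 6, 2), 10.0),
]


def as_blob(items: list) -> str:
    """Flatten records into print-style text; the structure does not survive."""
    return " ".join(
        f"{e.title} at {e.venue} on {e.day:%B} {e.day.day}, ${e.price:.0f}."
        for e in items
    )


# With the structure kept, sorting and filtering are one-liners.
free_events = [e.title for e in events if e.price == 0]
by_date = [e.title for e in sorted(events, key=lambda e: e.day)]
at_blue_room = [e.title for e in events if e.venue == "Blue Room"]

print(free_events)
print(by_date)
print(at_blue_room)
print(as_blob(events))
```

The blob can always be generated from the records, but recovering the records from the blob would take the kind of text parsing that storing them would have avoided.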

A few specific examples? Any sorts of public records are structured, really. Crime reports are an obvious one. Fire-station reports, local school data, transportation data. There’s a ton of this stuff.

Beyond the obvious examples, journalists should step back and consider more abstract concepts in terms of structured information. For example, just a couple of weeks ago at washingtonpost.com we databased the “key races” across the country in the 2006 elections, as determined by our editors: http://projects.washingtonpost.com/elections/keyraces/ . Each race has a name, a state/district, a number of candidates — it’s very structured, if you think about it that way. And because we’ve databased it, we’ve automated much of the tedium of updating the site, because the site runs itself, grabbing information from our database.

This sort of automation and exploitation of structured information is where I think (and hope) journalism is going.

OJR: What should news organizations do to encourage tech innovation from their staffs?

Holovaty: Hire programmers! It all starts with the people, really. If you want innovation, hire people who are capable of it. Hire people who know what’s possible.

And once you hire the programmers, give them an environment in which they can be creative. Treat them as bona fide members of the journalism team — not as IT robots who just do what you tell them to do.

OJR: Do you think most news managers are afraid of technology? If so, how do tech-savvy journalists overcome that?

Holovaty: I’ve met both types of managers — those that are scared and those that aren’t. (For the news managers who *are* afraid of technology, you can’t blame ’em. It’s only natural. Technology is completely changing their industry, whose rules haven’t changed drastically in a long time.)

It seems the best way to overcome the fear is to emphasize that technology can be used to further the goals of journalism. It’s reasonable for managers to be afraid of things they don’t understand, but if you boil down the specific technology to the specific journalism problems it solves, I suspect managers would be more understanding.

OJR: What is the most innovative project you’ve worked on? What was so interesting about it?

Holovaty: The projects that are most interesting to me involve reverse-engineering and altering Internet applications to do things they weren’t supposed to do, for the benefit of users. For example, a year ago I tinkered with putting CTA (Chicago Transit Authority) subway maps on Google Maps (http://holovaty.com/blog/archive/2005/04/19/0216/). It no longer works, but it was really cool. Also, I enjoyed creating the “All-Music Guide Fixer” Firefox extension, which, when installed, alters the display and functionality of allmusic.com (http://holovaty.com/blog/archive/2004/07/19/2210). This idea — site-specific user customizations of websites — eventually became the Greasemonkey Firefox extension.

In journalism, I’d have to say the most innovative project I’ve been lucky enough to work on was lawrence.com, the local entertainment site for Lawrence, Kansas. So much automated subtlety is happening behind the scenes of that site. For example, in the event calendar, an event that takes place at a bar will automatically pull out the drink specials for the day of the event. Similarly, if an event features a local band, the system automatically pulls out sound clips and creates an “If you go, you might hear these songs” sidebar. Lawrence.com has a ton of little innovations that go way beyond what most other entertainment sites do, even though the site has had these little innovations for more than three years.
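The drink-specials behavior described above amounts to a lookup that joins an event to its venue's data by day of the week. The Python sketch below shows that idea under stated assumptions: the `SPECIALS` table, the venue names and the function names are hypothetical, and this is not lawrence.com's code.

```python
from datetime import date
from typing import Optional

# Hypothetical data: specials keyed by venue, then by weekday number
# (date.weekday() numbering: Monday is 0, so Friday is 4 and Saturday is 5).
SPECIALS = {
    "Blue Room": {4: "$2 drafts", 5: "$3 wells"},
}


def special_for(venue: str, day: date) -> Optional[str]:
    """Return the venue's drink special on the event's weekday, or None."""
    return SPECIALS.get(venue, {}).get(day.weekday())


def listing(title: str, venue: str, day: date) -> str:
    """Build a calendar line, adding the special only when one exists."""
    line = f"{title} at {venue}"
    special = special_for(venue, day)
    if special:
        line += f" (drink special: {special})"
    return line


print(listing("Rock show", "Blue Room", date(2006, 6, 2)))        # a Friday
print(listing("Book reading", "City Library", date(2006, 6, 1)))  # no specials on file
```

Nobody has to write the special into each listing by hand. The join happens because both the event and the venue data were stored with structure.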

OJR: What interesting projects are you working on now?

Holovaty: I’m heavily involved in the development of Django, an open-source Web framework for the Python programming language. In layperson’s speak, it’s free software that makes Web development fast and easy. We created it when I worked in Lawrence, and we open-sourced it in July 2005. It’s gotten a ton of attention, and people all over the world are using it and improving it. I’m cowriting a book about it at the moment, as well.

Aside from that, I’ve been collecting various public-record data in Chicago in preparation for the launch of my “sequel” to chicagocrime.org. Can’t say much more about this project at the moment, but I’m very excited to launch in the coming weeks!

OJR: Other than the stuff you’re working on, what technology that you’ve looked at recently has grabbed your attention?

Holovaty: Generally I get excited by new APIs that various websites are launching. The Flickr APIs are a classic example: They let any programmer query the Flickr photo database via programs.

OJR: Journalism’s always been a competitive business. But what technical initiatives should news organizations be cooperating on? What opportunities, if any, is the industry missing when companies don’t work together?

Holovaty: I think news organizations should cooperate on removing mandatory Web-site registration walls, which are severely reader-unfriendly. It’s embarrassing to be associated with an industry that treats its customers with such disdain.

OJR: What online news projects have you seen recently, if any, that you thought were especially well done? (Not counting the Washington Post and other sites you’ve worked on….)

Holovaty: Off the top of my head —

* Just the other day I saw the great weather/hurricane tracking app at http://www.ibiseye.com.

* I’m consistently impressed by the stuff coming out of mySociety.

* Faneuil Media does some great work.

OJR: What tech sites do you check to keep up with the latest in mash-ups, programming and Web development?

Holovaty: Every day I check delicious popular a couple of times. That’s a good indicator of what people are talking about and the new things happening on the Web.