Holovaty's EveryBlock unlocks neighborhood news data

Noted journalist/programmer/Web guru Adrian Holovaty just launched his latest project, the Knight News Challenge-funded EveryBlock. As the site’s name implies, it strives to provide information about every block of the three cities it covers: New York, Chicago and San Francisco. Its data include crime reports, civic inspections and filings, even geotagged Flickr photos.

Too many news professionals get bogged down by traditional notions of “journalism,” that what newsrooms publish must be multi-sourced narrative “stories,” from five to 500 inches long, following J-school-approved norms for reporting and narrative structure.

Feh. When I worked at a small daily, many of us on staff suspected that the most popular features in the paper were the obits, police blotter and log of ambulance runs. And you know what? When I moved up, to bigger cities and bigger dailies, I missed being able to check the paper to see where that police cruiser or ambulance I heard yesterday was going.

Readers love information. Whether that’s a police blotter, local bulletin board, school lunch schedule or gripping story in the local paper — they don’t care about the format. Readers just want it to be accurate, relevant and complete. Without anything misleading or extraneous, either.

That’s why I love watching people like Holovaty, whom I’ve interviewed before on OJR. The public has voted with its mouse clicks that it wants more information from the rest of the world than it is finding in the same stale stories in its shrinking local papers. Holovaty’s creations offer the promise of a reinvigorated news industry, driven by journalists who can wield code, statistics and data every bit as effectively as words and grammar. I e-mailed Holovaty, and asked him about EveryBlock.

OJR: What’s EveryBlock providing that the average Web reader could not get before?

Holovaty: First, fundamentally, we offer a way to browse news at the block level, with a news page for every block — hence the name EveryBlock. We’ve done a fair amount of due diligence and are pretty confident this hasn’t been done before — and in three of the densest cities in America, at that.

Second, we’re providing some information that didn’t previously exist online. Two examples are film locations in Chicago and restaurant inspections in San Francisco. The former is provided to us by the Chicago Film Office, and the latter is provided to us by the San Francisco Department of Public Health, which has its own website but doesn’t include some of the data we publish.

Third, we make it easy to browse information that already existed online but was buried in deep government sites, either in “deep Web” search forms or non-Web-friendly formats such as PDFs. Two examples are landmark building permits in New York City and crime reports in New York City, but there are many other examples across our three city sites. This has been an interest of mine for a number of years, and it’s a dream come true to have the opportunity to do it at this scale.

Fourth, we’re detecting geography in narratives — “blobs,” so to speak — and making it easy for people to find relevant news articles and government documents that refer to specific places near them. Some examples are New York City news articles, San Francisco zoning agenda items and Chicago city press releases. Another (geeky) way to phrase this is that we’re harvesting geographic metadata from unstructured text.
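[Editor's aside: for readers curious what "harvesting geographic metadata from unstructured text" can look like in code, here is a minimal Python sketch. It is not EveryBlock's method; Holovaty describes trained geoparsing algorithms, while this toy uses a single regular expression. The pattern, the function name `find_addresses` and the "block of" label format are all assumptions made for illustration.]

```python
import re

# Hypothetical pattern (an illustration, not EveryBlock's parser):
# house number, optional compass direction, street name, street suffix.
ADDRESS_RE = re.compile(
    r"\b(?P<number>\d{1,5})\s+"
    r"(?:(?P<direction>[NSEW])\.?\s+)?"
    r"(?P<street>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)\s+"
    r"(?P<suffix>St|Ave|Blvd|Rd|Dr|Street|Avenue|Boulevard|Road|Drive)\b"
)


def find_addresses(text):
    """Return each street address found in text, with a block-level label."""
    results = []
    for match in ADDRESS_RE.finditer(text):
        number = int(match.group("number"))
        # Round the house number down to the nearest hundred to get the block.
        block = number - number % 100
        parts = [match.group("direction"), match.group("street"), match.group("suffix")]
        street = " ".join(part for part in parts if part)
        results.append({
            "address": match.group(0),
            "block": f"{block} block of {street}",
        })
    return results


if __name__ == "__main__":
    sample = ("Police responded to a burglary at 1234 N. Clark St. on Tuesday. "
              "A permit was filed for 2150 Main Avenue.")
    for hit in find_addresses(sample):
        print(hit["address"], "->", hit["block"])
```

[A real system would also need to handle intersections, neighborhood names and misspellings, then geocode each match to map coordinates. That is presumably where the hand-tagged training articles Holovaty mentions later come in.]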

Fifth, we’re providing some light trending and aggregate reports for *each* type of information on our site. For example, see the Chicago crime data.

OJR: Describe the work that went into creating EveryBlock.

Holovaty: The work that has gone into creating EveryBlock has been quite diverse, which makes the job interesting and exciting. On the “human” side of things, we’ve established many relationships with government officials and other partners who are responsible for local data. On the user-interface side, we’ve worked to design a gorgeous, easy to use site and an architecture that accommodates a wide variety of disparate types of information. On the map side, we’ve made our own maps, deciding against Google’s or Yahoo’s map offerings for a number of reasons; that took a sophisticated combination of design, coding and data chops. At the technical level, we’ve developed an array of technology just to get all of this data into an elegant, unified system. It’s beautiful. And we’ve even done a fair amount of manual labor, from hand-drawing neighborhood boundaries to hand-tagging newspaper articles to train our geoparsing algorithms.

OJR: I suspect that when many people, inside and out of the news industry, hear the word “journalism,” they think of a specific, narrative format for providing information. But sites such as EveryBlock provide information outside the traditional newspaper narrative form. Do you think that people in the news industry need to modify or expand their conception of “journalism” in order to account for the new and different ways that people can access and present information online?

Holovaty: People can define “journalism” however they’d like. At EveryBlock, what we’re interested in exploring is what sort of frequently updated information consumers want at the block level, and how they’d like to receive it. Whether this is called “journalism” or not is strictly academic. (I think it’s hard to argue against calling it “news,” though.)

I think people in the news industry should indeed modify their conception of what information they publish, and how they publish it. But should they modify their conception of “journalism”? Leave that to the people who have the time and inclination to debate semantics.

OJR: What has kept, or is still keeping, newspapers from having functionality like EveryBlock’s on their websites?

Holovaty: Unfortunately, there’s a lot. In the general case (and “general” means this excludes the newspapers out there who are doing great things online) —

* A lack of technical competence
* A culture so obsessed with daily deadlines that little thought/resources are put into paradigm changes
* A culture that disdains technology and science, particularly math, and, worse, actually takes pride in that
* Red tape
* Legacy systems
* Legacy attitudes
* People who ask “Is this journalism?” 😉

OJR: How long can you keep the site running on your Knight funding? What happens after that runs out?

Holovaty: That remains to be seen! Knight has awarded a two-year grant, and we’re just over six months into it, so… ask again in about 18 months. 🙂

OJR: Who are you trying to reach with EveryBlock? How are you promoting the site?

Holovaty: We’re trying to reach residents of the EveryBlock cities. If you live in Chicago, New York or San Francisco, we hope to make your block’s page on EveryBlock into something that you’d find useful time and time again.

Something tells me you won’t be seeing EveryBlock billboards on the expressway, or EveryBlock ads on subway cars. That’s just not our style. We’ve e-mailed friends and family, and the rest has sort of happened through word of mouth, blogs and media coverage. This approach worked well for chicagocrime.org, which has (anecdotally) gained pretty good awareness over the past two and a half years here in Chicago, with zero traditional marketing on my part.

OJR: What’s the timeframe, and procedure, for expanding to other cities?

Holovaty: This is an interesting challenge that we knew going into the project: to some extent, the technology is scalable (i.e., replicable in a way that makes subsequent cities easier to launch), but at the same time, every city’s data is different. We still don’t know how that breaks down, resource-wise, but it’ll be fun to find out.

OJR: If a local publisher wants EveryBlock technology on his/her website, what should they do? Are you working with partners? Should they try to build this themselves?

Holovaty: We’re obligated under the terms of our Knight grant to release the site’s code under an open-source license at the end of the grant period. The idea there is to experiment and do some good for the news industry — that’s one of the core missions of the Knight News Challenge program (www.newschallenge.org), which is the contest I entered to receive this grant.

In the meantime, though, we’re very interested in working with partners — media companies, governments, bloggers and any other local-news publishers — in our EveryBlock cities. (Folks can contact us at feedback at everyblock.com.)

Personally, I wouldn’t recommend that news organizations attempt to build this themselves, but it’s obvious that I’m biased — both by having implemented EveryBlock and having worked at a number of news organizations.

OJR: What information, related to EveryBlock or not, do you really wish you could get in front of Web readers, but that you haven’t figured out either how to get your hands on or how to present effectively? (In other words, what’s the next challenge?) What’s it going to take to get that information out there?

Holovaty: That’s another good question. Regarding data on the current EveryBlock cities, I’d say we’re only at 10 percent of where we could be. We’re almost ready to add a couple of data sets that didn’t make it in time for launch, and we’re continually adding news sources and blogs to crawl.

One type of information that we purposefully haven’t included on EveryBlock is “static” information — the location of the nearest subway station, or the census demographics for your block. There’s been a small amount of user interest in this, but there’s a core difference in that type of information, namely that it’s not time-sensitive, and it would take some thinking to figure out how that fits in with our current “news feed for your block” paradigm. We’ll see what happens. It’s one of the many interesting problems we look forward to tackling.

