Proposal: New standards and tools for distributed online reporting

One of the Internet’s strengths as a medium for journalism is its ability to support widely distributed, grassroots news reporting. Whenever a significant earthquake hits Southern California, tens of thousands of residents log on to the local U.S. Geological Survey website to report what they felt. The USGS site processes these surveys in real time to generate zip-code level shake maps that depict the intensity of the quake throughout the region.

There’s no need to install sensors all over town. And no wait for a costly phone survey. The Internet enables the USGS to engage a small army of citizen reporters to collect its information. Journalists, of course, can do the same with their reporting projects.

But such efforts run into problems when there’s no single obvious source for grassroots reporters to submit their information. We saw this with the dozens of websites that attempted to compile missing persons lists after Hurricane Katrina. No one publication had a comprehensive list of the missing. And attempts to aggregate the lists required either finger-numbing cutting and pasting, or equally tedious RegEx coding to parse the data from the various websites.

Sure, the USGS managed to establish its website as the place to go to report earthquakes. But, for most stories, readers and journalists face the “Katrina conundrum” — too many sources trying to collect the same information, without coordination.

It doesn’t have to be that way. Of course, some journalists always will want to go their own way, searching for a scoop. But others see the value in cooperation, in working together to best provide comprehensive information for the public. To do that, website publishers need:

  • A simple online tool with which to collect fielded information from the public,
  • A way to share that information with others collecting similar information, and
  • A way for all those collectors to know when other collectors have gathered fresh information.

Today, I want to propose that OJR lead an initiative to address these three needs.

Right now, to collect information the way the USGS does, or to compile something like a Katrina missing persons list, you need to be a coder who can put together an HTML input form and a script to dump that information into a database. What online journalism needs is a free, open-source tool that does for grassroots reporting what Blogger.com did for online journals – making it easy for a non-coder to set up a grassroots reporting input page with no HTML or database experience.
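To make the "form plus script plus database" step concrete, here is a minimal Python sketch of the script half: it takes the submitted form fields and stores them in a SQLite table. The field names (reporter, location, observed_at, description) and the function names are my own assumptions for illustration, not part of any existing tool or standard.

```python
import sqlite3

# Hypothetical fields for one incident report; a real project would pick its own.
FIELDS = ("reporter", "location", "observed_at", "description")

def open_store(path=":memory:"):
    """Open (or create) the database and make sure the reports table exists."""
    conn = sqlite3.connect(path)
    columns = ", ".join(f"{name} TEXT NOT NULL" for name in FIELDS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS reports (id INTEGER PRIMARY KEY, {columns})"
    )
    return conn

def save_report(conn, form):
    """Validate a submitted form (a dict of strings) and store it as one row."""
    missing = [name for name in FIELDS if not form.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    placeholders = ", ".join("?" for _ in FIELDS)
    cur = conn.execute(
        f"INSERT INTO reports ({', '.join(FIELDS)}) VALUES ({placeholders})",
        [form[name].strip() for name in FIELDS],
    )
    conn.commit()
    return cur.lastrowid
```

The tool I have in mind would generate something like this behind the scenes, so that the publisher only has to choose the fields.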

Second, the information that tool collects ought to be recorded in a standard fielded format, so that it can be easily shared with other collection efforts. There’s no need to build a common database or central server to support this. All that’s needed is for each site collecting data to be able to export it as XML, using a common set of fields. Tools can be written, along the lines of RSS aggregators, to collect those XML fields and aggregate them into comprehensive databases.
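As a rough sketch of what "export as XML, then aggregate" could look like, the Python below writes a list of reports to XML using a shared set of field names, then merges several such documents into one de-duplicated list. The element names (reports, report) and the fields are assumptions made up for this example; the real format is exactly what needs industry discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical common field set that every participating site agrees to export.
FIELDS = ("reporter", "location", "observed_at", "description")

def export_reports(reports, topic):
    """Serialize a list of report dicts as one XML document, tagged with a topic."""
    root = ET.Element("reports", {"topic": topic})
    for report in reports:
        item = ET.SubElement(root, "report")
        for name in FIELDS:
            ET.SubElement(item, name).text = report.get(name, "")
    return ET.tostring(root, encoding="unicode")

def aggregate(documents):
    """Merge XML documents from several sites, dropping exact duplicate reports."""
    seen = set()
    merged = []
    for doc in documents:
        for item in ET.fromstring(doc).iter("report"):
            record = {name: (item.findtext(name) or "") for name in FIELDS}
            key = tuple(record[name] for name in FIELDS)
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged
```

Matching only on identical field values is the crudest possible de-duplication. Two sites listing the same missing person with different spellings would need smarter matching, which is one more reason to agree on the fields first.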

Personally, I believe that the RSS standard itself does not support nearly enough fields to transmit an entire database of incident reports. We need something more expansive. Dave Winer’s OPML moves in that direction, but I don’t know that it offers the granularity needed for this project. The point is, I think a common XML format is the solution to the problem, but we need some industry discussion as to what that format might be. Obviously, it ought to be flexible enough to accommodate everything from missing persons lists to fraud reports to (my pet project) theme park accidents. Let’s start talking about what that format might look like. (And, rest assured, I don’t want to recreate the overkill of NewsML.)

Third, we need a weblogs.com-type destination site that information collectors can ping whenever they update, so that other projects collecting identically tagged information know that fresh data is available.
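The ping idea can be modeled in a few lines. The Python sketch below is an in-memory stand-in for such a destination site: collectors ping it with a tag and the address of their XML export, and anyone following that tag can ask which exports have changed since they last looked. The class name, method names, and URLs are hypothetical, and a real service would sit behind a web interface rather than a Python object.

```python
import time

class PingRegistry:
    """Toy model of a ping site: remembers when each feed last reported an update."""

    def __init__(self):
        self._last_ping = {}  # tag -> {feed_url: timestamp of latest ping}

    def ping(self, tag, feed_url, now=None):
        """Record that feed_url has fresh data for the given tag."""
        stamp = time.time() if now is None else now
        self._last_ping.setdefault(tag, {})[feed_url] = stamp
        return stamp

    def updated_since(self, tag, since):
        """Return the feeds for this tag that pinged after the given timestamp."""
        feeds = self._last_ping.get(tag, {})
        return sorted(url for url, stamp in feeds.items() if stamp > since)

# Usage with made-up sites and explicit timestamps for clarity.
registry = PingRegistry()
registry.ping("missing-persons", "http://site-a.example/export.xml", now=100)
registry.ping("missing-persons", "http://site-b.example/export.xml", now=200)
fresh = registry.updated_since("missing-persons", since=150)
```

Here `fresh` holds only the second site's address, so an aggregator that last checked at time 150 knows it needs to fetch just that one export.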

We could build a development tool that handles issues two and three itself. But I think that it is important that any development tool work with collection efforts that do not use the tool. That’s why the blogosphere works so well. You don’t have to use Blogger, or any other specific individual tool, to link to and aggregate other blogs. Our distributed reporting efforts should work the same way.

So, who’s interested in helping me refine this idea, and build these tools? The development of blogging tools showed us the power that could be unleashed when we liberated online narrative publishing from the HTML coders and opened it to everyone. Let’s do the same with distributed data reporting. E-mail me at [email protected], and let’s get started.