NewsML aims for the mainstream

The protocol news agencies use to transmit stories to newspapers and news portals like MSN and Yahoo will get its version 2.0 by year end. Developers of the standard — called NewsML — hope improvements will take it beyond its typical old media sponsors. Critics argue there are better tools to do the job.

XML-based NewsML bundles all story elements — like photos, audio, video and text — together in a virtual “envelope,” including a ton of information that describes the content in a way that a content management system (CMS) can understand.

The practical upshot is that all elements of a story are linked together and a CMS can automatically render, for example, the headline, byline, dateline, photo, intro and hyperlink on a news portal’s front page, and all elements of the story on a separate webpage reached through the hyperlink. A CMS can even render stories based on priority, or automatically update breaking news stories. It has plenty of other useful features too (see below).

But outside its core constituency NewsML remains a little known format languishing in a small tributary of the Web standards mainstream. “NewsML is a niche standard,” one of NewsML’s chief architects, Laurent Le Meur, concedes. “But it is an active one.”

Le Meur, chairman of the NewsML Architecture (NAR) working group at the International Press Telecommunications Council (IPTC) and head of the Media Lab at Agence France-Presse, is hoping the standard will become a lot more active in version 2.0 and will move from an isolated backwater into the main current of Web standards.

It could be quite a paddle. Some observers remain skeptical about NewsML and its relevance. “To me NewsML is essentially a tagging system,” says Robin Miller, aka Roblimo, editor in chief of the Open Source Technology Group and author of several books about open source applications. “We’re experimenting with an open source CMS called Xaraya that seems to do pretty much the same thing. And I think most open source CMSs are moving toward similar functionality.”

Still, at the moment, NewsML is the standard of news agencies. It is used by almost all the big international agencies, like Reuters, AFP and UPI, as well as about 40 national agencies, like Italy’s ANSA. In Japan it has even become an official Japanese industry standard (JIS X7201), which works like the codes of ISO, the International Organization for Standardization, the official body that decrees the size of threads on a screw or the dimensions of a freight container.

Of the big agencies, only AP doesn’t use it and, according to Le Meur, that’s because there’s no demand for it. “They’ll move to it when there’s a demand among their customers,” says Le Meur.

He adds that it’s not just for agencies, because news portals like MSN and Yahoo use it. “News aggregators are very interested in it, too. I’ve been told that it can cut the time it takes to integrate stories into their databases from weeks to days,” says Le Meur.

NewsML 101

NewsML’s role as a news agency standard began in 1999 when Reuters handed over development to the IPTC, an association that began life as an industry lobby group working towards newspaper access to telecommunications.

Since the 1970s, the IPTC has developed a series of technical standards for news exchange, such as the IPTC Core, the Information Interchange Model (IIM), the News Interchange Text Format (NITF) and NewsCodes, in addition to NewsML, SportsML, EventML and ProgramGuideML, various XML standards for handling specific types of content. NewsML, however, is the leading standard for news handling.

“In many ways, NewsML was a way to bring XML to the newsroom, where newspapers were often locked into proprietary editorial management systems,” says Michael Steidl, managing director of the IPTC. “You have to remember, many newsrooms were living with technology from the ’80s and early ’90s.”

Slugs in cyberspace

So what can NewsML do?

In its current version, 1.2, NewsML is a model and standard to represent and manage news throughout its lifecycle, including production and interchange. It handles every type of media in the same way. All the elements that make up a story are packaged in a NewsML envelope, so they are clearly associated on the receiving end.

A NewsML envelope might include pictures, text, video and audio in different formats — for example thumbnail and main photo — or text in different languages, or various video or audio formats. Stories can be automatically received and rendered on a Web page, for example, without any human intervention. NewsML tracks versions of a story and enables automatic updates.
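As a rough illustration of the envelope idea, the Python sketch below models a package that carries several renditions of the same story. It is not NewsML syntax: the class names (`Envelope`, `Component`), the field names and the role labels are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One piece of a story: a text, a photo rendition, a video clip."""
    role: str                        # hypothetical labels such as "main" or "thumbnail"
    media_type: str                  # for example "text/plain" or "image/jpeg"
    content: str                     # inline text, or a reference to a binary file
    language: Optional[str] = None   # only meaningful for text


@dataclass
class Envelope:
    """Hypothetical stand-in for a NewsML package; not the real schema."""
    story_id: str
    version: int
    components: List[Component] = field(default_factory=list)

    def find(self, role: str, language: Optional[str] = None) -> List[Component]:
        """Return the components with a given role, optionally in one language."""
        return [
            c for c in self.components
            if c.role == role and (language is None or c.language == language)
        ]


envelope = Envelope(
    story_id="example-0001",
    version=1,
    components=[
        Component("main", "text/plain", "Full story text ...", language="en"),
        Component("main", "text/plain", "Texte complet ...", language="fr"),
        Component("thumbnail", "image/jpeg", "photo-small.jpg"),
        Component("photo", "image/jpeg", "photo-large.jpg"),
    ],
)

# A French-language site would pick only the French rendition of the text.
french_text = envelope.find("main", language="fr")
```

Because every rendition travels in the same package, the receiving system can choose the one it needs without guessing which files belong together.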

All this is achieved by the cloud of metadata that surrounds the story elements themselves. Metadata, or data about data, is essentially a description of all the story elements. So metadata would describe the format of the text, but it would also differentiate the headline from the byline, or the intro from the main body of the text, or the photo caption from other text.

That last part is important. When an editor looks at a story, all the elements are immediately apparent. But to a machine it’s all just text, so it needs to be told that the byline is a byline. With this information it can automatically put the appropriate caption under the right photograph. It can put a teaser or intro on the portal’s main page, but exclude it from the main body of the story. It can drop the byline on the front page, if that’s the website’s policy, but include it with the main story. All this is made possible by metadata.
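A minimal sketch of that idea, assuming a story arrives as a simple mapping from role to text. The role names and the two rendering functions are invented for illustration and are not part of NewsML itself.

```python
from typing import Dict

# A story as a mapping from role to text; role names are illustrative only.
story: Dict[str, str] = {
    "headline": "Council approves new bridge",
    "byline": "By A. Reporter",
    "intro": "The city council voted on Tuesday to fund a new bridge.",
    "body": "The vote followed months of debate ...",
    "caption": "The proposed bridge site.",
}


def render_front_page(item: Dict[str, str], show_byline: bool = False) -> str:
    """Teaser view: headline and intro, with the byline only if site policy allows."""
    roles = ["headline"] + (["byline"] if show_byline else []) + ["intro"]
    return "\n".join(item[r] for r in roles if r in item)


def render_story_page(item: Dict[str, str]) -> str:
    """Full view: everything except the intro, which stays on the front page."""
    roles = ["headline", "byline", "body", "caption"]
    return "\n".join(item[r] for r in roles if r in item)


front = render_front_page(story)
full = render_story_page(story)
```

The text itself never changes; only the labels attached to it decide what appears where.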

In NewsML the metadata itself comes in bewildering variety. There are specific terms to describe a story’s genre (analysis, obituary, feature, opinion), its location, a NewsItem’s role (caption, main, sidebar and so on) and the subject codes, to mention only four types out of a total of 28.

The subject codes are a world to themselves: “There’s three levels, within NewsCodes, of increasing detail,” says the IPTC’s Steidl. Each layer drills down to a finer layer of granularity.

Steidl offers the example of a political story. The top layer would identify it as politics; the middle layer would offer terms like local politics, diplomacy and defense. Drill down further through, say, diplomacy, and you get terms like treaties, alliances and summits. All the codes are in a machine-readable format and language independent, which means the same codes work in English or Urdu, Spanish or Swahili.
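The three-level scheme can be pictured as a small tree in which codes and labels are stored separately. The codes below (`POL`, `POL-DIP` and so on) are invented for this sketch and are not real NewsCodes values; the Spanish labels are likewise just an example of a second language.

```python
from typing import Dict, List, Optional

# Invented codes for illustration; each code maps to its parent (None at the top).
PARENT: Dict[str, Optional[str]] = {
    "POL": None,
    "POL-DIP": "POL",
    "POL-DEF": "POL",
    "POL-DIP-TREATY": "POL-DIP",
    "POL-DIP-SUMMIT": "POL-DIP",
}

# Labels live apart from the codes, so one code serves every language.
LABELS: Dict[str, Dict[str, str]] = {
    "en": {"POL": "politics", "POL-DIP": "diplomacy", "POL-DEF": "defense",
           "POL-DIP-TREATY": "treaties", "POL-DIP-SUMMIT": "summits"},
    "es": {"POL": "política", "POL-DIP": "diplomacia", "POL-DEF": "defensa",
           "POL-DIP-TREATY": "tratados", "POL-DIP-SUMMIT": "cumbres"},
}


def path(code: str) -> List[str]:
    """Walk from a code up to its top-level ancestor, returned broad to narrow."""
    chain: List[str] = []
    current: Optional[str] = code
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))


def describe(code: str, lang: str) -> str:
    """Render the full path of a code using the labels of one language."""
    return " > ".join(LABELS[lang][c] for c in path(code))
```

A story tagged once with the narrowest code can then be found at any level of detail, and displayed in whichever language the receiving site uses.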

NewsML’s metadata also provides status details, like “publishable” or “embargoed,” and administrative details, such as acknowledgements or copyright details.

“There’s also a Unique ID [UID] for each story, so that when a story disappears from a Web page you could still search for it using its unique number,” says Le Meur, who considers this function a kind of ISBN for news.
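To make the status and identifier ideas concrete, here is a small sketch of an archive keyed by unique ID, where each new version of a story is added under the same ID. The ID format, the function names and the sample headlines are assumptions for this example; only the status values “publishable” and “embargoed” come from the description above.

```python
from typing import Dict, List, Optional, Tuple

# Each archive entry holds (version, status, headline) tuples for one story ID.
Version = Tuple[int, str, str]
archive: Dict[str, List[Version]] = {}


def receive(uid: str, version: int, status: str, headline: str) -> None:
    """Store an incoming version under the story's unique ID."""
    archive.setdefault(uid, []).append((version, status, headline))


def latest(uid: str) -> Optional[Version]:
    """Look a story up by ID and return its highest-numbered version, if any."""
    versions = archive.get(uid)
    return max(versions, key=lambda v: v[0]) if versions else None


def publishable_headlines() -> List[str]:
    """List headlines whose latest version is cleared for publication."""
    result = []
    for uid in archive:
        item = latest(uid)
        if item is not None and item[1] == "publishable":
            result.append(item[2])
    return result


receive("uid-001", 1, "embargoed", "Budget details to follow")
receive("uid-001", 2, "publishable", "Budget raises transport spending")
receive("uid-002", 1, "embargoed", "Award winners announced")
```

The story can drop off the front page and still be retrieved with `latest("uid-001")`, which is the point of the ISBN comparison.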

All of these standards can be used freely. “Essentially, the policy of the IPTC is only to claim the intellectual property for these standards. You can’t claim that NewsCodes, for example, are your work. But we don’t charge any fees or royalties,” says Steidl.

The metadata works like the digital equivalent of slugs, referencing all material related to the story, the story’s evolution and origin. It’s very precise. It’s also very, very elaborate: There are 1,300 subject codes alone, for example, and with other codes like genre and location, NewsML contains a blizzard of information for each story. On the plus side, stories get a common nomenclature to a very fine degree. And the NewsCodes can be adopted independently of the NewsML model, so they can be used with almost any tagging system. On the minus side, it uses too many terms to be practically applied by deadline-panicked reporters.

“The big agencies already have software that can semi-automatically apply the NewsCodes to individual stories. These then simply need to be checked by an editor,” says Le Meur. “In four or five years’ time, I could see a situation where an Application Service Provider [ASP] could offer the same functionality for bloggers, for example.”

Le Meur hopes that, if NewsML adoption becomes widespread, one day search engines will be able to index this metadata to provide very exact results for particular search terms. And, if adopted, he believes NewsML could be very useful to bloggers. “A lot of relevant blog posts, for example, don’t get returned as a search engine result. I hope that NewsML metadata could change that,” he says.

But the skeptics say …

The IPTC’s core constituency is convinced and actively working on NewsML 2.0, but convincing others of the standard’s relevance will be a tougher task.

“I think it’s very cute that these large old-fashioned content producers have discovered what we call tagging, and that they have managed to formalize it in a very cumbersome and rigid way,” says OSTG’s Miller.

This is a common complaint about NewsML. Last year, OJR editor Robert Niles described the NewsML standard as “overkill” in a blog post about the need for new standards to govern distributed online reporting.

“I can do pretty much the same thing with WordPress, a popular, open source blogging program that I use to run my little personal Roblimo.com website,” continues Miller. “All I’d have to do is add the exact forms. Because that’s all they’ve done, is come up with standardized tags for photographs, for video, for text and ways to link them together and to update these files as they change. I can do that now, without NewsML and without buying a special program to run NewsML.”

“It is tagging,” concedes NewsML’s Le Meur. “But it’s also a model for representing news. WordPress can do the tagging, but can you open a NewsML document in WordPress, maintain all the metadata and edit it? Right now you can’t.”

But Le Meur believes one day you could. NewsML is a model that can be added to content management systems, and Le Meur hopes it will become a sort of .pdf, a universal standard, for semantically tagged news stories. In the same way that Word can open .txt or .rtf as well as .doc files, CMS software can add the NewsML model to handle files in that standard.

One of the primary goals of version 2.0 is to tackle the biggest criticism: excessive complexity. “Right now NewsML is too verbose and too complex a standard. In version 2.0, we’re going to simplify the syntax. We’re also going to offer two levels of compliance: simplified core compliance, and then the power model. It should then be much easier to use,” says Le Meur.

The IPTC also hopes to add news “concepts” to version 2.0 of the standard, so that searches for the term “euthanasia,” for example, would return stories about Terri Schiavo or Dr. Jack Kevorkian, even if the word euthanasia isn’t used in those stories.
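The difference between a keyword search and a concept search can be sketched in a few lines. This is only an illustration of the principle, assuming each story carries a set of concept tags; it says nothing about how version 2.0 will actually encode concepts, and the story texts are placeholders.

```python
from typing import Dict, List, Set, Union

Story = Dict[str, Union[str, Set[str]]]

# Placeholder stories; the "concepts" field is a hypothetical tag set.
stories: List[Story] = [
    {"id": "s1", "text": "A report that names Terri Schiavo.", "concepts": {"euthanasia"}},
    {"id": "s2", "text": "A debate on a euthanasia bill.", "concepts": {"euthanasia"}},
    {"id": "s3", "text": "A local team wins the cup.", "concepts": {"sport"}},
]


def keyword_search(term: str) -> List[str]:
    """Match only stories whose text literally contains the term."""
    return [s["id"] for s in stories if term.lower() in s["text"].lower()]


def concept_search(concept: str) -> List[str]:
    """Match stories tagged with the concept, whatever words they use."""
    return [s["id"] for s in stories if concept in s["concepts"]]
```

Here `keyword_search("euthanasia")` misses the first story because the word never appears in it, while `concept_search("euthanasia")` finds both.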

The hidden standard

But NewsML also suffers from another weakness: Very few people outside those directly developing it know much about it. Robin Miller is an exception. Of about 15 people contacted to comment on NewsML who were not involved in its development, only two agreed to speak. Two contacts were unavailable; the rest didn’t feel qualified to comment.

“I’m no expert on NewsML,” says Andrew Nachison, director of The Media Center of the American Press Institute, echoing a common sentiment. “Is it important? The best I can say is that I don’t know. Standards probably have a place, and I can see how a standard set of codes might help.”

“NewsML made a lot of sense when everybody was buying million dollar content management systems,” says Nachison. “Frankly, companies that are still buying those systems I would say are making some critical IT mistakes.”

It’s a sentiment echoed by Miller: “I think it’s an improvement or advance for Reuters or BusinessWire. It’s an improvement for those who had nothing. But what we’re really seeing with NewsML, its real significance, is that for the first time the old publishing businesses are saying, ‘Here we’re going to use XML, and we’re going to publish all our stories as Web content and peel some of it off as print.’ ”

Nachison also believes NewsML may have a role. “NewsML was always an open architecture, but one that’s not easy to implement. If they can get flexible enough and open enough that it could be incorporated into lots of low-end publishing systems and open source publishing systems, there’s a chance it could be seamlessly integrated into existing open source programs,” he says.

There are a lot of ifs, buts and maybes in Nachison’s prognosis for NewsML. But, despite criticisms of this news standard, agencies across the world will continue with its development. Whether the standard can move outside its industrial tributary and into the mainstream remains to be seen.

Fancy a bwiki?

Social networks and the news: what could that be?

Well, newspapers could begin by opening their stories and analysis to reader comments, right under the story instead of solely on the letters page. They could invite readers to periodically pose questions to journalists directly about how stories get written; politicians, celebrities or newsmakers could respond to reader questions; editors could explain news priorities.

Readers could rate stories, pictures, videos and reader comments, and they could get alerts when particular readers make comments or when particular news items come up for discussion. Readers could make suggestions about stories they believe need coverage. Newspaper discussion groups could chat about movies, automobiles or sports with staff and wire copy providing the fodder.

These are comparatively simple services to deploy. But as yet, little experimentation with even these simple services exists among newspapers. And there is such potential for developing relationships with readers.

Newspapers could support reader blogs about everything and anything, drawing bloggers’ friends and admirers to the site and developing a community around the news. They could encourage blogs from under-represented areas. They could engage and support hyperlocal journalism initiatives.

Newspapers could develop their own blogs, with local volunteers running a community blog, perhaps organised through local libraries. Data streams from weather services and transport companies could provide real time information on the weather and train, bus and airline times, with real-time updating on delays and late arrivals.

Local schools could announce ‘snow days’ through the web. A wiki could allow residents to write the history of their town. Some information could be incorporated directly from the U.S. Census and other sites like Wikipedia.

This bwiki could be a one-stop shop for all local information, from what movies are playing at the local cinema to the service times in the local Unitarian church. Residents could read, write and discuss the news, or any of their interests. Tourists could check the town’s history, its highlights and relevant travel information.

Sound improbable? This combination of all relevant local information, allied with resident participation was the original idea behind UK property website UpMyStreet.com. In the U.S., some city officials are opening sites to get important information, not covered by the MSM, to residents.

Most of these services exist in one form or another already, but they are dispersed across multiple websites or technologies. New technologies for disseminating information, like podcasts, for example, are emerging all the time.

Software can automatically import much of this data, like local weather forecasts and transport timetables, and with XML adoption this type of data collection will become easier. Capturing relevant data across many networks and services is the fundamental logic behind RSS and Konfabulator, the widget manager. Add in a local journalism site like Northwest Voice and you have a comprehensive address offering all local information and the opportunity to network.
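As a small sketch of how such an import might look, the Python below reads headlines and links out of an RSS 2.0 document using only the standard library. The sample feed is invented for this example; a real site would fetch the feed from the weather service or transport company instead of using an inline string.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# A tiny RSS 2.0 document standing in for a real feed; the content is invented.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Town Updates</title>
    <item>
      <title>Snow expected overnight</title>
      <link>http://example.com/weather/1</link>
    </item>
    <item>
      <title>Morning trains delayed</title>
      <link>http://example.com/transport/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> List[Dict[str, str]]:
    """Extract the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


items = parse_items(SAMPLE_FEED)
```

The same few lines work for any feed that follows the format, which is why a shared XML vocabulary makes this kind of collection easier.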

This is not only about responding to threats; it’s also about providing better service and more relevant information to readers.

Newspapers are already thinking in these terms. In a Wired News piece by Leander Kahney, Ralph Terkowitz, vice president of technology at the Washington Post, outlined different hypothetical scenarios for developing social networks around the news. For example, there could be a discussion group on world politics or a book club in the arts pages.

With new technologies and media channels developing all the time, it won’t be too long before people can easily set up a website that automatically culls relevant information from a wide variety of sources, with volunteers filling in the rest of the details.

It might not work. There are technical issues, and it will require time and money, as well as an active volunteer group. There are decency and partisanship issues. The local Catholic Church might object to its news appearing on the same site as Gay Pride, and vice-versa. Defamation, slander and inaccuracy could be problems.

But these are not new problems, and newspapers need to work hard to sustain their relevance in a rapidly changing publishing landscape. This might be one way to do it. But whatever happens, newspapers need to do something to respond to citizens’ desire for relevant, timely information.

So, now, fancy a bwiki?

Social networks: All around the Net, but underused by news sites

In the last two years social networking sites have mushroomed across the net, heavily fertilized by hype and the promise of six degrees of connection between socially dispersed people who share common interests or friends. Now companies actively apply social networking principles to shift more stock and lure more click-throughs to their sites.

In 2003, social networking sites Friendster, tribe.net and LinkedIn started business. Now there are up to 200 social networking sites covering everything from business contacts to dating. In New York and Boston mobile service Dodgeball informs users when a friend of a friend (FOAF) is within 10 blocks. FOAF is now also the name of a web-based software protocol that describes people and their friends.

Eurekster is a search engine that matches your search terms with pages viewed by your friends and FOAFs. In December Netflix launched a new tool, where members can see recommendations from their friends. Delicious Library, a hugely popular book, CD and DVD inventory management program for the Mac, will include social networking tools in the next version so friends can share their tastes.

The concepts, then, are all around the Net, but what are social networks, really? How do they work, and what’s their relevance to the news business?

There are two flavours of social network: the broad and the deep. Broad social networks are any form of on-going interaction between people, and they have existed since the emergence of tribes in human evolution.

“There’s a whole spectrum of relationships,” said Howard Rheingold, who in 1993 wrote a book about social networks and virtual communities. “For example, somebody in the store, whose name you don’t know but from whom you might buy regularly, they are part of your social network, the person who delivers your mail, your friends and family. There are all sorts of degrees of relationships within social networks.”

This sociological definition comprises a broad matrix of relationships that feeds one of the most powerful appetites in people: the hunger to connect with others through a common goal, shared interests or mutual benefit.

The deep definition refers to the increasingly common online practice of developing tools to leverage the friends of a friend for mutual benefit. “I think the use of the term social networks has changed. Since Friendster, I think it’s come to mean services like Friendster. Of course, the term pre-dates the Internet,” said Rheingold.

On the Internet, social networks are old news. In the early days of the Arpanet, originally designed for military communication, designers saw the new network as a means of collaboration between researchers. Collaboration was one of the ideas that inspired Tim Berners-Lee to develop the World Wide Web.

“Social Networks, codified or not, provide a mechanism for prioritization and filtering of information, including news,” said Tony Gentile in an e-mail interview. Gentile, a strategic marketing consultant for Internet companies, runs buzzhit.com, where he writes about emerging technologies and trends.

Technologies like bulletin boards, Internet forums, e-mail, chat and instant messaging are all social networking tools. Amazon’s recommendations, its Listmania, reader reviews and ‘people who bought …’ listings all serve as relevant context and timely information in a world of virtually unlimited choice. They are simple and basic functions that tap into low level, or broad, networking concepts.

Deeper social networking is emerging with weblogs, where people start a conversation and, often, invite others to comment. Add trackback into the mix and you get the formation of conversations, relationships and networks. Sites like LiveJournal exist to enable blog networks. Rathergate, Easongate and the Tsunami disaster are all examples of how elaborate social networks develop spontaneously around the news, with people sharing news, evidence and analysis.

“In the case of blogs and feeds, your social network may do more than simply refer content to you; they may annotate it with their own analysis and commentary,” said Buzzhit’s Gentile. “This combination of social networking, blogging, referral and annotation is at the heart of services like Rojo.”

Similarly, wikis such as Wikipedia and open source software development, like Linux or Firefox, draw together a network of disparate individuals into a shared goal.

Again, all this is old news, but now companies want to develop ever more elaborate social networking components into their software and services. The Netflix and planned Delicious Library tools are significant advances on Amazon’s ratings, for example.

So far, most newspapers have done little to develop the tools required to enable the interesting, broad networks needed to foster ongoing relationships with their readers. Most newspaper innovation stopped with ‘e-mail this story’ and subscription models. Even less exists for deep networks.

“I know of very few major U.S. newspaper sites that allow readers to comment on articles. Most channel interaction is through ‘Letter to the Editor’ form posts or e-mail addresses,” said Buzzhit’s Gentile. “And that’s a big issue. Providing the tools is the easy part. Encouraging usage, that is, building community, starting a conversation, is far harder. So, yes, newspapers need to provide the tools … but fundamentally, they must reinvent their relationship with their readers.”

But now some newspapers, too, are beginning to deploy social networking tools and components into their core services. The tribe.net experiment by The Washington Post and Knight-Ridder is an attempt to shore up the rapidly disappearing newspaper classified market and to engage more meaningfully with readers.

In the UK, Silicon.com, like other news sites, encourages reader comments on every story. Those comments sometimes generate stories in their own right. The UK’s Guardian uses discussion groups and regularly invites readers to chat and pose questions to newsmakers and journalists on its live talk section. Additionally, it has a lively set of message boards where readers can exchange views.

Salon.com offers its own blogging service. In citizen journalism, sites like the Northwest Voice develop networks around local news, combining aspects of a blog and a wiki. Across Netspace newspapers are beginning to deploy social networking tools that connect readers more directly to the paper and each other.

At Yahoo News, for example, the ‘Most Popular’ section gets the greatest number of visitors. Yahoo doesn’t break down the numbers to the ‘most popular’ page, but the News site overall received 23 million visitors in January.

“We have three flavours of ‘most popular’: most e-mailed, most viewed and highest rated, and also we’re looking at ‘most searched,’ and we may find ways to start using that data on the site as well,” said Neil Budde, Yahoo’s Director of News. “With over 20 million visitors each month it provides information about what’s interesting to a wide swath of people.”
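The three ‘most popular’ lists Budde describes amount to sorting the same stories by three different measures. A minimal sketch, with invented story IDs and numbers and no claim to reflect how Yahoo computes its lists:

```python
from typing import Dict, List

# Invented per-story counters; a real site would read these from its logs.
stats: Dict[str, Dict] = {
    "story-a": {"views": 900, "emails": 40, "ratings": [4, 5]},
    "story-b": {"views": 1500, "emails": 10, "ratings": [3, 3, 4]},
    "story-c": {"views": 300, "emails": 75, "ratings": [5]},
}


def most_viewed() -> List[str]:
    return sorted(stats, key=lambda s: stats[s]["views"], reverse=True)


def most_emailed() -> List[str]:
    return sorted(stats, key=lambda s: stats[s]["emails"], reverse=True)


def highest_rated() -> List[str]:
    """Order stories by their average reader rating."""
    def average(s: str) -> float:
        ratings = stats[s]["ratings"]
        return sum(ratings) / len(ratings)
    return sorted(stats, key=average, reverse=True)
```

With these sample numbers the lists disagree: the most viewed story is not the most e-mailed or the best rated, which is why offering all three tells readers more than any one alone.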

Right now Budde wants to look at ways social networking could enhance traffic data and reader experience.

“What we would look for down the road is ordering stories more dynamically, based on not just an editor’s view or a mass view, but maybe take social networking, like ordering stories based on a reader’s group of friends, or on people who have similar interests to your own,” he said. He thinks similar tools could be developed around readers’ comments.
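One way to picture ordering stories by a reader’s group of friends is to count how many friends have read each story and sort on that count. The sketch below is only an assumption about how such a feature might work; the names, the friend lists and the reading histories are all made up.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical friend lists and reading histories.
friends: Dict[str, Set[str]] = {
    "ann": {"bob", "cy"},
    "bob": {"ann"},
    "cy": {"ann"},
    "dee": set(),
}
reads: Dict[str, Set[str]] = {
    "ann": set(),
    "bob": {"story-a", "story-b"},
    "cy": {"story-b"},
    "dee": {"story-c"},
}


def rank_for(reader: str, editor_order: List[str]) -> List[str]:
    """Reorder the editor's list so stories read by more friends come first."""
    counts: Counter = Counter()
    for friend in friends.get(reader, set()):
        for story in reads.get(friend, set()):
            counts[story] += 1
    # sorted() is stable, so ties keep the editor's original order.
    return sorted(editor_order, key=lambda s: -counts[s])


personal = rank_for("ann", ["story-c", "story-a", "story-b"])
```

A reader with no friends on the site simply sees the editor’s order unchanged, so the social ranking adds to the editorial view instead of replacing it.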

But are social networking tools worth all the fuss? Will they have an impact?

“Are you asking me if there’s a way to make money off of a cost-effectively, virally acquired audience of above-average participatory users who produce metadata based on their activities? I think the answer is clear ;-),” said Buzzhit’s Gentile. “… (Benefits are) audience growth, participation and metadata. All of which should ultimately improve monetization.”

That’s worth pondering. In January the New York Times carried a story by Eric Dash about how Dow Jones’s purchase of CBS MarketWatch was prompted by concerns over growing ad revenues: The online WSJ was running out of space for the ads. Even though the WSJ can add an infinite number of pages to its site, advertisers only like the popular ones. The thriving connection with readers outlined by Gentile is one way of maximizing the value of the pages newspapers already have.

Still, not all are convinced that news is an exciting proposition for social networking.

“The question is … is it already too late? Content distribution is gravitating toward feeds, and feed readers are integrating social networking. Newspaper sites might be able to integrate SN via FOAF, or similar open frameworks, but the likelihood of a consumer inviting 30 friends to a newspaper site seems … remote,” said Gentile.

Right now, I have a feeling Gentile is right. ‘People who read this story also read …’ doesn’t work for me. But there’s no reason why it shouldn’t.

Every subject under the sun has a history; likewise every subject under the sun has news. For dating, it’s the hot new singles bar; for cinema, it’s the latest releases and their reviews. Can’t newspapers develop as a node that taps into people’s desire to network, by sharing interests and information across all topics? Can newspapers open their pages to readers and seed the conversation with content they already produce? (See the sidebar for some imaginative speculation.)

Whether newspapers can effectively deploy social network technologies, and what effect they may have, are moot points. But according to Yahoo News’ Budde, one thing is sure:

“Social networks are going to continue to evolve, and all the media need to pay attention to it.”