Pay or free? Newspaper archives not ready for open Web… yet

Information wants to be free — as long as you don’t have to pay the people who dug up that information. While the Net has long been associated with free things — free e-mail, free personal Web pages, free searches — the news business has been repulsed by the notion that their hard-won scoops and journalism should be given away for free.

But the newspaper business has had little choice but to open its gates online so people can read breaking news for free. How else to compete with free news from the plethora of advertising-supported sites? Now, a rising chorus of voices is calling for more: free archives at newspaper sites so that search engine and blogger links will remain live, newspapers can retain their authority in Google and articles can remain part of the online conversation.

Dan Gillmor summed up the meme spreading at the recent Blogging, Journalism & Credibility conference at Harvard, by exhorting on his blog: “Newspapers: Open Your Archives.” Gillmor went on to explain that the huge increase in traffic to archives will eventually help the revenues “greatly exceed what the paper had been earning under the old system” as expenses drop.

While bloggers such as Cory Doctorow and Jay Rosen added more calls for open archives, the keepers of the keys to newspaper archives were less enthralled with the notion. Those pay-per-view archives bring in a steady source of income for large newspaper sites, shielding them from the cyclical nature of online advertising. They also help preserve huge revenues from database services such as LexisNexis.

Martin Nisenholtz, the dean of online publishers as CEO of New York Times Digital, says there are two main reasons the company charges for most of its archives: The marketplace has already valued the content to be worth much more, and there’s no way to recoup that value through paid-search ads (such as Google AdWords) or even display advertising.

“We’re not about to give away something that the marketplace is paying a huge premium for already,” Nisenholtz told me, “unless you could get a lot more than that premium in some other way, which you can’t, believe me, there’s no way. There’s no analysis to show that Google AdWords gets you anything close to what we make on archives on the Web — never mind all the money we make on the after-market sales. It’s so ridiculous as to be laughable.”

Nisenholtz admits that paid archives on the site bring in only a percentage in “the low single digits” of overall online revenues, and Borrell Associates estimates that paid archives account for less than 5 percent of online revenues for newspaper sites.

But the sale of archives to LexisNexis, Factiva, ProQuest and others is a “significantly higher number,” according to Nisenholtz. While there’s no stipulation in its database contracts for NYTD to keep archives behind a wall, Nisenholtz realizes that making archives free online would erode their value in other places.

The benefits of free

Despite those economic caveats, Nisenholtz has still opened up some content on the site, with older travel and technology articles joining book reviews in the free archives. In the case of valuable vertical subjects, such as travel, the money from partnerships in the search results is too enticing. For someone looking for travel articles on Thailand, for instance, the paid links to various travel agencies and packagers are much more interesting than links served up on a search for “Iraq War.”

But even as newspapers salivate over the new boom in paid search ads, they shouldn’t get ahead of themselves. Nisenholtz warns that online advertising has had some nasty ups and downs in recent years.

“One of the things that people get a little bit crazy [and forget] about during an advertising boom is the fact that two years ago there was a bust,” he said. “Advertising has always been cyclical in nature, so the extent we can have revenue streams that are not advertising-oriented, it evens out those cycles. I believe just as a business person, a diversified revenue mix is better than a single revenue stream.”

Simon Waldman, director of digital publishing for the Guardian, wrote an eloquent essay on PressThink about how important permanence is for news sites — that their stories remain at one URL and continue to score high on Google searches. But Waldman never brought up the economics of an open archive and told me that every newspaper has to decide the value of its own archives.

“You have to make a decision based on a 360-degree appraisal within one business, not on a narrowly focused view across dozens of businesses,” Waldman said. “I think it’s unfair for anyone to say, ‘I think all newspapers should do this’ because every newspaper is different.”

The Guardian is one of the few larger newspapers, along with the San Francisco Chronicle, that doesn’t charge for archives online. (For a listing of newspaper archives online and what they charge, check this listing from the Special Libraries Association.) Waldman notes that open archives have helped the Guardian’s readership grow beyond the UK and Europe and into the United States.

“Having a permanent presence on the Web like what we have is the most cost effective form of marketing that you could ever hope for,” Waldman said. “We are a UK-based paper, but the decisions we’ve made about permanent linking and the availability of our archives — the two combined — is part of the reason we’ve had a much greater footprint on the Web than you’d ever deem possible from our print circulation.”

Because Google — not Google News — is such a huge driver of traffic to the Guardian’s site, the open archive has helped the Guardian’s stories remain high up on search results for recent events, while many walled-off stories do not. Of course, if more sites opened their archives, the Google advantage would be less pronounced for everyone.

Prices rising or falling?

Both Waldman and Nisenholtz agree that the way to decide on pricing archives is a complex formula that includes reader interest, Web traffic, story content and after-market database contracts. And as the various factors shift, pricing changes too. The more philosophical Netizens such as Waldman believe the price for content is inevitably going downward, while larger publishers are actually jacking up the price of their archives online.

So what gives? It’s possible that publishers are just now realizing the value of their content for researchers, law firms and academics — right as the open Web is making all content more accessible and cheap. In Waldman’s view, the bottom will be dropping out for online content.

“There will be a lot less money changing hands than some people would have liked for their archives,” he says. “The way things are moving is that people are paying less for traditional content. The price pressure is all down. When you’re looking at music or what-have-you, everything’s moving to a much lower payment.”

The folks at ProQuest beg to differ. The company, originally a microfilm business, has diversified and now hosts and handles e-commerce for archives at 85 newspaper sites. Chris Cowan, vice president of publishing at ProQuest, says publishers have undervalued their content, and he has encouraged some of his larger clients to raise their prices to $2.95 per article. The result was that ProQuest’s revenues were up 20 percent last year, with little drop in demand for paid archives.

“What we’ve found to be true — not universally true, but in general — the price elasticity per article is much higher than publishers assumed,” Cowan said. “And we’ve seen little or no drop-off in the number of sales at a publisher’s site. And with the increased price per article, it’s been a nice boost in revenues.”

Ken Doctor, vice president of content services for Knight Ridder Digital, concurs: a recent price hike at its 30 newspaper sites from $1.95 per article to $2.95 caused only a slight dip in demand at the outset. Plus, KRD has had success selling “bundles” of articles for a discount price. Doctor told me the market for licensing newspaper content is only just beginning, with the explosion of mobile and wireless platforms.
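The arithmetic behind such a price hike can be sketched in a few lines of Python. The $1.95 and $2.95 prices come from the Knight Ridder example; the 5 percent dip in sales is a purely hypothetical stand-in for the "slight dip," since no exact figure was given, and the function names are illustrative only.

```python
def breakeven_sales_drop(old_price: float, new_price: float) -> float:
    """Fraction of sales that could be lost before revenue falls below its old level."""
    return 1 - old_price / new_price


def revenue_change(old_price: float, new_price: float, sales_drop: float) -> float:
    """Relative change in revenue after a price change and a fractional drop in sales."""
    return (new_price * (1 - sales_drop)) / old_price - 1


if __name__ == "__main__":
    old_price, new_price = 1.95, 2.95  # per-article prices cited in the article
    assumed_drop = 0.05                # hypothetical dip in sales, not a reported figure

    print(f"Break-even drop in sales: {breakeven_sales_drop(old_price, new_price):.1%}")
    print(f"Revenue change with a {assumed_drop:.0%} dip: "
          f"{revenue_change(old_price, new_price, assumed_drop):+.1%}")
```

Under these assumptions, sales would have to fall by roughly a third before the higher price stopped paying for itself, which is why a small dip in demand still leaves publishers ahead.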

For smaller newspaper sites, other factors come into play. The Santa Fe New Mexican, for example, takes the unusual approach of having a pay site including exact digital replicas of its print edition for the past seven days, along with a free site that includes a selection of stories from the newspaper. While you have to pay for most of the archived stories or older digital replicas, the articles that were culled for the free site remain free in the archives.

The New Mexican’s Web publisher Michael Odza said he was hoping to launch a unified search engine by 2006. While paid archives bring in less than 1 percent of the site’s revenues, Odza says that was the trade-off for having ProQuest handle the entire archiving process, including e-commerce and customer service.

“Because the system for labeling each story and making sure it is in the archives and that you can search it is expensive to build, we went with a service that did it for us,” Odza told me. “It was the expediency of the time to build an archive and the vendor handles the customer service.”

Link servers and micropayments

All of this discussion of closed archives and Google as the arbiter of history is enough to raise the hackles of any librarian. Strangely enough, libraries use taxpayer money to buy those pricey newspaper archives for the express purpose of offering them up for free to library patrons on the premises or even online. So we’re all paying for archive access at libraries, even if we never visit them.

Luke Rosenberger, a librarian blogger, noted that much of what Gillmor wanted was already being considered by academics in OpenURL, a way to present content using metadata instead of a static URL. With OpenURL, your search for a paper or news story would go through a link server or link resolver that would present you with repositories such as local libraries or pay databases for that piece of information.

“I believe that if more people were exposed to the online database resources that are already available to them through their own public and academic libraries, the idea of newspapers opening up their archives might have less appeal,” Rosenberger told me via e-mail. “Why would someone want to go searching on dozens of different Web sites — New York Times, Wall Street Journal, etc. — when they could search them all plus many more from the same place?”
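The OpenURL idea, a link that carries citation metadata rather than a fixed address, can be illustrated with a minimal Python sketch. Everything specific here is an assumption: the resolver addresses are made up, the citation values are placeholders, and the key names are modeled on the early OpenURL style and should be checked against the standard before real use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical link-resolver addresses; each library would run its own.
RESOLVERS = {
    "library_a": "https://resolver.library-a.example/openurl",
    "library_b": "https://resolver.library-b.example/openurl",
}

# Placeholder citation. Key names are assumed from the early OpenURL style.
CITATION = {
    "genre": "article",
    "issn": "0000-0000",           # placeholder, not a real ISSN
    "title": "Example Daily",      # publication name
    "atitle": "Example headline",  # article title
    "aulast": "Reporter",          # author's last name
    "date": "2005-02-01",
}


def build_openurl(resolver_base: str, citation: dict) -> str:
    """Attach the citation metadata to a resolver address as a query string."""
    return f"{resolver_base}?{urlencode(citation)}"


def read_citation(openurl: str) -> dict:
    """Recover the metadata from a link, as a resolver would before looking up holdings."""
    query = parse_qs(urlsplit(openurl).query)
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    # The same citation yields a different link for each library's resolver.
    for name, base in RESOLVERS.items():
        print(name, build_openurl(base, CITATION))
```

The design point is that the metadata stays the same for every reader; only the resolver changes, and each resolver decides which of its own library's databases or print holdings to offer.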

As the databases and search engines improve, news sites might also streamline the way we find old stories on their sites. Waldman, Doctor and Nisenholtz all pine for a single sign-on system, where Net users would be debited for tiny micropayments as they visit each archived article — without having to input a credit card at each purchase.

“My dream would be an invisible payment mechanism that would allow you to completely engage and have a major presence on the major search engines and have a pay-per-view [for each article], but I don’t quite see it yet,” Waldman said.

Nisenholtz said news sites could learn a lot about e-commerce from Amazon, where nobody trolls around looking for all the free content they can eat. “I think it’s incumbent upon us to make it so easy that the consumer’s purchase decision is brainless,” he said. “Micropayments, one-click ordering, whether you can store a credit card with one vendor and use it across all the sites that vendor sells.”

Friction-free commerce might help tear down the search walls for old newspaper content — with the cooperation of Google and other search engines. But in the short term, publishers might want to remain cautious about opening up their archives for free.

“I think there is a sense for a lot of people, making archives free is effectively letting the genie out of the bottle,” Waldman said. “You might want to do it at a later date, but not now, not when you don’t have to.”

About Robert Niles

Robert Niles is the former editor of OJR, and no longer associated with the site. You may find him now at


  1. Another interesting issue (to me at least): By removing articles to paid archives, newspapers are surrendering part of their authority as gatekeepers to content online.

    When an online newspaper article moves off the free Web and into an archive, the passed PageRank value of the link from newspaper to the referenced Web site disappears. Since newspaper sites tend to enjoy high PR value, the loss of links from them can be significant to smaller Web sites that get a lot of traditional press coverage. With newspapers out of the game, that leaves to the blogosphere and, unfortunately, link spammers the job of determining which Web sites will place most highly in Google’s search results.

    Skeptical news publishers might respond: why should I care? Well, for starters, isolating oneself makes the Web (and society) a less rich place. And if publishers care only for the money, let’s remember that if newspapers did a better job of linking *to each other* and keeping those links alive, they’d do even better in search results, creating an opportunity to further expand their audience among typically younger readers for whom Google and Yahoo, not newspapers, are the gatekeepers to modern thought.

    It is illuminating to me to see the number of professionals who are using Google for searches that in the past would have been done on Nexis. If the search engines are conditioning a generation of information seekers to come to *them* for news, then the revenue from paid archives might die off with its current generation of users.

  2. Do journalists really create information or do they just capitalize on it? It would be nice to think that journalists turn events into words, over which they assert ownership. Does that information really belong to them? Isn’t language a public resource like the air and fish in the ocean? When a reporter covers an accident or a death or a scientific breakthrough, the reporter does not create the event or the language about it. There was a time when it seemed legitimate to charge for printing words on paper. There was a time when the newspapers were the only source of news. Today, newspapers can legitimately charge for hard-copy papers, but they cannot charge for the news, whether recent or archived. Like the air, the news doesn’t belong to them. It is everybody’s.

  3. The problem with the ‘news is free’ thesis is that its logical extension suggests that words are public culture, and therefore merely arranging them in a different order should not give writers any status of ‘ownership’. Likewise music – just the same notes in a different order.

    I’m 100% in favour of newspapers charging for physical copies, and publicly archiving all articles for free online – but I don’t think this is the right argument to prove it – and certainly not to newspaper owners.

    It is, however, good business sense to establish yourself as a market leader, a paper of record and a reliable source for researchers of all kinds… and that will inevitably drive your hard copy sales.

    I don’t understand why the NYT doesn’t see that.

  4. I followed your link to Luke Rosenberger’s post about OpenURL. It is indeed an intriguing development if fully implemented. Wouldn’t bloggers who use articles from periodicals as part of their research for preparing posts love to have a resource that would bring up the text of these articles at the click of a mouse? Sure.

    But not so fast. You need access to a library that participates in OpenURL because they are the ones who provide you with the computer access to the databases. If you’re affiliated w. a research library you’re in like Flynn. If not, you’re out in the cold. Though I live in Seattle where you’d think libraries would be hip to these sorts of developments, the Seattle Public Library doesn’t seem to participate in OpenURL (at least that’s what one reference librarian told me–but I am trying to confirm this info).

    Also, if I understand OpenURL correctly, it will provide the researcher access to the text of an article in the database, but it will not allow you to link to it in your post in such a way that your blog readers can read the text as well. So OpenURL is great for quoting text, but not for linking to it.

    This is all based on what may be imperfect understanding of OpenURL since I’m not technically up to speed on it. I’d love for this thing to develop further so that it’s incredibly useful & easy to implement. As far as bloggers are concerned, I’m not sure it’s there yet.

  5. Is it possible that this concept of for-fee archives will be Napsterized? Here’s one site that allows registration-free access to the NY Times and access to for-fee archives.

  6. Hi! Let me just add a note to try to clarify a couple of things about OpenURL in light of Richard Silverstein’s comments. First of all, OpenURL does not grant anyone (blogger or reader) access to material that they can’t already access through their own institution — but it *is* a recognition that library print and electronic holdings can be rather challenging to navigate. OpenURL is a way to cut through much of that navigation, using the metadata about the desired resource to provide the reader with a menu of options about how to access that specific content through their library.

    For example, if I wanted to blog about an article in Technology Review (which has subscriber-only archives), an OpenURL would capture the metadata for that article (ISSN, volume, issue, date, author’s name, article title, etc.). Assuming there are mechanisms in place to associate those OpenURL links with their own libraries’ link resolvers, then a patron from Richard’s local library (Seattle Public) would be offered links to the full text of that article in the databases “Business & Company Resource Center” and “Expanded Academic Index” (both databases that SPL subscribes to), while a patron from my local library would be offered a link to the full text in “Academic Search Premier” (to which SPL doesn’t subscribe) and “Business and Company Resource Center” — but not to “Expanded Academic Index” (to which SPL subscribes but my library doesn’t).

    We would both also be provided a link to the record for Technology Review in our respective online library catalogs, in case we preferred to read the article in print. A user whose library did not subscribe to any of the above databases would still be provided the link to the library catalog record — so even though online access wouldn’t be an option for them, they could still go directly to their library’s print copy.

    I should note that although I cannot confirm that Seattle Public Library (SPL) has an OpenURL link resolver service, I know that neighboring King County Library System (KCLS) does — and I suspect that SPL does as well, because both KCLS and SPL appear on this client list from SerialsSolutions, a Seattle-based vendor whose major product, ArticleLinker, is an OpenURL resolver.

  7. This is a comment from Gary Price, who was having trouble posting this to the forum himself:

    The San Francisco Public Library and THOUSANDS of other libraries are offering even more databases REMOTELY. In other words, available 24x7x365 WITHOUT leaving your home or office. It’s ALL FREE. In fact, the SF PL now offers every page ever published (FULL IMAGE, delivered in PDF) of the NY Times back to Vol. 1, No.1.

    Via a library here in the DC area, I not only have access to the Historical NY Times but also the Washington Post and Wall St Journal.

    These databases are just the tip of the iceberg of the fee-based content that is available via site licenses to the public WITHOUT having to leave their home or office. Full text, images, etc. from new and old publications, full text reference books, and other databases too!

    What each library offers varies; all you need is a library card. In Michigan, they have a statewide program where you don’t even need to have a card.

    Example: Here’s what San Fran Public offers for FREE to anyone with a SF PL card.

    All others need to do is check the web site of the library or libraries they have access to, or just give them a call.

    Finally, more and more libraries are offering what’s referred to as federated search technology. The SF PL offers this type of service. Using it you can search multiple databases simultaneously, take advantage of advanced search options, remove dupes, etc. Perfect? No. But this type of technology is improving very rapidly.

    Most of these tools can be searched via a “Google” like interface but also offer powerful searching capabilities and use controlled vocabularies to assist in bringing like things together.

    The more you dump into Google (and others) the more difficult it will become for the typical searcher to find what they’re looking for. Only a few things can get onto the first page of results. Plus, time (how long people have to search), the fact that the typical searcher only looks at the first few results, and outside influences can also make a difference on what makes it to the top of a results page.

    Robust federated search technology allows content to be kept in individual databases until search time and merged together as needed. It also allows for a SINGLE interface to various databases (very little learning curve) while at the same time allowing the searcher to use the structure that the database might offer. As you know, many of these databases offer controlled vocabularies to assist in bringing like things together.

  8. Francis Hamit says:

    Library databases are not free. The libraries pay substantial annual subscription fees for them. This is your tax dollars at work. The original publishers receive between 30 and 75% of these revenues and those from database services such as LexisNexis and Factiva. The commercial price for individual articles runs from $2.95 each and up.

    Nor is the market limited to the United States. There are over a hundred nations where libraries have these same databases.

    If the same articles were available on Google, for “free”, these libraries would not have to offer this service, but that’s not likely to happen since the current market represents billions of dollars in revenue.

    While most of the millions of articles in these archives are legally licensed, some are not. This was what the Tasini case was about. There are hundreds of freelance writers such as myself whose work and copyrights are being infringed. Apparently there is no way short of a lawsuit to stop such abuse.

    With my own work, I do offer an alternative through online bookstores where dozens of my previously published articles are available either singly or in thematic bundles. They can be found on Elibron, Diesel e-books, Fictionwise and many affiliated sites. These do sell.

    The entire database industry is predicated on availability. Millions of articles are available, but very few are actually downloaded or read. I base this upon usage reports obtained from my local library. The cost per year is four cents for every person in the county or $1.65 for every article actually used.

    The idea of archives is wonderful. They are a great research tool. But, without getting into the entire copyright issue (see my blog if you want that), the idea that any of them are or will be free is simply not in agreement with the facts. It costs money to create articles and to format them for electronic distribution in any form. Like it or not, someone, somewhere, pays for them.

    The entire growth of public library online databases during the 1990s in the U.S.A. was heavily subsidized by the Federal Government, which also paid for the computers installed in every library branch to lessen the “digital divide” between those who could afford to buy one and those who could not.