The 4 parts of an optimized online news site

The Internet gave journalists a fresh opportunity to create new publishing tools and systems that could serve their audiences and communities better than traditional print and broadcast methods had. But most news organizations failed to make substantial changes in their production processes to take advantage of this opportunity.

Sure, many newspapers and a few broadcast stations played around at the edges of innovation. But over the years most of those innovators have left the newspaper and broadcast industries, and are now at work at start-ups or other online firms. That’s why I’ve pretty much given up trying to persuade current newspaper managers how better to publish online. So many of the ones I’ve met are more concerned with forcing their old pricing models and publishing practices onto the online market than with serving the market as it now exists.

So, instead, I will direct my comments today at those who are leaving the newspaper industry, in the hope that they won’t make all the same mistakes their former colleagues did.

Don’t mistake the current practices of your publishing medium for the best practices of journalism. Don’t limit what you can do to what you have done. Ultimately, those are the reasons why I urged journalists and educators two weeks ago to shift their focus from AP Style to search engine optimization. It’s not that AP Style’s a bad thing for aspiring journalists to learn. Far from it: AP Style continues to offer some excellent advice on writing, as well as a connection with the rich heritage of print journalism. But learning SEO is essential to building an audience in today’s competitive online publishing market, and more necessary for students than learning AP Style.

But on-page SEO is just a small part of what online publishers must do to fully optimize their websites to attract, retain and expand an audience. Site-wide online optimization prepares your website to offer the information potential audience members seek, within the context of a community in which they’ll feel welcomed, empowered and rewarded for participating… even if only as a reader.

Unfortunately, too many online publishers, new and old, view their websites only within the context of what their chosen content management system [CMS] can provide. That’s as bad a mistake as a 1990s newspaper publisher looking at his or her website as an online edition of the newspaper.

If you’re going to take full advantage of your opportunity as an online publisher, you must always look beyond the capabilities of your current CMS, focusing instead on what you see as the information and communication needs of your audience. In doing that, you must look first toward how your target audience is getting and using information online, and then envision a publishing environment that builds on their current behavior to maximize participation on your site.

What should a modern online news site include? Here are the four core components that I try to build into my websites, based on my experience with the audiences I’ve pursued online over the past 15 years, both on newspaper websites and in niche topic communities.

The Knowledge Base
News publications contain immense archives of information, but, thanks to the conventions of daily publishing in print, that information is scattered among thousands of incremental, daily articles. (I explored this problem four years ago in “Search and you will find… an old news story?”) That dispersion of information puts news sites at a huge disadvantage in attracting new readers, who so often instead end up at sites such as Wikipedia, which organize their information into single-topic pages, containing all relevant information about those topics.

There’s no good reason why news websites can’t have rich collections of articles about the topics of greatest interest to their communities. I believe that journalists ought to be the ideal candidates to write such pieces – they should know the community and the beat they cover within it. Unfortunately, most attempts to date by newspaper websites to create topic pages have relied on automated solutions.

An optimized news knowledge base must be written by human hands, by a knowledgeable writer (or newsroom) who can craft no more than 20 sharp paragraphs on a topic folks are searching for, with appropriate hyperlinks to information elsewhere on the site.

This knowledge base becomes the SEO bait that attracts new readers into the website, and delivers them the rich, rewarding information that helps keep them there.

Expert Voices
A Knowledge Base without updates is… an encyclopedia. That’s nice; it’s valuable, but it’s not a news site. Those need updates, written by knowledgeable reporters with the ability to distinguish the true from the false, the honest from the fraudulent, and the accurate from the incomplete. And to make those distinctions explicit for their readers.

It’s not enough to play stenographer any longer. Neither sources nor readers need journalists to do that. Communities instead need people who can cut through all that information accessible through Google, or posted to Facebook, and show them what’s true, what’s honest and what’s complete.

And, given the conventions of online publishing today, readers want to see the names and faces of the individuals who are making those cases. The Internet is a powerfully personal medium, defined by individual interaction. It is a mass medium of individuals in relation with one another, unlike print and broadcast media. In print, the byline was subservient to the masthead, and to the institutional voice. Online, readers expect to see the name and hear the voice of the author they read (even if those names are sometimes pseudonymous).

The blog provides the best format yet implemented for connecting expert voices with an audience. An optimized news website would provide a collection of expert voices, presented in blog format, with daily (or more frequent) updates to complement the basic information presented in the site’s Knowledge Base.

Readers’ Voices
As I just wrote, the Internet is an interactive medium, and a news website must function as an online community. Readers ought to have the opportunity to engage your publication’s expert voices, as well as to initiate coverage and conversation, through their own blogs and discussion forums.

Your community should be a meritocracy, though, which values the true, honest and complete among its participants as much as it does among the other sources that your expert voices cover. As a publisher, you are under no obligation to provide everyone an equal voice. In fact, you have an obligation to create a community in which participants can distinguish the valuable posts from the rest.

Legacy Media Archive
The final core component is the one that makes up the majority of most legacy news sites. Unfortunately, it’s the least engaging and least valuable at attracting the new readers who drive your publishing business’s customer growth. Still, I believe that placing a complete legacy media archive online is important, to allow readers searching for individual articles to find them, and for others to reference your legacy media work in their online conversations. Especially important landmark stories also should figure prominently in relevant Knowledge Base articles.

Such an optimized site offers sharp, focused topical articles to draw in new readers, knowledgeable voices to keep them coming back, community involvement to empower them to make your site their online home and to promote it to others, as well as all the information that you’ve published elsewhere in the past, for readers’ reference. It makes full use of the capabilities of today’s Internet, in the way that readers are now using it.

Your decision about a publishing system – which one to select, to customize or to develop – should look forward to such an optimized site, instead of looking backward to providing a workflow that you might be comfortable with already, or an experience similar to popular sites you frequent. Looking backward too often locks you into suboptimal publishing techniques and formats that won’t allow you to distinguish your site from competitors’.

You might have noticed that I haven’t offered any suggestions on how to blend these four core components. That’s up to you. An optimal website is better than the competition, not one that matches or duplicates it. All I hope to do here is to inspire you to think about how you might better optimize your publication to reach the growing, thriving audience that your publishing business needs.

15 criteria for picking a content management system for an ad-driven hyperlocal news website

One of the biggest early decisions a hyperlocal site entrepreneur makes is what Content Management System [CMS] they will use. Think of it as similar to picking a spouse: you are going to live with the decision day and night for a long, long time. Also, as with choosing a spouse, each person has different criteria. I will share the criteria I used for my hyperlocal site (www.sunvalleyonline.com) so that you can consider them and prioritize them based upon your needs. Think through these criteria or your “spousal” choice may leave you feeling like Michael Douglas and Kathleen Turner in The War of the Roses.

Before I get into that, I will share my experience and scenario, which should give you some perspective on my situation. I’m a tech industry veteran (~25 years), though my hands-on coding experience is ancient (~20 years ago). As non-technical people go, I’m reasonably technical, but I’ve been on the business and editorial side of Web properties for the last 15 years.

Part of my background includes being on the early team of Microsoft Sidewalk, starting in 1995, where I ran a team that supported the cities; about half the cities also reported through me. So I’ve been working with CMSs in the local arena for nearly 15 years. SunValleyOnline (SVO) has been around for about 5 years and was built on a proprietary platform that hasn’t changed in years. We are in the final stages of the transition from the old to the new site. SVO has been self-sustaining for a couple of years with a small team of three people. We rely on a mix of community and staff contributions. I have personally blogged for several years and have used blogs built on Blogger and, mostly, WordPress.

To jump ahead: there’s a lot of merit in WordPress and the ecosystem built around it, but I felt it came up short on the criteria I established to make the decision.

Listed below are the criteria I used with a brief explanation. While everyone will have somewhat different criteria, I listed the items in priority order from most to least important based upon my experience and priorities.

  1. No developer required: In my opinion, it is no longer necessary for 98 percent of sites to have a Web developer on staff. Fortunately, there are many off-the-shelf solutions that don’t require an in-house technologist. There may be occasional needs where a developer can be contracted to do specific work but at the early stages of a site’s development, I think a site should be focused on other items rather than doing custom development. As long as your CMS has the ability to extend it later, you can defer bringing on a technologist and save yourself money. Of course, there are hyperlocal sites founded by people with technology skills, and they can certainly take advantage of that, but it’s not a requirement to get off the ground.
  2. Easy to monetize: This relates to the next point (“Open”). Most sites are limited to generating revenue using standard display ads. While that is the right place to start, this is a highly dynamic sector, so it should be easy to extend your site with other capabilities. Whether it is turning standard display ads into video ads or incorporating high-quality ad networks, adding these capabilities should be as easy as “copy and paste.”
  3. Open: It should be very easy to add and delete modules to a page or an entire site, such as social media features, inbound RSS feeds (i.e., pulling in a news feed from another site), and widgets of all types from weather to flickr slideshows to polls to various monetizable elements from any number of third parties.
  4. Community Generated Content: It should be very easy for members of your community to contribute articles, pictures, video, classifieds, reviews, etc. The CMS should give you the ability to determine whether a specific user is able to post directly to the site or whether the contribution should go into a publication queue for review/approval. It should also allow your community to send in articles via an e-mail interface. Among other things, this can allow them to e-mail pictures and video from their smartphones, which can be critical when there are breaking news events in your community. The CMS we picked has nailed this part. It gives someone who might be witnessing a breaking story the opportunity to submit stories to the site, including pictures (and mapping those pics). What’s more, once the article is posted, you can update it via e-mail replies from the e-mail confirmation the CMS sends when the article posts. This may be the coolest single feature the platform we chose provides.
  5. Off the shelf cross-promotion: It must be easy to add features that help internal site promotion. Having features sprinkled throughout a site, such as Most Viewed Pages, Recent Comments, Highly Rated articles and so on, is very helpful at increasing the time people spend exploring your site.
  6. Outbound RSS: I mentioned inbound RSS earlier. Just as you can and should pull in RSS feeds from complementary sites, you should make various RSS feeds available so that others can pull your content into their pages. A CMS should automatically create a range of RSS feeds (e.g., Top Headlines, department- and author-specific feeds, etc.); see the sketch just after this list for what one of those feeds looks like under the hood.
  7. Design templates and flexibility: CMSs usually come with pre-built templates, as well as the ability to customize the look and feel. If you don’t like the pre-built templates you can preview, make sure the process for changing the site design is straightforward. [Side note: I have, unfortunately, heard of designers charging sites $5,000 for a WordPress template when a few hundred dollars should get you a solid design.]
  8. Pictures and video: Not only should it be easy to embed code that pulls in photos and video from sites such as flickr and YouTube, the platform should allow you and your community contributors to upload directly to your site. Having users be able to rate photos and videos is another way to increase engagement with your community, which is vital for your success.
  9. Integration with Social Media: Your CMS should enable you to easily integrate with Facebook (and Facebook Connect) as well as Twitter. This includes enabling you to automatically post items to your accounts on the social networks, including shortening URLs (e.g., using a tool such as bit.ly). Also, throughout your site, it should be easy for users to send your articles, photos, etc. to the major social tools (Digg, StumbleUpon). Don’t forget e-mail – still the most popular way to share an article. “Send to a Friend” should be baked into the system.
  10. Analytics: Not only should it be easy to add third-party tracking tools such as Google Analytics and Quantcast to a site, there should also be the ability to measure success and reward contributors based upon how well read one’s contributions are.
  11. Events: A community-powered Events Calendar is a great way to connect with the community. Not only should a CMS have this capability, it should allow your community to easily submit events. The system should allow for plotting of the events on a map and have the basics of an Events Calendar, such as support for recurring and multi-day events.
  12. Classifieds: While Craigslist has made it to many communities, it doesn’t work well today for hyperlocal. If you are only interested in garage sales in your immediate neighborhood, for instance, Craigslist can be unwieldy. Thus, there is an opportunity to fill a niche where the big boys aren’t servicing your community very well. Naturally, having features you expect in articles (maps, photos, etc.) is important for classifieds as well.
  13. Maps: The importance of maps/location continues to increase with the popularity of smartphones. A smart CMS will be able to recognize when a photo or Tweet has a GPS coordinate attached to it. This gives your community another way to navigate your content (i.e., by location) and becomes more important as mobile consumption increases.
  14. Mobile: Another item that I expect to rapidly grow in importance is mobile. A CMS that allows for your site to be easily consumed on various mobile platforms will be a big asset. At the moment, mobile requires a lot of custom development but this should change in the relatively near future.
  15. Search Engine Dashboard: Not a common feature yet, but one we expect to become more common. Sites such as the Huffington Post are very sophisticated in analyzing search trends to drive headline selection and tagging, and in raising or lowering the visibility of articles based upon search term frequency.
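
To make the outbound RSS criterion concrete, here is a minimal sketch, in Python, of what one of those automatically generated feeds amounts to under the hood: an RSS 2.0 document built from a list of articles. The site name, URL and article data below are made-up placeholders; a real CMS would assemble feeds like this for each section and author on its own.

# Minimal sketch: building an RSS 2.0 "Top Headlines" feed from article records.
# The site name, URLs and article below are hypothetical examples.
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.etree import ElementTree as ET

articles = [  # in a real CMS, these records would come from the database
    {"title": "Council approves next year's budget",
     "url": "http://example-hyperlocal.com/news/budget-vote",
     "summary": "The city council passed the budget on a 4-1 vote Monday night.",
     "published": datetime(2009, 11, 2, 21, 30, tzinfo=timezone.utc)},
]

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Hyperlocal: Top Headlines"
ET.SubElement(channel, "link").text = "http://example-hyperlocal.com/"
ET.SubElement(channel, "description").text = "The latest headlines from our community."

for article in articles:
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = article["title"]
    ET.SubElement(item, "link").text = article["url"]
    ET.SubElement(item, "description").text = article["summary"]
    ET.SubElement(item, "pubDate").text = format_datetime(article["published"])  # RFC 822-style date

print(ET.tostring(rss, encoding="unicode"))

Point a feed reader, or another site, at the resulting XML and your headlines travel with it; that is all “outbound RSS” needs to mean.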

At the risk of this sounding like a sales pitch for the platform we chose, I was very impressed with the flexibility and extensibility of the Neighborlogs platform. It met nearly all the criteria listed above. As I learn the platform, I keep finding more slick things it can do. If I had to summarize why it’s a great fit, it is the fact that it is purpose-built for the hyperlocal space, whereas WordPress, Drupal, Django and the other options I considered are great general-purpose systems but not geared toward hyperlocal specifically. Like WordPress and the others, you can’t beat the price (free). They currently only charge a revenue share on the self-serve ads that are purchased through that tool (no split on the ads you bring to the table).

To provide a bit of balance, let me share some areas of constructive criticism for Neighborlogs. The platform developers run their own hyperlocal site and local network and are very busy, so they aren’t always quick to respond, though that’s certainly better than WordPress, where you just have a developer community and no dedicated team to support you unless you hire your own. There are a few rough edges in how the platform pulls in RSS feeds and in the accompanying social media features. Its ad system isn’t as robust as some of the ad servers out there, but these shortcomings weren’t deal breakers for us. Being a relatively new company and platform, there’s always the risk that they don’t survive, but, as good a job as they have done, I think others will discover the benefits for themselves.

Overall, I’d encourage people to clearly define their own criteria. My criteria aren’t applicable to everyone. Establishing your own will greatly increase the chances you’ll be happy long term. I encourage others to share their experiences, good or bad, with various CMSs they have used. I also welcome feedback on our new site. What works for you and what doesn’t?

'What is Robots.txt?'

Every Web publisher ought to be thinking about how to improve the traffic that they get from search engines. Even the most strident “I’m trying to appeal only to people in my local community” publishers should recognize that some people within their community, as is the case in any community, are using search engines to find local content.

Which brings us to this week’s reader question. Actually, it isn’t from a reader, but from a fellow participant in last week’s NewsTools 2008 conference. He asked the question during the session with Google News’ Daniel Meredith, and I thought it worth discussing on OJR, because I saw a lot of heads nodding in the room as he asked it.

Meredith had mentioned robots.txt as a solution to help publishers control what content on their websites Google’s indexing spiders would see. A hand shot up.

“What is robots-dot-text?”

Meredith gave a quick and accurate answer, but I’m going to go a little more in depth, for the benefit of not-so-tech-savvy online journalists who want the hard work on their websites to get the best possible position in search engine results.

Note that I wrote “the best possible position,” and not “the top position.” There’s a difference, and I will get to that in a moment.

First, robots.txt is simply a plain-text file that a Web publisher should put in the root directory of their website. (E.g. http://www.ojr.org/robots.txt. It’s there; feel free to take a look.) The text file includes instructions that tell indexing spiders, or “robots,” what content and directories on that website they may, or may not, look at.

Here’s an example of a robots.txt file:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /*.doc$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /ads

This file tells the “Mediapartners-Google” spider that it can look at anything on the website. (That’s the spider that Google uses to assist in the serving of AdSense ads.) Then, it tells all other spiders that they should not look at any Microsoft Word documents, GIF or JPG images, or anything in the “ads” directory on the website. The asterisk, or *, is a “wild card” that means “any value.”

Let’s say a search engine spider finds an image file in a story that it is looking at on your website. The image file is located on your server at /news/local/images/mugshot.jpg; that is, it is a file called mugshot.jpg, located within the images directory within the local directory within the news directory on your Web server.

Your robots.txt file told the spider not to look at any files that match the pattern /*.jpg$. This file is /news/local/images/mugshot.jpg, so it matches that pattern (the asterisk * taking the place of news/local/images/mugshot, and the $ anchoring the match to the end of the path). So the spider will ignore this, and any other .jpg file it finds on your website.
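
If it helps to see that matching logic spelled out, here is a small sketch, in Python, of how the Google-style wildcard rules work: the asterisk matches any run of characters and a trailing $ anchors the pattern to the end of the path. This is an illustration of the matching rules only, not a complete robots.txt parser, and the paths are the examples from above.

# Sketch of Google-style robots.txt pattern matching:
# "*" matches any run of characters; a trailing "$" means "end of the path."
import re

def rule_matches(pattern: str, path: str) -> bool:
    regex = re.escape(pattern).replace(r"\*", ".*")  # restore the wildcard
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"                     # restore the end-of-path anchor
    return re.match(regex, path) is not None

print(rule_matches("/*.jpg$", "/news/local/images/mugshot.jpg"))  # True: the spider skips it
print(rule_matches("/*.doc$", "/news/local/images/mugshot.jpg"))  # False: not a Word document
print(rule_matches("/ads", "/ads/banner42.gif"))                  # True: simple prefix match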

So why is this important to an online journalist? Remember that Meredith said Google penalizes websites for duplicate content. If you want to protect your position in Google’s search engine results and in Google News, you want search engine spiders to focus on content that is unique to your website, and ignore stuff that isn’t.

So, for example, you might want to configure your robots.txt so that spiders ignore all AP and other wire stories on your website. The easiest way to do this is to configure your content management system to route all wire stories into a Web directory called “wire.” Then put the following lines into your robots.txt file:

User-agent: *
Disallow: /wire

Boom. Duplicate content problem for wire stories solved. Now this does mean that Web searchers will no longer be able to find wire stories on your website through search engines. But many local publishers would see that result as a feature, not a bug. I’ve heard many newspaper publishers argue that visitors coming to their sites from search engine links to wire content do not convert for site advertisers and simply hog site bandwidth.

If you are using a spider to index your website for an internal search engine, though, you will need to allow that spider to see the wire content, if you want it included in your site search. If that’s the case, add these lines above the previous ones in your robots.txt:

User-agent: name-of-your-spider
Allow: /wire

Or, use

User-agent: name-of-your-spider
Allow: /

… if you wish it to see and index all of the content on your site.
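
If you would like to check how those per-robot rules resolve before deploying them, here is a minimal sketch using Python’s standard urllib.robotparser module. Note that this standard-library parser only understands plain Allow/Disallow prefix rules; it does not handle the * and $ wildcard patterns shown earlier. The spider name and URLs are placeholders carried over from the example above.

# Minimal sketch: confirming that your internal spider may crawl /wire
# while other robots are told to stay out. urllib.robotparser ships with Python;
# "name-of-your-spider" and example.com are placeholders.
import urllib.robotparser

robots_txt = """\
User-agent: name-of-your-spider
Allow: /wire

User-agent: *
Disallow: /wire
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Your internal search spider is allowed into the wire directory...
print(parser.can_fetch("name-of-your-spider", "http://example.com/wire/ap-story.html"))  # True
# ...while outside search engines are asked to skip it.
print(parser.can_fetch("Googlebot", "http://example.com/wire/ap-story.html"))            # False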

Sometimes, you do not want to be in the top position in the search engine results, or even in those results at all. On OJR, we use robots.txt to keep robots from indexing images, as well as a few directories where we store duplicate content on the site.

Other publishers might effectively use robots.txt to exclude premium content directories, files stored on Web servers that aren’t meant for public use, or files that you do not wish Web visitors to see unless they find or follow the file from within another page on your website.

Unfortunately, many rogue spiders roam the Internet, ignoring robots.txt and scraping content from sites without pause. Robots.txt won’t stop those rogues, but most Web servers can be configured to ignore requests from selected IP addresses. Find the IPs of those spiders, and you can block them from your site. But that’s a topic for another day.

You don’t have to let search engines find and index content that you want only your existing site visitors or other selected individuals to see. Nor do you have to suffer duplicate content penalties because you run a wire feed on your site. A thoughtful robots.txt strategy can help Web publishers get the most from their search engine optimization efforts.

Want more information on creating or fine-tuning a robots.txt file? There’s a good FAQ [answers to frequently asked questions] on robots.txt at http://www.robotstxt.org/faq.html.

Got a question for the online journalism experts at OJR? E-mail it to OJR’s editor, Robert Niles, via ojr(at)www.ojr.org