Do you still read newspapers?

The circulation data is clear: Fewer people are taking the daily newspaper in the United States. Readers and, increasingly, advertisers are moving online.

As online journalists, many of us straddle both worlds. Many of us work for newspaper-dot-coms; others at least started their careers in print.

Are any of us still reading the “dead tree edition”? If so, how many newspapers a day are you reading? And how many did you read a decade ago?

Journalists, one might presume, ought to be the biggest fans and consumers of journalism. Can online journalists, folks at the leading edge of industry change, still be counted on to take the print edition? Or have we bailed on print, too?

Tell us in the comments which papers you still read in print, and which you would recommend. Or, if you are not reading papers in print, tell us what might help you change your mind and subscribe to a print newspaper in the future.

'What is Robots.txt?'

Every Web publisher ought to be thinking about how to improve the traffic that they get from search engines. Even the most strident “I’m trying to appeal only to people in my local community” publishers should recognize that some people within their community, as is the case in any community, are using search engines to find local content.

Which brings us to this week’s reader question. Actually, it isn’t from a reader, but from a fellow participant in last week’s NewsTools 2008 conference. He asked the question during the session with Google News’ Daniel Meredith, and I thought it worth discussing on OJR, because I saw a lot of heads nodding in the room as he asked it.

Meredith had mentioned robots.txt as a solution to help publishers control what content on their websites Google’s indexing spiders would see. A hand shot up.

“What is robots-dot-text?”

Meredith gave a quick and accurate answer, but I’m going to go a little more in depth, for the benefit of not-so-tech-savvy online journalists who want their hard work to earn their websites the best possible position in search engine results.

Note that I wrote “the best possible position,” and not “the top position.” There’s a difference, and I will get to that in a moment.

First, robots.txt is simply a plain-text file that a Web publisher should put in the root directory of their website. (E.g. http://www.ojr.org/robots.txt. It’s there; feel free to take a look.) The text file includes instructions that tell indexing spiders, or “robots,” what content and directories on that website they may, or may not, look at.

Here’s an example of a robots.txt file:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /*.doc$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /ads

This file tells the “Mediapartners-Google” spider that it can look at anything on the website. (That’s the spider that Google uses to assist in the serving of AdSense ads.) Then, it tells other spiders that they should not look at any Microsoft Word documents, GIF or JPG images, or anything in the “ads” directory on the website. The asterisk, or *, is a “wild card” that means “any value.”

Let’s say a search engine spider finds an image file in a story that it is looking at on your website. The image file is located on your server at /news/local/images/mugshot.jpg; that is, it is a file called mugshot.jpg, located within the images directory within the local directory within the news directory on your Web server.

Your robots.txt file told the spider not to look at any files that match the pattern /*.jpg. This file is /news/local/images/mugshot.jpg, so it matches that pattern (the asterisk * taking the place of news/local/images/mugshot). So the spider will ignore this, and any other .jpg file it finds on your website.
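To make that matching rule concrete, here is a minimal Python sketch of Google-style robots.txt pattern matching. The function name and sample paths are my own, for illustration only; real spiders implement this logic internally:

```python
import re

def robots_pattern_matches(pattern, path):
    """Test a URL path against a Google-style robots.txt pattern.

    '*' matches any run of characters; a trailing '$' anchors the
    match to the end of the path; otherwise the pattern need only
    match a prefix of the path. (Hypothetical helper for illustration.)
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "^" + "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

# The mugshot example above matches the /*.jpg$ rule:
print(robots_pattern_matches("/*.jpg$", "/news/local/images/mugshot.jpg"))  # True
# A plain prefix rule like /ads covers everything under that directory:
print(robots_pattern_matches("/ads", "/ads/banner.gif"))  # True
```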

So why is this important to an online journalist? Remember that Meredith said Google penalizes websites for duplicate content. If you want to protect your position in Google’s search engine results and in Google News, you want search engine spiders to focus on content that is unique to your website, and to ignore stuff that isn’t.

So, for example, you might want to configure your robots.txt so it ignores all AP and other wire stories on your website. The easiest way to do this is to configure your content management system to route all wire stories into a Web directory called “wire.” Then put the following lines into your robots.txt file:

User-agent: *
Disallow: /wire

Boom. Duplicate content problem for wire stories solved. Now this does mean that Web searchers will no longer be able to find wire stories on your website through search engines. But many local publishers would see that result as a feature, not a bug. I’ve heard many newspaper publishers argue that visitors who come to their sites from search engine links to wire content do not convert for site advertisers and simply hog site bandwidth.
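If you want to verify rules like these before deploying them, Python’s standard library ships a robots.txt parser. This is a small sketch feeding it the same two rule lines shown above; example.com and the story URLs are placeholders for illustration:

```python
from urllib.robotparser import RobotFileParser

# Parse the two-line robots.txt rule and ask whether a hypothetical
# wire story and a hypothetical local story may be fetched.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /wire",
])

print(rules.can_fetch("Googlebot", "http://example.com/wire/ap-story.html"))    # False
print(rules.can_fetch("Googlebot", "http://example.com/news/local-story.html")) # True
```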

If you are using a spider to index your website for an internal search engine, though, you will need to allow that spider to see the wire content, if you want it included in your site search. If that’s the case, add these lines above the previous ones in your robots.txt:

User-agent: name-of-your-spider
Allow: /wire

Or, use

User-agent: name-of-your-spider
Allow: /

… if you wish it to see and index all of the content on your site.

Sometimes, you do not want to be in the top position in the search engine results, or even in those results at all. On OJR, we use robots.txt to keep robots from indexing images, as well as a few directories where we store duplicate content on the site.

Other publishers might effectively use robots.txt to exclude premium content directories, files stored on Web servers that aren’t meant for public use, or files that you do not wish Web visitors to see unless they find or follow the file from within another page on your website.

Unfortunately, many rogue spiders roam the Internet, ignoring robots.txt and scraping content from sites without pause. Robots.txt won’t stop those rogues, but most Web servers can be configured to ignore requests from selected IP addresses. Find the IPs of those spiders, and you can block them from your site. But that’s a topic for another day.
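As a quick sketch of what such a block looks like on an Apache server, the directives below deny requests from a single address. The IP is a placeholder, and where these lines go (the main configuration file or an .htaccess file) depends on your setup:

```apache
# Deny a rogue spider by IP address (203.0.113.45 is a placeholder)
Order allow,deny
Allow from all
Deny from 203.0.113.45
```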

There’s no good reason to let search engines find and index content that you don’t want anyone other than your existing site visitors or other selected individuals to see. Nor do you have to suffer duplicate content penalties because you run a wire feed on your site. A thoughtful robots.txt strategy can help Web publishers get more from their search engine optimization efforts.

Want more information on creating or fine-tuning a robots.txt file? There’s a good FAQ [answers to frequently asked questions] on robots.txt at http://www.robotstxt.org/faq.html.

Got a question for the online journalism experts at OJR? E-mail it to OJR’s editor, Robert Niles, via ojr(at)www.ojr.org

Booted for blogging, ex-Washington Post staffer reacts

The Drunk Blogger? Not really. More appropriately, a professional newsman on staff at one of the most reputable rags in the field. But Michael Tunison’s secret writing life with the witty—if not a bit profane—NFL blog, Kissing Suzy Kolber, got him booted from his MSM gig.

Last month Tunison—aka Christmas Ape—came out of Internet anonymity with a KSK entry documenting his inebriation one ancient evening at (gasp) a sports bar. Turns out that was the Washington Post’s cue to fire him, within 48 hours of the post, for “discrediting the publication.”

The Web backlash to WaPo’s knee-jerk reaction was immediate and expected. For HR malpractice. For stodgy new-media ignorance. For axing a potential traffic cow.

But don’t quit your day job, Mike. KSK is of course booming on the heels of the incident, and Tunison is content, sort of, to be uncaged in that space.

We caught up with him over e-mail for a closer look at the whole mess.

OJR: Is there anything defensible about this? Or does a part of you think WaPo did what it had to do?

MT: I think The Post has a right to uphold and enforce whatever stodgy standards of conduct that it deems appropriate. I don’t think they would have acted as extremely or as quickly as they did if it wasn’t first picked up by a journalism blog. In that case, the editors probably felt pressure from within the journalism community to cleanse whatever damage they thought I was doing to the Post brand.

OJR: Sounds like it was technically over your post about being drunk at a bar, but that seems a little far-fetched. There’s got to be more to it than that. They say you “discredited” the publication. But what was actually said to you? Anything verbal, or did it all come in memos?

MT: Far-fetched though it may seem, that’s what they said. The day after I put up the outing post, I got a call from the top editor of the Metro section, who was already making clear I was in deep shit and was probably going to be fired. He essentially wanted my reasons for doing so to run by personnel. The next day, I was called back into his office where he laid out the terms of my dismissal. He said the drunk picture coupled with the language while linking to my Post stories violated the paper’s standards.

OJR: Seems to me they would have been a bit better off to give you a slap on the wrist and leverage you for site traffic. Are you at all surprised they couldn’t see it that way?

MT: I figured the penalty would be less severe and there would be more room for discussion. I’m not surprised at all that they couldn’t find something for me to do with The Post’s Web operation. There’s a stunning lack of vision at The Washington Post when it comes to Web-exclusive content. Not to mention that the disconnect between The Post and its website is astounding. The Washington CityPaper did a great piece on that a few months ago. Look at Dan Steinberg’s D.C. Sports Bog. It’s probably the best executed sports blog by a mainstream publication and it’s barely promoted at all by the organization. Sure, one post makes it to page 2 of sports section in the print paper, but log onto The Post’s site and you’d never know it existed. You have to really dig through that unwieldy thing to find it.

OJR: Surely you had to be expecting a knee-jerk reaction of some sort. To what extent did you think it would be feasible for your two writing lives to coexist?

MT: I thought so. As I’ve said on the site, there was no overlap at all between what I did for the paper and the writing at KSK. I also made pains on the revealing post to not actually write out my name and the publication. You could only find those things by visiting The Post and clicking through the links. A Google search of my name or The Washington Post wouldn’t have brought it up, so no one would have discovered it except readers of Kissing Suzy Kolber. Now, readers of KSK and WaPo readers aren’t mutually exclusive, but you can be damn sure KSK readers didn’t think my employment there hurt the paper in any way.

OJR: It sucks to lose the 9 to 5, but how bitter are you, really, considering you come off as the good guy in all this?

MT: I’m a little bitter because I was never really given an opportunity to excel at The Post and as soon as I develop something for myself that garners some success, they find out about it and can me. When I’m doing uninteresting work, I’m going to need a creative outlet on the side.

OJR: How, if at all, are you pursuing other newspaper jobs? Or are you done with MSM? If so, why?

MT: I’m not going after any newspaper jobs at the moment. Partly because I don’t want to but also because they wouldn’t hire me even if I did. Just this past week, the guy who runs The Sporting News’ blog, The Sporting Blog, wanted to bring me on to do some work with them and he was shot down by higher-ups. The reason: because I’m too “controversial” after this firing. I’m sure I’m blackballed from a number of places, probably forever. It’s a little pathetic, really. The mainstream journalism community is so insular and at the same time so terrified. The situation is just going to get worse for them until they reevaluate more than just staff sizes. I have other aspirations, but I’m happy with blogging for now. I make about as much as I did at The Post, which wasn’t much, with writing for a few blogs. I can be happy with that for a bit.

OJR: How has your role on KSK changed through all this? Obviously you have more time to put toward it, but do you feel at all uncaged or liberated in terms of your content?

MT: KSK has never really been a place where I’ve felt limited in terms of what I can say, so the firing doesn’t change much. I have more time and am writing a little more, but it’s still the off-season and there’s only so much to write about. Before coming forward, I had to be more guarded with personal information, which I don’t anymore.

OJR: This is the best PR imaginable for KSK. How has site traffic looked since the coming-out party? Are you guys looking to expand the site out of this?

MT: There was a big initial burst of traffic right after the outing. We had 108,000 unique visitors the day after I got fired. We average around 22,000 or so per day. It’s still been a little higher since than it was before the incident. We probably gained a few readers, but most of the other people were there because it was in the news. As far as expanding, the firing coincided with moving the site to a new address after reaching a contract with a nascent blog network. There are big plans for that network. As far as KSK, there are things we’re planning on adding here and there, like a liveblog of a game every week during the season. Other than that, we’re just keeping with what’s worked for us.