Are full RSS feeds now more trouble than they are worth?


I wondered that last week as the umpteenth Google alert hit my email in-box with a link to another “blog” that had scraped the full content of my posts. Curious this time, I clicked through and found something interesting at the bottom of the post.

It was the same list of social media links that I’ve opted to append to the bottom of my posts in my Feedburner RSS feed. Their inclusion confirmed something I’d long suspected but had shoved to the back of my mind: scrapers are using the convenient XML formatting of RSS feeds to populate their spam webpages.

(Let’s continue down the stream of consciousness, shall we?) That prompted me to wonder how many actual human beings are reading my site via RSS feeds today, versus spam bots harvesting those feeds to steal my content for their websites. With the rise of Twitter and Facebook, those have become my go-to channels for pushing new posts to my readers. Does anyone still use RSS?

It’s tough to answer that by looking at Feedburner stats, because I don’t know of a good way to parse that data to separate human readers from scraper bots. Perhaps an OJR reader who has this information might share it in the comments.

But the presence of so many scraper sites on the Internet, even after Google’s much-hyped Panda update, inspires me to consider cutting off their source of content. What if I killed my RSS feeds? Would the scrapers leave me alone? Would Google and Bing still find my content? Would my readership suffer?

Sitemaps provide a superior way to use XML to alert search engines and legitimate aggregators to new posts and content on a website, so I don’t believe the loss of an RSS feed would hurt you there. As I mentioned before, Facebook and Twitter provide new, more popular avenues for pushing new URLs to your readers and fans. But without an RSS feed breaking down your site’s content into easy-to-parse XML, scrapers likely would have a harder time extracting readable content from your website to put on theirs.
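For readers who haven’t built one: a sitemap under the sitemaps.org protocol is just a `<urlset>` of `<url>` entries, each with a required `<loc>` and optional fields such as `<lastmod>`. Here is a minimal sketch in Python using only the standard library. The `build_sitemap` function, the example.com URL and the date are hypothetical illustrations, not part of any particular blogging platform; a real site would generate the list from its own post database.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemaps.org XML from an iterable of (url, last_modified_date) pairs."""
    # Writing xmlns as a plain attribute declares the sitemap namespace as the
    # default in the output, so the child elements need no prefix.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # <loc> is the one required child
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat()  # YYYY-MM-DD
    # ElementTree escapes characters such as "&" in URLs on output.
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical post URL and date, for illustration only.
sitemap = build_sitemap([("https://www.example.com/a-post/", date(2011, 1, 1))])
print(sitemap)
```

A site would then typically point search engines at the file with a `Sitemap:` line in robots.txt or submit it through Google’s and Bing’s webmaster tools.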

One interesting fact about the way that scrapers mine RSS feeds: They take only the headline and content, never the link. So, as an interim step before killing off my RSS feeds, I’ve tried modifying them instead. I’ve rewritten the script that generates my feed to append the following note to each post:

“The article originally appeared at HYPERLINKED_URL_HERE. If you are not reading this post on a personal RSS reader (such as Feedburner) or on HYPERLINKED_WEBSITE_NAME, you are reading on a ‘scraper’ site that has illegally copied our content. Please visit HYPERLINKED_URL_HERE for the original version, which includes all the reader comments.”

This places a link to the original URL within the copy of each post. Not only should that help search engines identify the canonical URL when the piece is scraped, it should also help drive some of the scraper sites’ traffic back to my website. Ultimately, I don’t care about scraper sites if they drive their traffic back to me. What offends me is when they take my content without returning any traffic.
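Here is a minimal sketch of that kind of feed rewrite, in Python with the standard library. It assumes a plain RSS 2.0 feed in which each `<item>` has a `<link>` and an HTML `<description>`. The footer wording is adapted from the notice above, and `add_attribution`, `scrape` and the example.com feed are hypothetical names for illustration, not the actual script behind my feed. The `scrape` function mimics the scraper behavior described above (keeping only title and body) to show that the link now travels inside the body.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical footer, adapted from the notice quoted above.
FOOTER = (
    '<p>This article originally appeared at <a href="{url}">{url}</a>. '
    'If you are not reading this post in a personal RSS reader or on '
    '<a href="{home}">{site}</a>, you are reading it on a "scraper" site '
    'that has copied our content. Please visit <a href="{url}">{url}</a> '
    'for the original version, which includes all the reader comments.</p>'
)


def add_attribution(rss_xml, site_name, home_url):
    """Append a linked attribution footer to each RSS 2.0 <item> description."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        desc = item.find("description")
        if not link or desc is None:
            continue  # no permalink to point back to, or no body to annotate
        footer = FOOTER.format(
            url=escape(link), home=escape(home_url), site=escape(site_name)
        )
        # The description holds HTML as text; ElementTree re-escapes it
        # (e.g. "<" becomes "&lt;") when the feed is serialized.
        desc.text = (desc.text or "") + footer
    return ET.tostring(root, encoding="unicode")


def scrape(rss_xml):
    """Mimic a scraper that keeps only each item's title and body, not <link>."""
    root = ET.fromstring(rss_xml)
    return [(i.findtext("title"), i.findtext("description")) for i in root.iter("item")]


# Hypothetical one-item feed for illustration.
feed = """<rss version="2.0"><channel><title>Example</title>
<item><title>A post</title><link>https://www.example.com/a-post/</link>
<description>&lt;p&gt;Body text.&lt;/p&gt;</description></item>
</channel></rss>"""

annotated = add_attribution(feed, "Example Site", "https://www.example.com/")
title, body = scrape(annotated)[0]
print(body)  # the scraped body now contains links to the original post
```

Because the footer lives inside `<description>`, a scraper that copies the body also copies the links back to the original post. A more careful scraper could still strip the footer, so this only deters the ones that copy the content wholesale.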

I just made this switch, and I’ll report back if I see any resulting change in traffic, search engine placement or scraper abuse of my websites. In the meantime, I’d like to hear what you’re doing (or not doing) with your RSS feeds to fight scraping.

Comments welcome!

About Robert Niles

Robert Niles is the former editor of OJR, and no longer associated with the site. You may find him now at http://www.sensibletalk.com.

Comments

  1. I’ll be interested to hear the outcome of your test. I’ve been surprised that more sophistication hasn’t come to RSS. It seems like, rather than fighting the scrapers, if you could have your full RSS feed incorporate ads (or cross-promotion) inline, you would simply have expanded the reach of your site. Couple that with your solution, and what’s so bad about another site scraping your content if it includes those items? What am I missing?

  2. 222.44.41.33 says:

    I love reading your posts via Google Reader – please don’t abandon RSS 🙂

  3. 82.71.57.27 says:

    Well, I read your article in Google reader – and shared it to a number of people who follow me….

  4. 152.52.254.94 says:

    RSS is primarily used where I work as a format for computer systems to pass content between servers and such.

    That computers read it makes it easy to scrape.

    Now, that said, all the computer-savvy users I know who are also news junkies (a narrow niche, I suspect) use RSS readers. That’s how I read OJR along with several other sites. Partial feeds would mean I’d probably read OJR less, because there would have to be compelling content in the lead or I wouldn’t click through. But I understand that I’m just one reader, so it may be worth it despite what I think ;).

  5. To clarify, I don’t control the decisions about RSS feeds on OJR, and given the make-up of our audience here, I wouldn’t suggest shutting down this RSS feed.

    But scrapers aren’t exactly hitting up OJR, either. (Is that good or bad? I don’t know….)

  6. 192.234.2.90 says:

    John Battelle had a couple of interesting posts about RSS in December. I didn’t read all the comments, but there was a forceful response in support of it.

    I would probably not look at OJR without the RSS feed. With a reader, I can scan posts from a wide range of sites. It would take too long to click through to all the sites I currently subscribe to.

    Link to the blog here: http://battellemedia.com/archives/2010/12/social_editors_and_super_nodes_-_an_appreciation_of_rss

  7. 98.193.165.66 says:

    Please keep the RSS. I love the above commenter’s idea of incorporating ads. I just got into RSS last year (can’t believe I didn’t six years ago) and would miss it terribly. Even with lists, Twitter and Facebook are too cluttered and unwieldy; give me my Google Reader!

  8. 78.60.169.162 says:

    For serious reading I use RSS exclusively. Twitter and Facebook are “lossy” channels.

  9. Mohsin Jalali, ACCA says:

    Please don’t remove the full RSS feeds. The Facebook and Twitter feeds are proprietary and non-standardised, while I can use your site’s XML feeds (RSS/Atom) in any reader I like (in my case, Google Reader on the go and FeedDemon on the main machine). A feed reader gives me the option to tag and save interesting content and share it on many social networks, while FB and Twitter are just passive consumption.