Do-it-yourself copyright protection online

One of the great frustrations for online journalists comes after you’ve published some great content, seen other websites link to it, made better-than-average income off it… then discovered it duped on someone else’s website, without your permission.

Copyright theft online isn’t just a problem for the music and software industries. Dupes of your content can cost you more than lost traffic and revenue: if you don’t take care to protect your content, you might even find the thieves’ versions ranking above your original content in search engine results.

That doesn’t happen often, but why risk letting thieves build a publishing history, and inbound links, with your content? Not when finding them is so easy.

The simplest way to check for duped content online is to plug your URLs into Copyscape. It’s a free search engine that takes the URL you supply it and does a nifty little content analysis to find duped pages on the Web. (If you want to pay a few bucks a month, they’ll check your pages for you, on a regular schedule, or let you construct automated searches via an API.)
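If you already have the text of a suspect page in hand, a rough comparison is easy to script yourself. The sketch below is not Copyscape's method (the service does not describe its analysis here); it uses a common generic technique, comparing overlapping runs of words known as "shingles". The function names and the five-word shingle size are assumptions chosen for illustration.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole thing as a single run.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A score near 1.0 suggests the suspect page reproduces most of your wording; a score near 0.0 suggests little verbatim copying. Where to draw the line between a fair excerpt and a dupe is a judgment call the script cannot make for you.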

I’ve used Copyscape to bust folks duping math tutorials I wrote a decade ago. Some academic colleagues have used TurnItIn, another service that checks for duped content online. TurnItIn is designed for use by teachers and professors, and aims to identify student work that’s been copied from the Web. Instead of starting with an original page and looking for dupes, TurnItIn takes a student paper then looks for similar work online.

Unlike Copyscape, TurnItIn doesn’t offer a free option, and requires a license to use. If you teach journalism, either as your full-time job or as a part-time gig, your school might have a license already, so it’s worth asking.

You can also use Google to track snippets of content from your website. Just find a unique phrase from a page, then search for it on Google, and see what turns up. This can help you find scrapers that are pulling excerpts from your site. If you have a handful of high-value webpages that you want to track against copying, for free, just set up Google Alerts for key phrases from those pages, and let Google inform you via e-mail when it finds other webpages that match them.
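The phrase-search routine above can be partly scripted. The following is a minimal sketch, assuming you have the plain text of your page: it pulls a few evenly spaced runs of words and builds an exact-match (quoted) Google search URL for each. The function names, the eight-word phrase length, and the evenly spaced sampling are all illustrative assumptions; picking distinctive phrases by eye will usually work better than sampling.

```python
from urllib.parse import quote_plus

def candidate_phrases(text, words_per_phrase=8, count=3):
    """Pick a few evenly spaced word runs from the text to use as exact-match searches."""
    words = text.split()
    if len(words) <= words_per_phrase:
        return [" ".join(words)] if words else []
    last_start = len(words) - words_per_phrase
    step = max(1, last_start // max(1, count - 1))
    starts = range(0, last_start + 1, step)
    return [" ".join(words[s:s + words_per_phrase]) for s in starts][:count]

def search_url(phrase):
    """Build a Google search URL that wraps the phrase in quotes for an exact match."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')
```

Open each URL in a browser and look for results that are not your own site. The same quoted phrases can be pasted into Google Alerts by hand.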

Let’s say you find some hits, either dupes of entire pages, or excerpts that take far more than could be considered fair use. What then?

The nicest response is to e-mail a note to the site, either using a contact form on the infringing website or a WHOIS search to find an address for the owner of the domain. Politely, but firmly, inform them of the violation and ask that they remove the content.
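For readers comfortable with a little scripting, a WHOIS lookup can also be run from code rather than a web form. This is a rough sketch under several assumptions: the default server shown handles .com and .net names and may only point you to the registrar's own WHOIS server, which you would then query the same way; many records hide the owner's contact details behind a privacy service; and the function names are made up for illustration.

```python
import re
import socket

def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send a plain WHOIS request (TCP port 43) and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def contact_emails(whois_text):
    """Return the distinct e-mail addresses in a WHOIS reply, in order of appearance."""
    seen = []
    for address in EMAIL_PATTERN.findall(whois_text):
        address = address.lower()
        if address not in seen:
            seen.append(address)
    return seen
```

Calling `contact_emails(whois_query("example.com"))` would list whatever addresses the reply contains. Often the only one is the registrar's abuse contact, which is still a reasonable place to send a polite note.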

If you’re dealing with an eager reader or clueless novice publisher, this is by far the most effective approach and can provide what educators like to call a “teaching moment” about copyright law. Why bring out the legal guns against your fans? Just show ’em how to hyperlink to the content they want to show others.

But if you are dealing with a professional scraper, one of the folks who are building businesses on stolen content online, then you’ll likely need to skip to the next step: filing a copyright infringement notice. Google explains how to do this on its website. It’s a relatively simple cut-and-paste job to create the complaint letter, which will need to be faxed or snail-mailed.

You might also file infringement notices with the offending publisher and its Internet host. But if you’re not in the mood to do the sleuthing necessary to find the name and mailing address of the publisher’s host, or if the host is located outside the U.S., filing with Google, and other search engines, will do the trick. After all, if no one can find the offending website via search engines, it’s as good as gone from the Web anyway.

Even if you are among the publishers using Creative Commons to allow others to republish your content online, you might still wish to use Copyscape or other methods to ensure that the people who are republishing your content are doing so under the Creative Commons conditions you requested.

Finally, don’t overlook the importance of publishing your e-mail address or a contact form on your site. What does this have to do with copyright protection? First, making it easy for readers to contact you can help prevent copyright infringement, as readers who are interested in passing your content along to others can get in touch with you to ask permission beforehand. I’ve found that this is a great way to thank readers for their interest, while steering them away from simply duplicating my content.

Second, a contact form or e-mail address gives readers a way to alert you to infringements that they find. I’ve had this happen to me, too. Several readers, over the years, have let me know about websites that were duplicating the articles I’d written. These readers were fans, and were as outraged as I was about someone else profiting from my work.

So social networks online can work for you, even as there is a risk that the informal tone many readers perceive online leads some of those readers to rip off and dupe up your content. Make tech tools work for you, though, and you can help ensure that your content is going out on the Web in the way you want it, and not in ways that you do not.

About Robert Niles

Robert Niles is the former editor of OJR, and no longer associated with the site. You may find him now at


  1. says:

    Great stuff – but what tools (if any) do you recommend for publishers of multimedia content? Movie & TV studios have sophisticated sniffers that search for particular video signatures, the better to send out cease-and-desist letters to YouTube. It’s my understanding that such tools are fairly expensive to license and run – is there anything for the less well-heeled audio/video copyright holder?

  2. says:

    TurnItIn is a bit of a paradoxical program.

    I mean they say that they are anti-copying but essentially, their program exists because it scans thousands of OTHER people’s essays; nothing in their database is their own. Using other people’s data to populate your entire service could just as easily be considered a form of plagiarism in itself.