Do-it-yourself copyright protection online

One of the frustrating annoyances for online journalists comes after you’ve published some great content, seen other websites link to it, made better-than-average income off it… then discovered it duped on someone else’s website, without your permission.

Copyright theft online isn’t just a problem for the music and software industries. Dupes of your content can hurt you not only in lost traffic and revenue… if you don’t take care to protect your content, you might even find the thieves’ versions ranking above your original content in search engine results.

That doesn’t happen often, but why risk letting thieves build a publishing history, and inbound links, with your content? Not when finding them is so easy.

The simplest way to check for duped content online is to plug your URLs into Copyscape. It’s a free search engine that takes the URL you supply it and does a nifty little content analysis to find duped pages on the Web. (If you want to pay a few bucks a month, they’ll check your pages for you, on a regular schedule, or let you construct automated searches via an API.)

I’ve used Copyscape to bust folks duping math tutorials I wrote a decade ago. Some academic colleagues have used TurnItIn, another service that checks for duped content online. TurnItIn is designed for use by teachers and professors, and aims to identify student work that’s been copied from the Web. Instead of starting with an original page and looking for dupes, TurnItIn takes a student paper then looks for similar work online.

Unlike Copyscape, TurnItIn doesn’t offer a free option, and requires a license to use. If you teach journalism, either as your full-time job or as a part-time gig, your school might have a license already, so it’s worth asking.

You can also use Google to track snippets of content from your website. Just find a unique phrase from a page, then search for it on Google, and see what turns up. This can help you find scrapers that are pulling excerpts from your site. If you have a handful of high-value webpages that you want to track against copying, for free, just set up Google Alerts for key phrases from those pages, and let Google inform you via e-mail when it finds other webpages that match them.
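The phrase search described above can be sketched in a few lines of Python. This is a minimal illustration, not a finished tool. It assumes Google's ordinary search URL with a `q` parameter, and it relies on the convention that wrapping a phrase in double quotes requests an exact match. The function names and the evenly-spaced sampling are hypothetical choices made for the example. A phrase you pick by hand because it is unusual will generally work better than one sampled blindly.

```python
from typing import List
from urllib.parse import urlencode


def exact_phrase_query_url(phrase: str) -> str:
    """Build a Google search URL that looks for the phrase verbatim."""
    # Collapse runs of whitespace, then wrap the phrase in double quotes
    # so the search engine treats it as an exact-match query.
    quoted = '"' + " ".join(phrase.split()) + '"'
    return "https://www.google.com/search?" + urlencode({"q": quoted})


def candidate_phrases(text: str, words_per_phrase: int = 8, count: int = 3) -> List[str]:
    """Sample a few evenly spaced word runs from the text (hypothetical heuristic)."""
    words = text.split()
    if len(words) <= words_per_phrase:
        return [" ".join(words)] if words else []
    last_start = len(words) - words_per_phrase
    step = max(1, last_start // max(1, count - 1))
    starts = list(range(0, last_start + 1, step))[:count]
    return [" ".join(words[s:s + words_per_phrase]) for s in starts]


if __name__ == "__main__":
    article = "Paste the text of one of your own pages here to generate search links for it"
    for phrase in candidate_phrases(article):
        print(exact_phrase_query_url(phrase))
```

Open each printed link in a browser and look for results that are not your own site. The same phrases can be pasted into Google Alerts if you would rather be notified by e-mail.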

Let’s say you find some hits, either dupes of entire pages, or excerpts that take far more than could be considered fair use. What then?

The nicest response is to e-mail a note to the site, either using a contact form on the infringing website or a WHOIS search to find an address for the owner of the domain. Politely, but firmly, inform them of the violation and ask that they remove the content.

If you’re dealing with an eager reader or clueless novice publisher, this is by far the most effective approach and can provide what educators like to call a “teaching moment” about copyright law. Why bring out the legal guns against your fans? Just show ’em how to hyperlink to the content they want to show others.

But if you are dealing with a professional scraper, the folks who are building businesses on stolen content online, then you’ll likely need to skip to the next step: filing a copyright infringement notice. Google explains how to do this on its website. It’s a relatively simple cut-and-paste job to create the complaint letter, which will need to be faxed or snail-mailed.

You might also file infringement notices with the offending publisher and its Internet host. But if you’re not in the mood to do the sleuthing necessary to find the name and mailing address of the publisher’s host, or if the host is located outside the U.S., filing with Google, and other search engines, will do the trick. After all, if no one can find the offending website via search engines, it’s as good as gone from the Web anyway.

Even if you are among the publishers using Creative Commons to allow others to republish your content online, you might still wish to use Copyscape or other methods to ensure that the people who are republishing your content are doing so under the Creative Commons conditions you requested.

Finally, don’t overlook the importance of publishing your e-mail address or a contact form on your site. What does this have to do with copyright protection? First, making it easy for readers to contact you can help prevent copyright infringement, as readers who are interested in passing your content along to others can get in touch with you to ask permission beforehand. I’ve found that this is a great way to thank readers for their interest, while steering them away from simply duplicating my content.

Second, a contact form or e-mail allows readers a way to alert you to infringements that they find. I’ve had this happen to me, too. Several readers, over the years, have let me know about websites that were duplicating the articles I’d written. These readers were fans, and were as outraged as I was about someone else profiting from my work.

So social networks online can work for you, even as there is a risk that the informal tone many readers perceive online leads some of those readers to rip off and dupe up your content. Make tech tools work for you, though, and you can help ensure that your content is going out on the Web in the way you want it, and not in ways that you do not.

Legal and business advice for online publishers and bloggers

Over the months that I’ve been writing about legal issues for OJR, the consistent issue that has emerged is that online publishers need good legal representation. But that imperative has been matched by an equally vexing question: how does a small publisher get the right legal advice at an affordable price? Fortunately, there’s a host of good resources available, and some fairly clear guidelines on when legal advice is needed. Here’s what I learned from talking to the experts and scouring the Web.

Consider your legal exposure when choosing a structure for your business. Mark Anderson, an intellectual property attorney at Masur & Associates, says that, “Especially in terms of copyright infringement claims, damages can be very high, and if you’re not insulated by a corporate entity… then, your personal assets are potentially at stake. If somebody sues you for something that you wrote on your website, they’d be suing you personally, then you could lose your house; you could lose your car. But if you’ve got a business set up, that’s separate from you, it’s the business that would be sued, and the most you could lose from that is what you put into the business.”

According to Anderson, many small publishers find that a limited liability company, or LLC, provides the right combination of tax and legal advantages. Because an LLC is a legal entity separate from its owners, its assets and liabilities are separate from those of its principals. However, some corporate structures have a disadvantage, because both the corporation and the individuals deriving income from it pay taxes. Owners of LLCs, along with S Corporations, can avoid this double taxation when their revenues are small, but they can change the way they are taxed if they start to make more money. LLCs have other advantages as well; for example, the ownership rules are more flexible.

An ethics policy or code of conduct may help protect you from libel or defamation charges. Ethics codes have their own virtues, and they don’t protect a publisher from legal action by themselves, but they can help to set the tone for an online community and clarify the publishers’ intent.

The debate over codes of conduct has become more intense because of the recent controversy surrounding threatening comments and pictures posted about prominent technology blogger Kathy Sierra. Sierra told readers that safety concerns led her to cancel speaking engagements and hide out in her home, awaiting the results of a police investigation. What followed was a vigorous, ongoing debate including efforts to create a bloggers’ code of conduct. [Full disclosure: I am a contributing editor for BlogHer, one of the groups that figures prominently in both the Sierra controversy and the debate over blogging guidelines.] BlogHer’s community guidelines inspired a code of conduct proposed by well-known web writer Tim O’Reilly. Both codes pledge that online publishers will ban “unacceptable content,” meaning content that might be libelous, abusive, or that might infringe on a copyright or trademark.

Anderson says it’s “tough to say” how a bloggers’ code of conduct might affect a legal proceeding. “There are certain protections under the law for journalists, and now it’s getting tougher and tougher to define who, exactly, is a journalist. Potentially, adhering to one of these codes might be a factor that weighs in favor of somebody being treated as a journalist under certain laws.”

Small publishers doing journalism have to think carefully about the risks they are willing to take, especially since the legal definition of a journalist is subject to debate. Freelancers and small publishers who commit acts of journalism have to understand that courts may not be willing, for example, to extend state shield law protections to them. It’s also important to understand that federal prosecutors have broad subpoena powers when it comes to forcing the disclosure of information they deem important for a criminal investigation.

Nothing better illustrates the risks small publishers take than the case of videoblogger Josh Wolf, who was released from federal prison in early April after serving 8 months for refusing to turn over video outtakes from a July 2005 demonstration to a grand jury. Wolf claimed that, as a journalist, he was entitled to withhold the information under California’s shield law. However, the court rejected his claim because Wolf was not employed by a news organization at the time that he shot the video.

Be clear about your purpose. It’s because of Wolf and other citizen-journalists that Christine Tatum, president of the Society of Professional Journalists, thinks that the definition of a journalist should be expanded beyond those who are paid to report the news. “We want to define journalists as people who are gathering information with the purposes of distributing it,” Tatum says. “Rather than the question for me being, ‘was that person a journalist?’ the question for me is, ‘was that person practicing journalism?’”

That view of journalists was part of the reason SPJ donated $31,000 to Wolf’s legal defense and helped him obtain the services of top-notch legal counsel. But Tatum acknowledges that the law has not embraced that definition, and neither do many bloggers. Noting that many bloggers say they aren’t journalists but want the legal protections afforded to journalists, she said, “I encourage people to really take a long and hard look at what is it you are, really?”

Take advantage of the growing number of educational resources and training opportunities made available by advocacy groups and professional organizations. Small business attorney Nina Kaufman notes that the Electronic Frontier Foundation has a plethora of free resources, including legal guides for bloggers. The Media Bloggers Association is just one of several organizations that offer training in journalistic practices and legal issues. It has also taken the lead in advocating for press credentials for its members, most notably in the recent trial of Lewis “Scooter” Libby.

The MBA’s success echoes Anderson’s argument that, “the more professionally you run your blog site, the more you act like a traditional journalist, the more you are going to be treated as a real journalist. That would include adhering to a code of ethics.”

Be smart about copyrights. Anderson quips, “For starters, don’t use anything that belongs to anyone else.” Seriously, Anderson urges publishers to educate themselves about fair use guidelines, which permit the use of small portions of copyrighted material for comment, criticism, parody or educational purposes. It’s a serious matter: Anderson warns that copyright judgments come with statutory damages that can be as high as $150,000 per violation. For that reason, Anderson urges publishers to think carefully before choosing to defy a request to remove material that someone claims is infringing on a copyright or trademark.

EFF maintains that major copyright holders such as entertainment companies often make abusive use of copyright laws — combating that abuse is one of their major areas of advocacy.

But online publishers are also copyright holders, and sometimes they, too, have to take action to protect themselves. Blogger Elise Bauer warns that there are some people who use RSS feeds to aggregate others’ content without their permission, forming their own revenue-generating website. Bauer urges using the Digital Millennium Copyright Act against them, either by filing a complaint with Google for content scrapers who use its AdSense program, or by complaining directly to the DMCA office itself.

When in doubt, ask a lawyer. Anderson said the published guides and training workshops are great for general knowledge, but it’s best to consult an attorney for really specific questions. And EFF spokeswoman Rebecca Jeschke says that their attorneys have found that some media lawyers are willing to consult with small publishers for a reduced fee, assuming that the matter in question isn’t too involved.

Bottom line: choosing to publish online is an enormous responsibility, and it carries risks. But a professional attitude, self-education and a few proactive steps can go a long way.

Consider liability insurance. Anderson says media liability insurance can offer “peace of mind” for online publishers. One leading provider, Media/Professional/Insurance, says the right policy offers much more. M/PI is one of two companies specializing in policies tailored for cyberspace-based businesses.

* * *
In addition: The SPJ, EFF and MBA are just a few of the professional organizations and advocacy groups that offer legal advice and support. Others include:

Protecting your business by fighting plagiarism online

Jonathan Bailey is the founder and editor of Plagiarism Today, a weblog devoted to tracking incidents of plagiarism online and helping online writers to protect their work from plagiarists. Bailey graduated in 2002 with a bachelor’s degree in journalism and mass communications from the University of South Carolina.

The ease of publishing online has helped transform plagiarism from an ethics problem to an economic one. Automated bots scrape content from the Web, and unethical webmasters cut and paste others’ work onto their own sites, to create a massive number of pages on which to serve pay-per-click advertisements. A recent discussion at Webmaster World detailed how “Made for AdSense” sites [MFAs] parlay often-plagiarized content into big bucks, at the expense of deserving writers, who lose both readers and advertising clicks to the plagiarists.

Fortunately, Web publishers have legal weapons with which they can fight back. The Digital Millennium Copyright Act gives copyright owners strong powers to go after and shut down plagiarists online. Creative Commons licenses allow writers a way to control the terms under which their work can be republished and shared. Bailey recently spoke with OJR over the phone, about these and other issues in the dark world of plagiarism online.

OJR: Why did you start Plagiarism Today?

BAILEY: Well, it was a little bit over a year ago. I was doing some pretty standard anti-plagiarism work, informing some hosts of my work being abused, especially trying to work on getting some of my work taken down. And one of them shot back saying it needed additional information from me. And it kind of caught me off-guard.

So I took a look at the information they required of me, the DMCA information, and they’re absolutely correct. I’m supposed to provide this information. And they had every right to call me on it. So I realized that I kind of have to keep up to date on this. I can’t be slacking anymore. I searched high and low for a new site on this issue over the course of three days. And when I didn’t find one, I just decided to go ahead and make it.

OJR: How do you define plagiarism? Is it simply the flip side of fair use or is there something more to it?

BAILEY: Well, personally, with my work, I’m very liberal about allowing just regular re-use. I mean if you wish to you know re-use it on your site, so long as you do with an accredited link back, I have no real problem with that. So if people use Creative Commons licenses and other things to make that perfectly legal, I encourage it.

What I’m specifically interested in is the taking of someone else’s words and claiming them as your own. You know, copyright law doesn’t protect ideas, which is actually a very good thing, because journalists would be screwed, just as I would be. So even though plagiarism of ideas might be technically plagiarism in an academic sense, I’m more interested in plagiarism in terms of the copyright law sense, which means actually taking someone else’s words and claiming them as your own. Actual copy and paste plagiarism.

OJR: Traditionally, plagiarism has been seen as an ethical problem for journalists and publishers. But would you agree that the Internet is making it even more of an economic problem?

BAILEY: Oh, there’s no doubt about that. It doesn’t really matter these days if you’re running the site yourself or if you’re, you know, part of a major online organization producing a website. The content is expensive. It takes either a lot of time or a lot of money to obtain it and create it. And you know, when someone takes that content from you like that as theirs, it’s actually considered an unfair business practice, for one. And we’re seeing a lot of lawsuits coming up like that lately. They’ve been filing not only for copyright infringement and the usual array that comes with it, but also on unfair business practices, because they’re essentially using your content, your work that you paid for, that you put the effort into, to their financial benefit. And that hurts you because in the case of search engines and so forth, you’re competing with illegal copies of your own work and, you know, you also have the reputation issue that comes with it.

I was just speaking with a guy online today who had work plagiarized, and a stranger caught it and then wrote him, thinking he was potentially the plagiarist. And, you know, as any journalist will tell you, reputation is their stock in trade.

OJR: How big of an economic threat at this point do you see so-called scraper sites being? And what do you think the trend is gonna be with that technology? Is it just a blip at this point? Or is it becoming a real money threat?

BAILEY: Well, it’s definitely becoming a real money threat. It’s already a real money threat to anyone that relies heavily on search engines for traffic. I mean, all this talk about search engine optimization (SEO), if you do a lot of that, you depend upon it heavily for traffic. And if someone else is stealing your content and competing with you, that’s pretty much the technical equivalent of ripping your own arm off and beating you over the head with it. It ain’t pretty, but it’s what’s going on.

So, you know, if you put in a lot of time and money into getting that number 1, number 2 ranking and someone else just comes along and steals it and achieves a similar result, automatically that’s a definite money threat. So anyone that relies heavily on that [SEO] is feeling it now. Others that don’t rely on it so heavily aren’t gonna feel it. It really comes down to how much do you depend upon the search engines at this time.

But the real economic threat in a lot of ways on this issue actually is the search engines themselves, because they’re the ones having to spend the money to fight and eliminate these sites and try to clean up their own databases. And that’s money they could have been putting in other things.

OJR: How can a publisher whether they be a large newspaper company or just an individual blogger protect itself against plagiarists online?

BAILEY: Well, bloggers kind of have an advantage over the majority of people, because they realize that most of the plagiarism involving their work isn’t gonna be just traditional [manual] copy and paste plagiarism. It’s gonna be the scrapers we’ve been talking about. That kind of gives you a heads-up. And there’s a product called Feedburner that will take your feed and it basically makes another version of your [RSS or Atom] feed. And originally Feedburner’s only goal was statistical analysis kind of you know let you know how many people subscribe to your feed, that kind of cool thing. In that regard, it’s worth its weight in gold already. I mean as a webmaster, I’m obsessed with statistics, so Feedburner is a great tool there.

But it also now has got a good feel for what is a normal use of a feed versus something that seems a little weird. And they can spot those uncommon uses. And if you check that regularly, you’ll pretty much be seeing those scrapers. That’s one thing.

The other solution that a lot of people have taken to doing is basically truncating their feed by publishing the headline and the first few paragraphs in a feed. The legitimate reader can just click through to the actual article. That’s only a temporary solution at best, because we’ve already got scrapers that can pull from the site. So how long that’s actually gonna be effective is up for debate.
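[The truncation Bailey describes can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not a description of how any particular blogging platform or feed service does it: the function name, the dictionary layout and the "read more" wording are all hypothetical, and the article body is assumed to be plain text with blank lines between paragraphs.]

```python
def truncate_feed_item(title: str, body: str, link: str, max_paragraphs: int = 2) -> dict:
    """Return a feed entry holding only the headline and the first few paragraphs."""
    # Split the body on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    excerpt = paragraphs[:max_paragraphs]
    truncated = len(paragraphs) > len(excerpt)
    if truncated:
        # Point legitimate readers back to the full article on the site.
        excerpt.append(f"Read the full article: {link}")
    return {
        "title": title,
        "summary": "\n\n".join(excerpt),
        "link": link,
        "truncated": truncated,
    }


if __name__ == "__main__":
    entry = truncate_feed_item(
        "Example headline",
        "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.",
        "https://example.com/articles/example",
    )
    print(entry["summary"])
```

[A scraper that republishes the feed then gets only the excerpt and a link back to the original, which is the point of the technique. As Bailey notes, it does nothing against scrapers that pull from the site itself.]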

If you have a lot of static content, like in my original experiences, which dealt with a large poetry library, what you can do is visit a service that performs automated Google searches for whatever string you provide, in the background, and e-mails you when something new pops up. So all you have to do is take a quote or two from your static piece of content, put it in there, and watch it kick out the results for you. And every time you get a new result in your e-mail box, you just follow up on it and make sure it’s a legitimate use or see if it’s a plagiarized copy.

OJR: What happens if it is a plagiarized copy? What do you do from there?

BAILEY: Well, everyone seems to have his own system. The system that I’ve used I’ve honed to what I think is about the best resolution percentage you can have. What I usually do is I first try to contact the plagiarist directly. I mean my experience is plagiarists, when contacted and presented with this, will remove the work about half the time. But I consider it a gentlemanly thing to do. If there is a means of contacting the plagiarist, which is increasingly becoming harder to find, I have a cease and desist letter that I have ready. I mean I can literally paste it into any form, any time, any place, drop it and send it, and then hopefully they’ll cooperate and all will be handled.

If not, you have to contact the host. If the host is American, then the host has to comply with the DMCA, the Digital Millennium Copyright Act of 1998. If you send a specially formatted complaint that has all the required information that a DMCA complaint should have (it identifies the work and includes your personal information and a statement under penalty of perjury; those are various requirements of it), you can pretty much guarantee the work comes down, because in the United States and the European Union now, when a host is notified of copyright infringement taking place on their servers, they have to delete the work.

So, by the time you get done with the host level, you’ve already resolved over 95 percent of complaints. Half can be resolved at the plagiarist level. Forty-five percent or more can be resolved at the host level.
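[Before sending a complaint to a host, it can help to check a draft against the elements Bailey names. The Python sketch below is only an illustration of that check. It covers just the three items mentioned in this interview, which Bailey himself describes as "various requirements" rather than the complete list, and the class, field and function names are hypothetical.]

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NoticeDraft:
    """Draft complaint holding only the elements named in the interview."""
    work_identified: str = ""       # which work was copied and where the copy appears
    contact_information: str = ""   # the complainant's personal information
    perjury_statement: str = ""     # statement made under penalty of perjury


# Human-readable labels for each field, in the order they are checked.
ELEMENT_LABELS = {
    "work_identified": "identification of the work",
    "contact_information": "contact information",
    "perjury_statement": "penalty-of-perjury statement",
}


def missing_elements(draft: NoticeDraft) -> List[str]:
    """List the labels of any elements that are still blank in the draft."""
    return [
        label
        for field_name, label in ELEMENT_LABELS.items()
        if not getattr(draft, field_name).strip()
    ]


if __name__ == "__main__":
    draft = NoticeDraft(work_identified="My poem, copied at an address on the host's server")
    print(missing_elements(draft))
```

[An empty list means the draft has something in every field the sketch knows about. It says nothing about whether the wording is adequate, which is a question for the statute or a lawyer.]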

From there, I’ve run into problems, for example, with a South Korean host or a Japanese host where they don’t have any set rule. And you contact them and politely make the request, and they don’t do anything.

But you discover that this Japanese host who’s so proud of being a Japanese host actually is leasing their servers from America. If you climb the ladder one step, it might be a Japanese company, but their actual computers are sitting somewhere in California. And you can just contact their host and say, “Hey, look, this is the situation, this is what I tried to do,” and usually they’ll step in, since nearly all hosting companies either operate out of the U.S. or the E.U. It’s usually a pretty simple matter of just finding the right person.

And if all that fails, you can contact all three of the search engines: Google, MSN, Yahoo! They’re all American, and you can get the site removed from those search engines. And if they have their own domain, you can contact the domain registrar too. Most major registrars, though not necessarily obligated to, will delete access to a domain if it’s been proved to be a domain bought for copyright infringement purposes. It’s like a ladder. You just keep climbing.
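[The "ladder" can be written down as an ordered list, which makes it easy to track where a given complaint stands. This Python sketch simply encodes the order Bailey gives in this answer. The step names and the function are hypothetical labels chosen for the example.]

```python
from typing import List, Optional

# Escalation order as described in the interview, lowest rung first.
LADDER: List[str] = [
    "plagiarist",        # cease and desist sent directly
    "host",              # DMCA complaint to the hosting company
    "upstream host",     # the company the host leases its servers from
    "search engines",    # removal requests to the search engines
    "domain registrar",  # request to cut off the domain
]


def next_step(current: Optional[str]) -> Optional[str]:
    """Return the rung after `current`, or None when the ladder is exhausted."""
    if current is None:
        return LADDER[0]
    position = LADDER.index(current)  # raises ValueError for an unknown step
    if position + 1 < len(LADDER):
        return LADDER[position + 1]
    return None


if __name__ == "__main__":
    step = next_step(None)
    while step is not None:
        print(step)
        step = next_step(step)
```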

OJR: Do you think that search engines have an obligation to do more to protect writers and publishers from those who misuse or copy their work?

BAILEY: No. Really I have never personally in any of my cases reported anyone to a search engine to get them removed from that. I don’t think that’s necessarily a great strategy for a lot of reasons. And I’m kind of nervous about the idea of Google, Yahoo! and MSN being the search engine police, I mean, the copyright police of the web. That idea really bugs me.

OJR: Why so? Why does that bug you?

BAILEY: Well, I don’t think an issue this big, enforcement of an issue this big should be relegated to a handful of individuals or a handful of companies. I think it’s just way too much power and control, because if we all look to Google as being the copyright police, any minor change in Google’s policy will be felt all across the world in major ways.

I would be very worried if there was anyone with that kind of power to do so, because if Google just woke up and said, “You know what? I don’t feel like doing copyright enforcement anymore,” yeah, it would be illegal, but what if they moved to Russia or something? They could just you know pack up and move, then the whole world, if we’d been using Google as the copyright police, would be out of luck.

I personally don’t use search engines to fight my battles like that. I try to deal with hosts. Hosts are the ones that can actually take a work down. Search engines just make it harder to find. And in a way that’s kind of destructive to you too, because now you know there’s a plagiarized copy out there, you just might not be able to find it again. I personally don’t think search engines should be the ones bearing that responsibility, but it is there. And if I ever ran against that wall truly hard, I probably would use it, but it has not come to that point yet.

OJR: On the legal side, do you think that the current laws regarding copyright in the United States are where they need to be? Or do you think that there needs to be statutory changes in this area?

BAILEY: The DMCA is a very controversial law. Just do a search for it, and you’ll find all kinds of people that hate lots of it. The area that I deal with, the Safe Harbor Provisions, which is where you get DMCA notices from and all this notice and take down, isn’t nearly as controversial as other elements. And the other elements of the DMCA do bother me too, the anti-circumvention which makes it illegal to circumvent any digital rights management software for any purpose. That unnerves me. And there are other things in the DMCA which I find disconcerting.

But as far as dealing with plagiarism enforcement, I think the laws are more or less there. I think they’ve got the right idea. I think they’re trying, in the United States at least, I think they did a moderately good job in this one particular area of balancing free speech, access to the Internet and free speech with copyright.

I would like to see some kind of a copyright infringement small claims court, because as of right now, a cease and desist letter threatening a lawsuit’s a pretty empty threat, because I would have to go to California, get a lawyer, file a motion, do all that expensive stuff. And even though I probably would be able to recoup the cost in a plagiarism case, it would just be an incredible expenditure of time and money. It would be much easier if there was a small claims court of some variety that didn’t require lawyers and formal motions, or much like you know your regular small claims court to deal with cases that do not have a large amount of statutory damages. I think that would be a nice idea, but I’m not sure how it would work. There’s a lot of details to that that would have to be ironed out.