Protecting your business by fighting plagiarism online

Jonathan Bailey is the founder and editor of Plagiarism Today, a weblog devoted to tracking incidents of plagiarism online and helping online writers protect their work from plagiarists. Bailey graduated in 2002 with a bachelor’s degree in journalism and mass communications from the University of South Carolina.

The ease of publishing online has helped transform plagiarism from an ethics problem into an economic one. Automated bots scrape content from the Web, and unethical webmasters cut and paste others’ work onto their own sites, to create massive numbers of pages on which to serve pay-per-click advertisements. A recent discussion at Webmaster World detailed how “Made for AdSense” sites [MFAs] parlay often-plagiarized content into big bucks, at the expense of deserving writers, who lose both readers and advertising clicks to the plagiarists.

Fortunately, Web publishers have legal weapons with which they can fight back. The Digital Millennium Copyright Act gives copyright owners strong powers to go after and shut down plagiarists online. Creative Commons licenses give writers a way to control the terms under which their work can be republished and shared. Bailey recently spoke with OJR over the phone about these and other issues in the dark world of plagiarism online.

OJR: Why did you start Plagiarism Today?

BAILEY: Well, it was a little bit over a year ago. I was doing some pretty standard anti-plagiarism work, informing some hosts of my work being abused, trying to get some of my work taken down. And one of them shot back saying it needed additional information from me. And it kind of caught me off-guard.

So I took a look at the information they required of me, the DMCA information, and they were absolutely correct. I’m supposed to provide this information. And they had every right to call me on it. So I realized that I kind of have to keep up to date on this. I can’t be slacking anymore. I searched high and low for a news site on this issue over the course of three days. And when I didn’t find one, I just decided to go ahead and make it.

OJR: How do you define plagiarism? Is it simply the flip side of fair use or is there something more to it?

BAILEY: Well, personally, with my work, I’m very liberal about allowing just regular re-use. I mean, if you wish to, you know, re-use it on your site, so long as you do so with a credited link back, I have no real problem with that. So if people use Creative Commons licenses and other things to make that perfectly legal, I encourage it.

What I’m specifically interested in is the taking of someone else’s words and claiming them as your own. You know, copyright law doesn’t protect ideas, which is actually a very good thing, because otherwise journalists would be screwed, just as I would be. So even though plagiarism of ideas might be technically plagiarism in an academic sense, I’m more interested in plagiarism in the copyright law sense, which means actually taking someone else’s words and claiming them as your own. Actual copy and paste plagiarism.

OJR: Traditionally, plagiarism has been seen as an ethical problem for journalists and publishers. But would you agree that the Internet is making it even more of an economic problem?

BAILEY: Oh, there’s no doubt about that. It doesn’t really matter these days if you’re running the site yourself or if you’re, you know, part of a major online organization producing a website. The content is expensive. It takes either a lot of time or a lot of money to obtain it and create it. And, you know, when someone takes that content from you and claims it as theirs, it’s actually considered an unfair business practice, for one. And we’re seeing a lot of lawsuits coming up like that lately. They’ve been filing not only for copyright infringement and the usual array that comes with it, but also on unfair business practices, because they’re essentially using your content, your work that you paid for and put the effort into, to their financial benefit. And that hurts you because, in the case of search engines and so forth, you’re competing against illegal copies of your own work and, you know, you also have the reputation issue that comes with it.

I was just speaking with a guy online today who had work plagiarized, and a stranger caught it and then wrote to him, thinking he was potentially the plagiarist. And, you know, as any journalist will tell you, reputation is their stock in trade.

OJR: How big of an economic threat at this point do you see so-called scraper sites being? And what do you think the trend is gonna be with that technology? Is it just a blip at this point? Or is it becoming a real money threat?

BAILEY: Well, it’s definitely becoming a real money threat. It’s already a real money threat to anyone that relies heavily on search engines for traffic. I mean, all this talk about search engine optimization (SEO), if you do a lot of that, you depend upon it heavily for traffic. And if someone else is stealing your content and competing against you, that’s pretty much the technical equivalent of ripping your own arm off and beating you over the head with it. It ain’t pretty, but it’s what’s going on.

So, you know, if you put in a lot of time and money into getting that number 1, number 2 ranking and someone else just comes along and steals it and achieves a similar result, automatically that’s a definite money threat. So anyone that relies heavily on that [SEO] is feeling it now. Others that don’t rely on it so heavily aren’t gonna feel it. It really comes down to how much do you depend upon the search engines at this time.

But the real economic threat in a lot of ways on this issue actually is the search engines themselves, because they’re the ones having to spend the money to fight and eliminate these sites and try to clean up their own databases. And that’s money they could have been putting in other things.

OJR: How can a publisher, whether a large newspaper company or just an individual blogger, protect itself against plagiarists online?

BAILEY: Well, bloggers kind of have an advantage over the majority of people, because they realize that most of the plagiarism involving their work isn’t gonna be just traditional [manual] copy and paste plagiarism. It’s gonna be the scrapers we’ve been talking about. That kind of gives you a heads-up. And there’s a product called Feedburner that will take your feed and basically make another version of your [RSS or Atom] feed. And originally Feedburner’s only goal was statistical analysis, you know, letting you know how many people subscribe to your feed, that kind of cool thing. In that regard, it’s worth its weight in gold already. I mean, as a webmaster, I’m obsessed with statistics, so Feedburner is a great tool there.

But it also now has got a good feel for what is a normal use of a feed versus something that seems a little weird. And they can spot those uncommon uses. And if you check that regularly, you’ll pretty much see those scrapers. That’s one thing.

The other solution that a lot of people have taken to is basically truncating their feed by publishing the headline and the first few paragraphs in the feed. The legitimate reader can just click through to the actual article. That’s only a temporary solution at best, because we’ve already got scrapers that can pull from the site. So how long that’s actually gonna be effective is up for debate.
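To make the truncation idea concrete, here is a minimal sketch of how a publisher might cut an article down to a headline and its opening paragraphs before it goes into a feed. The function name, the dictionary keys and the assumption that paragraphs are separated by blank lines are all illustrative; they are not taken from Feedburner or any particular blogging tool.

```python
def truncate_for_feed(headline, body, max_paragraphs=2):
    """Build a feed entry holding the headline and only the first few paragraphs.

    Hypothetical helper: assumes paragraphs in `body` are separated by blank lines.
    """
    # Split on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    excerpt = paragraphs[:max_paragraphs]
    return {
        "title": headline,
        "summary": "\n\n".join(excerpt),
        # True when the reader must click through to see the rest.
        "truncated": len(paragraphs) > max_paragraphs,
    }


if __name__ == "__main__":
    article = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    entry = truncate_for_feed("Example headline", article, max_paragraphs=2)
    print(entry["summary"])
```

As Bailey notes, this only limits what a feed scraper gets; a scraper that reads the site itself still sees the full article.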

If you have a lot of static content, as in my original experience, which dealt with a large poetry library, what you can do is visit Google.com/alerts. It performs automated Google searches for whatever string you provide, in the background, and e-mails you when something new pops up. So all you have to do is take a quote or two from your static piece of content, put it in there, and watch it kick out the results for you. And every time you get a new result in your e-mail box, you just follow up on it and see whether it’s a legitimate use or a plagiarized copy.
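Choosing the "quote or two" can be done by hand, but for a large library a small script can suggest candidates. The sketch below is one possible approach, not Bailey's own method: it takes the first sentences that are long enough to be reasonably unique, trims them to a fixed number of words, and wraps them in double quotes so they can be pasted into an alert as exact-phrase searches. The function name and the word-count thresholds are assumptions chosen for illustration.

```python
import re


def pick_alert_phrases(text, count=2, min_words=8, max_words=12):
    """Suggest exact-phrase search strings drawn from a piece of static content.

    Hypothetical helper: thresholds are arbitrary and should be tuned by hand.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    phrases = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < min_words:
            continue  # very short sentences are unlikely to be unique
        phrase = " ".join(words[:max_words]).rstrip(".,;:!?")
        phrases.append('"%s"' % phrase)  # quotes request an exact-phrase match
        if len(phrases) == count:
            break
    return phrases


if __name__ == "__main__":
    sample = ("Short one. The quick brown fox jumps over the lazy dog "
              "near the river bank today. Hi there.")
    for phrase in pick_alert_phrases(sample):
        print(phrase)
```

Each suggested phrase would then be entered into the alert form manually; the script itself does not contact any search engine.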

OJR: What happens if it is a plagiarized copy? What do you do from there?

BAILEY: Well, everyone seems to have his own system. The system that I’ve used I’ve honed to what I think is about the best resolution percentage you can have. What I usually do is I first try to contact the plagiarist directly. I mean, my experience is that plagiarists, when contacted and presented with this, will remove the work about half the time. But I consider it a gentlemanly thing to do. If there is a means of contacting the plagiarist, which is increasingly becoming harder to find, I have a cease and desist letter that I have ready. I mean, I can literally paste it into any form, any time, any place, drop it and send it, and then hopefully they’ll cooperate and all will be handled.

If not, you have to contact the host. If the host is American, then the host has to comply with the DMCA, the Digital Millennium Copyright Act of 1998. If you send the host a specially formatted complaint that has all the required information that a DMCA complaint should have (it identifies the work, gives your personal information and includes a statement under penalty of perjury, among the various requirements), you can pretty much guarantee the work comes down, because in the United States and the European Union now, when a host is notified of copyright infringement taking place on their servers, they have to delete the work.
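Since a host can bounce an incomplete complaint, as happened to Bailey, it can help to check a draft against a list of required elements before sending it. The sketch below is only an illustration: the field names are hypothetical, three of the elements are the ones Bailey names above, and the remaining ones are drawn from the notification requirements in 17 U.S.C. 512(c)(3) and should be confirmed against the statute itself.

```python
# Hypothetical field names mapped to the element each one represents.
REQUIRED_ELEMENTS = {
    "work_identified": "identification of the copyrighted work",
    "infringing_location": "location of the allegedly infringing copy",
    "contact_info": "your name and contact information",
    "good_faith_statement": "statement of good-faith belief that the use is unauthorized",
    "perjury_statement": "statement, under penalty of perjury, that the notice is accurate",
    "signature": "physical or electronic signature",
}


def missing_elements(notice):
    """Return descriptions of required elements that are absent or blank in a draft."""
    return [
        description
        for key, description in REQUIRED_ELEMENTS.items()
        if not str(notice.get(key, "")).strip()
    ]


if __name__ == "__main__":
    draft = {
        "work_identified": "Poem title and original URL",
        "contact_info": "Name, address, e-mail",
        "perjury_statement": "I swear, under penalty of perjury, ...",
    }
    for item in missing_elements(draft):
        print("Missing:", item)
```

A check like this only confirms that each section has been filled in; it says nothing about whether the wording would satisfy a particular host.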

So, by the time you get done with the host level, you’ve already resolved over 95 percent of complaints. Half can be resolved at the plagiarist level. Forty-five percent or more can be resolved at the host level.

From there, I’ve run into problems, for example, with a South Korean host or a Japanese host where they don’t have any set rule. And you contact them and politely make the request, and they don’t do anything.

But you discover that this Japanese host who’s so proud of being a Japanese host actually is leasing their servers from America. If you climb the ladder one step, it might be a Japanese company, but their actual computers are sitting somewhere in California. And you can just contact their host and say, “Hey, look, this is the situation, this is what I tried to do,” and usually they’ll step in, since nearly all hosting companies operate out of either the U.S. or the E.U. It’s usually a pretty simple matter of just finding the right person.

And if all that fails, you can contact all three of the search engines: Google, MSN, Yahoo! They’re all American, and you can get the site removed from those search engines. And if they have their own domain, you can contact the domain registrar too. Most major registrars, though not necessarily obligated to, will cut off access to a domain if it’s been proved to be a domain bought for copyright infringement purposes. It’s like a ladder. You just keep climbing.

OJR: Do you think that search engines have an obligation to do more to protect writers and publishers from those who misuse or copy their work?

BAILEY: No. Really I have never personally in any of my cases reported anyone to a search engine to get them removed from that. I don’t think that’s necessarily a great strategy for a lot of reasons. And I’m kind of nervous about the idea of Google, Yahoo! and MSN being the search engine police, I mean, the copyright police of the web. That idea really bugs me.

OJR: Why so? Why does that bug you?

BAILEY: Well, I don’t think an issue this big, enforcement of an issue this big should be relegated to a handful of individuals or a handful of companies. I think it’s just way too much power and control, because if we all look to Google as being the copyright police, any minor change in Google’s policy will be felt all across the world in major ways.

I would be very worried if there was anyone with that kind of power to do so, because if Google just woke up and said, “You know what? I don’t feel like doing copyright enforcement anymore,” yeah, it would be illegal, but what if they moved to Russia or something? They could just you know pack up and move, then the whole world, if we’d been using Google as the copyright police, would be out of luck.

I personally don’t use search engines to fight my battles like that. I try to deal with hosts. Hosts are the ones that can actually take a work down. Search engines just make it harder to find. And in a way that’s kind of destructive to you too, because now you know there’s a plagiarized copy out there, you just might not be able to find it again. I personally don’t think search engines should be the ones bearing that responsibility, but it is there. And if I ever ran against that wall truly hard, I probably would use it, but it has not come to that point yet.

OJR: On the legal side, do you think that the current laws regarding copyright in the United States are where they need to be? Or do you think that there needs to be statutory changes in this area?

BAILEY: The DMCA is a very controversial law. Just do a search for it, and you’ll find all kinds of people that hate lots of it. The area that I deal with, the Safe Harbor Provisions, which is where you get DMCA notices from and all this notice and take down, isn’t nearly as controversial as other elements. And the other elements of the DMCA do bother me too, like the anti-circumvention provision, which makes it illegal to circumvent any digital rights management software for any purpose. That unnerves me. And there are other things in the DMCA which I find disconcerting.

But as far as dealing with plagiarism enforcement, I think the laws are more or less there. I think they’ve got the right idea. I think they’re trying, in the United States at least, I think they did a moderately good job in this one particular area of balancing free speech, access to the Internet and free speech with copyright.

I would like to see some kind of a copyright infringement small claims court, because as of right now, a cease and desist letter threatening a lawsuit’s a pretty empty threat, because I would have to go to California, get a lawyer, file a motion, do all that expensive stuff. And even though I probably would be able to recoup the cost in a plagiarism case, it would just be an incredible expenditure of time and money. It would be much easier if there was a small claims court of some variety that didn’t require lawyers and formal motions, much like, you know, your regular small claims court, to deal with cases that do not have a large amount of statutory damages. I think that would be a nice idea, but I’m not sure how it would work. There’s a lot of details to that that would have to be ironed out.